is in VF or SRIOV mode typically in a virtual machine environment.
Submitted by: Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after: 3 days
Sponsored by: Mellanox Technologies
How network VF works with hn(4) on Hyper-V in transparent mode:
- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
address.
- Once the network VF is attached, the cooresponding hn(4) waits several
seconds to make sure that the network VF attach routing completes, then:
o Set the intersection of the network VF's if_capabilities and the
cooresponding hn(4)'s if_capabilities to the cooresponding hn(4)'s
if_capabilities. And adjust the cooresponding hn(4) if_capable and
if_hwassist accordingly. (*)
o Make sure that the cooresponding hn(4)'s TSO parameters meet the
constraints posed by both the network VF and the cooresponding hn(4).
(*)
o The network VF's if_input is overridden. The overriding if_input
changes the input packet's rcvif to the cooreponding hn(4). The
network layers are tricked into thinking that all packets are
neceived by the cooresponding hn(4).
o If the cooresponding hn(4) was brought up, bring up the network VF.
The transmission dispatched to the cooresponding hn(4) are
redispatched to the network VF.
o Bringing down the cooresponding hn(4) also brings down the network
VF.
o All IOCTLs issued to the cooresponding hn(4) are pass-through'ed to
the network VF; the cooresponding hn(4) changes its internal state
if necessary.
o The media status of the cooresponding hn(4) solely relies on the
network VF.
o If there are multicast filters on the cooresponding hn(4), allmulti
will be enabled on the network VF. (**)
- Once the network VF is detached. Undo all damages did to the
cooresponding hn(4) in the above item.
NOTE:
No operation should be issued directly to the network VF, if the
network VF transparent mode is enabled. The network VF transparent mode
can be enabled by setting tunable hw.hn.vf_transparent to 1. The network
VF transparent mode is _not_ enabled by default, as of this commit.
The benefit of the network VF transparent mode is that the network VF
attachment and detachment are transparent to all network layers; e.g. live
migration detaches and reattaches the network VF.
The major drawbacks of the network VF transparent mode:
- The netmap(4) support is lost, even if the VF supports it.
- ALTQ does not work, since if_start method cannot be properly supported.
(*)
These decisions were made so that things will not be messed up too much
during the transition period.
(**)
This does _not_ need to go through the fancy multicast filter management
stuffs like what vlan(4) has, at least currently:
- As of this write, multicast does not work in Azure.
- As of this write, multicast packets go through the cooresponding hn(4).
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11803
Before this patch function ofw_bus_find_compatible was using
memory allocations in order to find compatible node and the property's
length. This way there was always a suited buffer for property,
however this approach had also disadvantages - ofw_bus_find_compatible
couldn't be used when malloc is not available, e.g. during fdt fixup stage.
In order to remove the usage limitation of ofw_bus_find_compatible(),
this patch modifies the function to use ofw_bus_node_is_compatible()
(instead of the one without _int suffix), which uses a fixed
buffer on stack instead of dynamic allocations.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11880
Sometimes it's convenient to provide fixup to many boards
that use the same SoC family (eg. Marvell Armada 38x).
Instead of putting multiple entries in fdt_fixup_table,
use one entry which refers to all boards with given SoC.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11878
This patch makes possible to boot with up to 8 ranges in soc.
Dynamic allocation cannot be used, because ftd_get_ranges
function is called early, when malloc is not available.
Change is required for the alignment of Marvell Armada 38x
device trees present in sys/gnu/dts/arm - originally
the platform has 6 entries in simple-bus 'ranges'.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: manu, nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11876
libefivar expects opening /dev/efi to indicate if the we can make efi
runtime calls. With a null routine, it was always succeeding leading
efi_variables_supported() to return the wrong value. Only succeed if
we have an efi_runtime table. Also, while I'm hear, out of an
abundance of caution, add a likely redundant check to make sure
efi_systbl is not NULL before dereferencing it. I know it can't be
NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.
If mly_user_command fails to allocate a command slot it jumps to an 'out'
label used for error handling. The error handling code checks for a data
buffer in 'mc->mc_data' to free before checking if 'mc' is NULL. Fix by
just returning directly if we fail to allocate a command and only using
the 'out' label for subsequent errors when there is actual cleanup to
perform.
PR: 217747
Reported by: PVS-Studio
Reviewed by: emaste
MFC after: 1 week
Also improve the formatting of the corresponding KASSERT message.
Based on the submission by: Svyatoslav <razmyslov@viva64.com>
Found by: PVS-Studio
PR: 217741
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.
Submitted by: gallatin@
MFC after: 1 week
Sponsored by: Mellanox Technologies
The m_defrag() function can only defrag mbuf chains which have a valid
mbuf packet header. In r291699 when the mlx4en(4) driver was converted
into using BUSDMA(9), the call to m_defrag() was moved after the part
of the transmit routine which strips the header from the mbuf chain.
This effectivly disabled the mbuf defrag mechanism and such packets
simply got dropped.
This patch removes the stripping of mbufs from a chain and loads all
mbufs using busdma. If busdma finds there are no segments, unload
the DMA map and free the mbuf right away, because that means all
data in the mbuf has been inlined in the TX ring. Else proceed
as usual.
Add a per-ring rounter for the number of defrag attempts and
make sure the oversized_packets counter gets zeroed while at it.
The counters are per-ring to avoid excessive cache misses in the
TX path.
Submitted by: mjoras@
Differential Revision: https://reviews.freebsd.org/D11683
MFC after: 1 week
Sponsored by: Mellanox Technologies
This also involves adding a quirk table as TRIM is broken for some
Kingston eMMC devices, though. Compared to ERASE (declared "legacy"
in the eMMC specification v5.1), TRIM has the advantage of operating
on write sectors rather than on erase sectors, which typically are
of a much larger size. Thus, employing TRIM, we don't need to fiddle
with coalescing BIO_DELETE requests that are also of (write) sector
units into erase sectors, which might not even add up in all cases.
- For some SanDisk iNAND devices, the CMD38 argument, e. g. ERASE,
TRIM etc., has to be specified via EXT_CSD[113], which now is also
handled via a quirk.
- My initial understanding was that for eMMC partitions, the granularity
should be used as erase sector size, e. g. 128 KB for boot partitions.
However, rereading the relevant parts of the eMMC specification v5.1,
this isn't actually correct. So drop the code which used partition
granularities for delmaxsize and stripesize. For the most part, this
change is a NOP, though, because a) for ERASE, mmcsd_delete() used
the erase sector size unconditionally for all partitions anyway and
b) g_disk_limit() doesn't actually take the stripesize into account.
- Take some more advantage of mmcsd_errmsg() in mmcsd(4) for making
error codes human readable.
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)
This is required to support new GCC 7.1 compiler.
This is compatible with current GCC 6.1 compiler.
RISC-V is extensible ISA and the idea here is to have built-in define
per each extension, so together with __riscv we will have some subset
of these as well (depending on -march string passed to compiler):
__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen
Reviewed by: ngie
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11901
precision. These timers are already displayed in microseconds in the
sysctl MIB. Add variables to track these tunables while here.
MFC after: 3 days
Sponsored by: Chelsio Communications
Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting, and you can't
change it after boot-time. However, it will allow easier A/B testing.
Differential Revision: https://reviews.freebsd.org/D11825
debug (cudbg) code, hooked up to the main driver via an ioctl.
The ioctl can be used to collect the chip's internal state in a
compressed dump file. These dumps can be decoded with the "view"
component of cudbg.
Obtained from: Chelsio Communications
MFC after: 2 months
Sponsored by: Chelsio Communications
Code inspection reveals the busdma unload and free functions
do not write to the belonging dma tag and does not need to be
serialized. This allows mlx5_fwp_free() to be called from
software interrupt context.
MFC after: 3 days
Sponsored by: Mellanox Technologies
- Store the symbol table contents in an anonymous swap-backed object. Have
mmap(/dev/ksyms) map that object, and stop mapping the symbol table into
the calling process in ksyms_open(). Previously we would cache a pointer
to the pmap of the opening process, and mmap(/dev/ksyms) would create a
mapping using the physical address found by a pmap lookup at the initial
mapping address. However, this assumes that the cached pmap is valid,
which may not be the case. [1]
- Remove the ksyms ioctl interface. It appears to have been added to work
around a limitation in libelf that no longer exists; see r321842.
Moreover, the interface is difficult to support and isn't present in
illumos. Since ksyms was added specifically to support lockstat(1), it
is expected that this removal won't have any real impact.
- Simplify ksyms_read() to avoid unnecessary copying.
- Don't call the device handle destructor if we fail to capture a snapshot
of the kernel's symbol table. devfs will do that for us.
Reported by: Ilja van Sprundel <ivansprundel@ioactive.com> [1]
Reviewed by: kib (previous revision)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11789
"br" or "bridge" where - according to the terminology outlined in
comments of bridge.h and mmcbr_if.m around since their addition in
r163516 - the bus is meant and used instead. Some of these instances
are also rather old, while those in e. g. mmc_subr.c are as new as
r315430 and were caused by choosing mmc_wait_for_request(), i. e. the
one pre-r315430 outliner existing in mmc.c, as template for function
parameters in mmc_subr.c inadvertently. This correction translates to
renaming "brdev" to "busdev" and "mmcbr" to "mmcbus" respectively as
appropriate.
While at it, also rename "reqdev" to just "dev" in mmc_subr.[c,h]
for consistency with was already used in mmm.c pre-r315430, again
modulo mmc_wait_for_request() that is.
- Remove comment lines from bridge.h incorrectly suggesting that there
would be a MMC bridge base class driver.
- Update comments in bridge.h regarding the star topology of SD and SDIO;
since version 3.00 of the SDHCI specification, for eSD and eSDIO bus
topologies are actually possible in form of so called "shared buses"
(in some subcontext later on renamed to "embedded" buses).
According to the PCI Local Specification rev. 3.0 in case of a 64-bit
BAR both the low and the high parts of the register should be set to
~0 before attempting to read back the size.
So far I have found no single device that has problems with the
previous approach, but I think it's better to stay on the safe size.
This commit should not introduce any functional change.
MFC after: 3 weeks
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D11750
This prepares for the upcoming transparent VF support.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11708
Don't enable the oscillator when it is found to be stopped at init time,
just let the first setting of valid time start it. But still report a dead
battery if it's stopped at init time.
Don't force the chip into 24hr mode, just cope with whatever mode it is
already in.
Schedule the clock_settime() callbacks to align the RTC clock to top of
second when setting it.
subr_rtc code, switch from CLOCKF_SETTIME_NO_TS to CLOCKF_SETTIME_NO_ADJ
so that we get fed a timestamp, but it's not adjusted to compensate for
inaccuracy in setting time.
Since device can pass multiple frames in a single payload temporary
Rx buffer was big enough to hold all of them; now the driver can
concatenate a single frame from multiple payloads.
The Rx buffer size may be configured via tunable (dev.rtwn.%d.rx_buf_size).
Tested with:
- rtl8188cus, rtl8188eu and rtl8821au (STA mode).
- (by kevlo) rtl8192cu and rtl8188eu.
PR: 218527
Reviewed by: kevlo
Differential Revision: https://reviews.freebsd.org/D11705