own region in the TCAM starting with T6, unlike previous chips where
they were in the same region as normal filters.
These filters "hit" before anything else in the LE's lookup. The exact
order is:
a) High priority filters
b) TOE's active region (TCAM and/or hash)
c) Servers (TOE hw listeners)
d) Normal filters
MFC after: 1 week
Sponsored by: Chelsio Communications
traffic class for rate limiting.
Add experimental knobs that allow the user to specify a default pktsize
and burstsize for traffic classes associated with a port:
dev.<ifname>.<instance>.tc.pktsize
dev.<ifname>.<instance>.tc.burstsize
Sponsored by: Chelsio Communications
The VGA "text mode" buffer has a pair of bytes for each character: One
byte for the character symbol, and an "attribute" byte encoding the
foreground and background colours. When updating the screen, we were
writing these two bytes separately.
On some virtualized systems, every write results in a glyph being redrawn
into a (graphical) virtual screen; writing these two bytes separately
results in twice as much work being done to draw characters, whereas if
we perform a single 16-bit write instead, the character only needs to be
redrawn once.
On an EC2 c5.4xlarge instance, this change cuts 1.30s from the kernel boot,
speeding it up from 8.90s to 7.60s.
MFC after: 1 week
them display the current value of the bitfield rather than the fixed
value that was provided when the sysctl node was created.
MFC after: 1 week
Sponsored by: Chelsio Communications
To compile this driver with evdev support enabled, place
following lines into the kernel configuration file:
options EVDEV_SUPPORT
device evdev
Note: Native and evdev modes are mutually exclusive.
Reviewed by: gonzo, wblock (docs)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11156
We didn't allowed a divider register value of 0 which can exists and
also didn't wrote the value but the divider, which result of a wrong
frequency to be selected
efi_enter here was needed because efi_runtime dereference causes a fault
outside of EFI context, due to runtime table living in runtime service
space. This may cause problems early in boot, though, so instead access it
by converting paddr to KVA for access.
While here, remove the other direct PHYS_TO_DMAP calls and the explicit DMAP
requirement from efidev.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D16591
Summary:
Base gcc fails to compile `sys/dev/hyperv/pcib/vmbus_pcib.c` for i386,
with the following -Werror warnings:
cc1: warnings being treated as errors
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'new_pcichild_device':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:567: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_on_channel_callback':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:940: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_protocol_negotiation':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1012: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_enter_d0':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1073: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_send_resources_allocated':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1125: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_map_msi':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1730: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
This is because on i386, several casts from `uint64_t` to a pointer
reduce the value from 64 bit to 32 bit.
For gcc, this can be fixed by an intermediate cast to uintptr_t. Note
that I am assuming the incoming values will always fit into 32 bit!
Differential Revision: https://reviews.freebsd.org/D15753
MFC after: 3 days
Usbhid's hid_report_size() calculates integral size of all reports of given
kind found in the HID descriptor rather then exact size of report with given
ID as its userland counterpart does. As all input data processed by the
driver is located within the same report, calculate required driver's buffer
size with userland version, imported in one of the previous commits.
This allows us to skip zeroing of buffer on processing of each report.
While here do some minor refactoring.
MFC after: 2 weeks
if present to enable some devices like WaveShare touchscreens. Unlike
Windows we discard content of the blob. We try mimic Windows driver
behaviour from the USB device point of view.
Submitted by: glebius (initial version)
rather than from HID descriptor to match Microsoft documentation.
Fall back to HID descriptor provided value if 'Get Report' request failed.
MFC after: 2 weeks
Summary:
Some architectures, in this case powerpc64, need explicit synchronization
barriers vs device accesses.
Prior to this change, when running 'make buildworld -j72' on a 18-core
(72-thread) POWER9, I would see controller resets often. With this change, I
don't see these resets messages, though another tester still does, for yet to be
determined reasons, so this may not be a complete fix. Additionally, I see a
~5-10% speed up in buildworld times, likely due to not needing to reset the
controller.
Reviewed By: jimharris
Differential Revision: https://reviews.freebsd.org/D16570
- Properly handle snprintf return value for truncation and avoid
overflowing the later write with the bogus length.
- Increase the msgbufr size to handle a rename of 2 full files.
The larger allocation causes a slight performance hit which will be mitigated
in the future. A rewrite with sbufs will likely be done as well.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 2 weeks
Approved by: so (gtetlow)
Reviewed by: kib
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D16098
This actually makes the rights requirements for accessing PCI config
space and BARs using /dev/pci same. Since unchanged /dev/pci mode
only allows write open for root, default configuration de-facto limits
the BAR read to root only. In particular, state-changing reads of the
registers are limited to root.
Discussed with: se
Suggested and reviewed by: jhb (kernel part)
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Differential revision: https://reviews.freebsd.org/D16580
- Ignore any type of TID where the start/end values are not in the
correct order. There are situations where the firmware isn't able to
reserve room for the number requested in the config file but doesn't
report a failure during configuration and instead sets end <= start.
- Track start/end in tid_tab and remove some redundant copies from
adapter->params.
- Move all the start/end and other read-only parameters to a quiet part
of tid_tab, away from the tid locks.
MFC after: 1 week
Sponsored by: Chelsio Communications
initializing the softc for a per-flow rate limiter. The limit happens
to be the same for both and the existing code worked by accident for
common configurations.
Reported by: gallatin@
Sponsored by: Chelsio Communications
Add the ioctl PCIOCBARMMAP on /dev/pci to conveniently create
userspace mapping of a PCI device BAR. This is enormously superior to
read the BAR value with PCIOCREAD and then try to mmap /dev/mem, and
should allow to automatically activate the mapped BARs when needed in
future.
Current implementation creates new sg pager for each user mmap
request. If the pointer (and reference) to a managed device pager is
stored in pci_map, we would be able to revoke all mappings on the BAR
deactivation or relocation. This is related to the unimplemented BAR
activation on mmap, and is postponed for the future.
Discussed with: imp, jhb
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D15583
Make sure both sides of the DMA buffer memory accesses for the CORB and RIRB
(control buffers) in snd_hda (device and CPU) can see coherent memory. This
is needed on weakly ordered architectures including PowerPC and ARM. Patch
originally by mmel, with small changes.
This does not cover the data path of snd_hda. We don't have sync operations
for in-progress DMA buffers, to sync ranges of a map.
Reviewed By: mmel
Differential Revision: https://reviews.freebsd.org/D16517
The jedec_ts(4) driver has been marked as deprecated in stable/11, and is
now being removed from -HEAD. Add a notice in UPDATING, and update the few
remaining references (regarding jedec_dimm(4)'s compatibility and history)
to reflect the fact that jedec_ts(4) is now deleted.
Reviewed by: avg
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16537
Nominal Mhz is either expressed via the clock-frequency property
or can be get via the clock property that holds the cpu clock.
Add support for the later.
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D16346
The nvmem interface helps provider of nvmem data to expose themselves to consumer.
NVMEM is generally present on some embedded board in a form of eeprom or fuses.
The nvmem api are helpers for consumer to read/write the cell data from a provider.
Differential Revision: https://reviews.freebsd.org/D16419
The buffer descriptor list entries should be in little endian format. Byte swap
them on BE. This is the last piece of the puzzle for snd_hda(4) to work on
PowerPC.
This calls into the Arm Trusted Firmware to enable and disable the
workaround for the Speculative Store Bypass Disable (SSBD) issue, also
known as Spectre Variant 4.
As this may have a large performance overhead, and how exploitable SSBD is
is unknown we follow the Linux lead of allowing the administrator to select
between always on, always off, or only enabled in the kernel, with the
latter being the default.
PR: 228955
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15819
The CORB and RIRB buffers exist in DMA memory, but the device reads them as
little-endian only. Read and write as LE into the DMA memory block, to work on
BE platforms.
The latter matches the rest of the tree better [0]. The UPDATING entry has
been updated to reflect this, and the new tunable is now documented in
loader(8) [1].
Reported by: imp [0], Shawn Webb [1]
Leading up to enabling EFIRT in GENERIC, allow runtime services to be
disabled with a new tunable: efi.rt_disabled. This makes it so that EFIRT
can be disabled easily in case we run into some buggy UEFI implementation
and fail to boot.
Discussed with: imp, kib
MFC after: 1 week
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.
Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.
Bump _FreeBSD_version due to the breaking KPI change.
Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725
It's easy to confuse the error code as naked it looks decimal (EINVAL is
reported as error 16, instead of error 22, so first reading looks like EBUSY).
This fixes the panic caused by deadlocking when grant-table free
callbacks are used.
The cause of the recursion is: check_free_callbacks() is always called
with the lock gnttab_list_lock held. In turn the callback function is
also called with the lock held. Then when the client uses any of the grant
reference methods which also attempt the lock the gnttab_list_lock
mutex from within the free callback a deadlock happens.
Fix this by making the gnttab_list_lock recursive.
Submitted by: Pratyush Yadav <pratyush@freebsd.org>
Differential Revision: https://reviews.freebsd.org/D16505
If gnttab_grant_foreign_access() fails for any of the indirection
pages, the code breaks out of both the loops without freeing the local
variable indirectpages, causing a memory leak.
Submitted by: Pratyush Yadav <pratyush@freebsd.org>
Differential Review: https://reviews.freebsd.org/D16136
Length is an unsigned integer, so checking against < 0 doesn't make
sense. While there also make clear that a length of 0 always succeeds.
Submitted by: Pratyush Yadav <pratyush@freebsd.org>
Differential Review: https://reviews.freebsd.org/D16045
Ifuncs selectors dispatch copyin(9) family to the suitable variant, to
set rflags.AC around userspace access. Rflags.AC bit is cleared in
all kernel entry points unconditionally even on machines not
supporting SMAP.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D13838
r336773 removed all things xscale. However, some things xscale are
really armv5. Revert that entirely. A more modest removal will follow.
Noticed by: andrew@
The OLD XSCALE stuff hasn't been useful in a while. The original
committer (cognet@) was the only one that had boards for it. He's
blessed this removal. Newer XSCALE (GUMSTIX) is for hardware that's
quite old. After discussion on arm@, it was clear there was no support
for keeping it.
Differential Review: https://reviews.freebsd.org/D16313
The last known robust version of this code base was FreeBSD 8.2. There
are no users of this on current, and all users of it have abandoned
this platform or are in legacy mode with a prior version of FreeBSD.
All known users on arm@ approved this removal, and there were no
objections.
Differential Revision: https://reviews.freebsd.org/D16312
There were some bits that were being set in cmd_flags (a field of AHCI's
command list structure) after cmd_flags was converted to little endian.
On a big endian host, such as PowerPC, this would set the wrong bits.
This was preventing AHCI driver from working on these hosts.
Reviewed by: jhibbits
Approved by: jhibbits (mentor)
Start in "class" instead of "flow" mode. This eliminates the need to
specify an MTU, which is not available that early anyway. It also
allows the user to manually configure ch-rl rate limiting after attach.
This used to fail because ch-rl isn't supported if cl-rl "flow" mode is
configured.
Set all traffic classes to 1Gbps during initialization. The goal is to
start off with _any_ valid configuration and 1Gbps works even for
gigabit cards.
Sponsored by: Chelsio Communications
jedec_dimm(4) is a superset of the functionality of jedec_ts(4). Mark
jedec_ts(4) as removed in FreeBSD 12, and include a pointer to the migration
instructions in the jedec_dimm(4) manpage, in both the jedec_ts(4) code and
the jedec_ts(4) manpage. Add a note to the jedec_dimm(4) manpage about the
fact that it is a superset of jedec_ts(4).
This change will be MFCed to stable/11 and stable/10; the followup change
to actually remove jedec_ts(4) from -HEAD will not.
Reviewed by: avg
MFC after: 1 week
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16412
This change allows one to set the busy_detect flag
required by the synopsys UART at the loader prompt.
This is needed by the EPYC 3000 SoC.
This will give users a working console up to the point where getty is required:
hw.uart.console="mm:0xfedc9000,rs:2,bd:1"
Reviewed by: imp
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D16399
- Don't bother calling if_setbaudrate(9) as iflib_link_state_change(9)
takes care of that,
- correctly check for E1000_CTRL_EXT_LINK_MODE_GMII in E1000_CTRL_EXT [1],
- properly convert the uint16_t link_speed to a uint64_t baudrate by
using IF_Mbps() which contains an appropriate cast [2],
- remove the duplicate link down announcement when bootverbose isn't
zero and bring the remaining one in line with the other link state
messages.
o Remove a dead store to rid in em_if_msix_intr_assign(). [3]
o Or in the DMA coalescing Rx threshold so the other bits set in E1000_DMACR
remain intact as intended in igb_init_dmac(). [4]
Reported by: Coverity
CID: 1378464 [1], 1368765 [2], 1381681 [3], 1304929 [4]
This fixes usage/report size calculation of Microsoft`s "Touch Hardware
Quality Assurance" certificate blob found in many touchscreens.
While here, join several "c->flags = dval" lines in to single line.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16357
/chosen/stdout-path is a string, not ihandle. Treat it as such.
With this, ofwfb now starts correctly on a POWER9 system when launched from
the local console (not serial).
The FDT implementation of OF_instance_to_package() backend checks the
cross-reference to get the node. On failure, this returns the input handle
unchanged. In the case of ofwfb attachment, if /chosen/stdout property does not
exist, sc->sc_handle is either garbage or 0, which then gets propagated to node.
This will prevent "screen" from being used, resulting in not properly attaching.
Correct this by matching the code in ofwfb_probe().
Oppv2 add more flexibility on regulator value for the core voltage amongst
other new thing.
For now only shared opp table is supported as I don't have hardware with
non-shared opp table.
Tested-On: OrangePi One (with oppv1 and oppv2)
Tested-On: Pine64-LTS
When booted as PVHv2, there's no ACPI CPU object, so attach the PV CPU
device in order to take it's place.
This is required in case some device or driver tries to poke at the
PCPU device field.
Sponsored by: Citrix Systems R&D
HYPERVISOR_start_info is only available to PV and PVHv1 guests, HVM
and PVHv2 guests get this data from HVM parameters that are fetched
using a hypercall.
Instead provide a set of helper functions that should be used to fetch
this data. The helper functions have different implementations
depending on whether FreeBSD is running as PVHv1 or HVM/PVHv2 guest
type.
This helps to cleanup generic Xen code by removing quite a lot of
xen_pv_domain and xen_hvm_domain macro usages.
Sponsored by: Citrix Systems R&D
found in some MacBook Pro.
PR: 229727
Reported by: Stephan Neuhaus <sten@artdecode.de> and others
Tested by: Stephan Neuhaus <sten@artdecode.de>
Approved by: mav (mentor)
MFC after: 1 month
Query the minimal inline mode supported by the card.
When creating a send queue, cache the queried mode and optimize the transmit
if no inlining is required. In this case, we can avoid touching the headers
cache line and avoid dirtying several more lines by copying headers into
the send WQEs. Also, if no inline headers are used, hardware assists in
the VLAN tag framing.
Submitted by: kib@, slavash@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Issue: IO fails immediately after doing port-toggle.
Fix: Added LDT(Device Lost Timer)- we wait a specific period of time prior to telling the OS about lost device.
Approved by: ken, mav
MFC after: 3 days
Differential Revision: D16196
Track session objects in the framework, and pass handles between the
framework (OCF), consumers, and drivers. Avoid redundancy and complexity in
individual drivers by allocating session memory in the framework and
providing it to drivers in ::newsession().
Session handles are no longer integers with information encoded in various
high bits. Use of the CRYPTO_SESID2FOO() macros should be replaced with the
appropriate crypto_ses2foo() function on the opaque session handle.
Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to
the opaque handle interface. Discard existing session tracking as much as
possible (quick pass). There may be additional code ripe for deletion.
Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style
interface. The conversion is largely mechnical.
The change is documented in crypto.9.
Inspired by
https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html .
No objection from: ae (ipsec portion)
Reported by: jhb
1. Fix taskqueues drain/free to fix panic seen when interface is being
bought down and in parallel asynchronous link events happening.
2. Fix bxe_ifmedia_status()
Submitted by:Vaishali.Kulkarni@cavium.com and Anand.Khoje@cavium.com
MFC after:5 days
Remove all the big-endian arm architectures (ixp425 and ixp435)
support in the kernel and associated drivers.
Differential Revision: https://reviews.freebsd.org/D16257
For setups having a large amount of PCI devices, it makes sense to limit the
number of MSIX vectors per PCI device, in order to avoid running out of IRQ
vectors.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The scatter list is formed by the chunks of MCLBYTES each, and larger
than default packets are returned to the stack as the mbuf chain.
Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies
To access the data, set sysctl dev.mce.N.conf.debug_stats to 1.
This enables the sysctl node dev.mce.N.hw_ctx_debug. Its content is
the mapping of each channel' number to used receive queue and associated
completion queue, set of the transmit queues numbers and corresponding
completion queues.
Trimmed example output:
channel 30 rq 188 cq 1085
channel 30 tc 0 sq 187 cq 1084
channel 31 rq 191 cq 1087
channel 31 tc 0 sq 190 cq 1086
MFC after: 1 week
Sponsored by: Mellanox Technologies
Device detach and setting error state may deadlock over the interface mutex
like this:
a) Detach code in mlx5en waits until error state is set while the interface
mutex is locked.
b) The set error handler needs to lock the interface mutex before it can
set the error state.
The solution is to use atomics to set the error state.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When resetting mlx5core instances it can happen that the order of attach and
detach for mlx5ib instances is changed. Take the unit number for mlx5_%d from
the parent PCI device, similarly to what is done in mlx5en(4), so that there
is a direct relationship between mce<N> and mlx5_<N>.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The DSCP feature is controlled using a set of sysctl(8) fields under
the qos sysctl directory entry for mlx5en(4).
For Routable RoCE QPs, the DSCP should be set in the QP's address path.
The DSCP's value is derived from the traffic class.
Linux commit:
ed88451e1f2d400fd6a743d0a481631cf9f97550
MFC after: 1 week
Sponsored by: Mellanox Technologies
When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.
The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.
Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This is logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.
The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().
MFC after: 1 week
Sponsored by: Mellanox Technologies
ig4(4) does not support suspend/resume but present on the hardware where
such functionality is critical, like laptops. Remove PNP info to avoid
breaking suspend/resume on the systems where ig4(4) load is not explicitly
requested by the user.
PR: 229791
Reported by: Ali Abdallah
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs
was committed in r295133, CSUM_TSO always got disabled unconditionally
by em(4) on the first invocation of em_init_locked(). However, even with
that problem fixed, it turned out that for at least e. g. 82579 not all
necessary TSO workarounds are in place, still causing MAC hangs even at
Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled
in r323292 (r323293 for stable/10) for the EM-class by default, allowing
users to turn it on if it happens to work with their particular EM MAC
in a Gigabit-only environment.
In head, the TSO workaround for speeds other than Gigabit was lost with
the conversion to iflib(9) in r311849 (possibly along with another one
or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4
got enabled by default again, causing device hangs. Therefore, change the
default for this hardware class back to have TSO4 off, allowing users
to turn it on manually if it happens to work in their environment as
we do in stable/{10,11}. An alternative would be to add a whitelist of
EM-class devices where TSO4 actually is reliable with the workarounds in
place, but given that the advantage of TSO at Gigabit speed is rather
limited - especially with the overhead of these workarounds -, that's
really not worth it. [1]
This change includes the addition of an isc_capabilities to struct
if_softc_ctx so iflib(9) can also handle interface capabilities that
shouldn't be enabled by default which is used to handle the default-off
capabilities of e1000 as suggested by shurd@ and moving their handling
from em_setup_interface() to em_if_attach_pre() accordingly.
- Although 82543 support TSO4 in theory, the former lem(4) didn't have
support for TSO4, presumably because TSO4 is even more broken in the
LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class
devices was enabled as part of the conversion to iflib(9) in r311849,
causing device hangs. So revert back to the pre-r311849 behavior of
not supporting TSO4 for LEM-class at all, which includes not creating
a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2]
- In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET
(65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO
DMA must have a maxsize of the maximum TSO size plus the size of a
VLAN header for software VLAN tagging. The iflib(9) converted em(4),
thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE
in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET
in em_setup_interface() (apparently, left-over from pre-iflib(9)
times). So remove the later and correct iflib(9) to correctly cap
the maximum TSO size reported to the stack at IP_MAXPACKET. While at
it, let iflib(9) use if_sethwtsomax*().
This change includes the addition of isc_tso_max{seg,}size DMA engine
constraints for the TSO DMA tag to struct if_shared_ctx and letting
iflib_txsd_alloc() automatically adjust the maxsize of that tag in case
IFCAP_VLAN_MTU is supported as requested by shurd@.
- Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet
header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9)
so adjustment is automatically done in case IFCAP_VLAN_MTU is supported.
As a consequence, this adjustment now is also done in case of bnxt(4)
which missed it previously.
- Move the reduction of the maximum TSO segment count reported to the
stack by the number of m_pullup(9) calls (which in the worst case,
can add another mbuf and, thus, the requirement for another DMA
segment each) in the transmit path for performance reasons from
em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now
done in iflib_parse_header() rather than in the no longer existing
em_xmit(). Moreover, this optimization applies to all drivers using
iflib(9) and not just em(4); all in-tree iflib(9) consumers still
have enough room to handle full size TSO packets. Also, reduce the
adjustment to the maximum number of m_pullup(9)'s now performed in
iflib_parse_header().
- Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9)
in r311849 and r335338 respectively, these drivers didn't enable
IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed
through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on
by default but also lagg(4) was fixed in that regard in r203548. So
just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling
in {em,ixl,ixlv}_setup_interface().
- Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface()
which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre()
now.
- Remove some redundant/dead setting of scctx->isc_tx_csum_flags in
em_if_attach_pre().
- Remove some IFCAP_* duplicated either directly or indirectly (e. g.
via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS.
- Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as
iflib(9) adds that capability unconditionally.
- Remove some unused macros from em(4).
- Bump __FreeBSD_version as some of the above changes require the modules
of drivers using iflib(9) to be recompiled.
Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1]
Reviewed by: sbruno (earlier version), erj
PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2]
Differential Revision: https://reviews.freebsd.org/D15720