Commit Graph

273 Commits

Author SHA1 Message Date
Justin Hibbits
25c92cd2f6 iflib: Further convert to use IfAPI accessors
Summary:
When iflib was first converted some IfAPI APIs were not yet present, so
were tagged with "XXX" comments.  Finish the conversion by using these
new APIs.

Reviewed by:	gallatin, erj
Sponsored by:	Juniper Networks, Inc
Differential Revision: https://reviews.freebsd.org/D38928
2023-03-07 09:47:00 -05:00
Gleb Smirnoff
5f7bea2952 iflib: fix regression with new pfil(9) KPI
Do not pass the pointer to our valid mbuf to pfil(9).  Pass an
uninitialized one only.  This was unsafe with the old KPI, too,
but for some reason didn't fail.

Fixes:	caf32b260a
2023-02-28 08:56:20 -08:00
Gleb Smirnoff
caf32b260a pfil: add pfil_mem_{in,out}() and retire pfil_run_hooks()
The 0b70e3e78b changed the original design of a single entry point
into pfil(9) chains providing separate functions for the filtering
points that always provide mbufs and know the direction of a flow.
The motivation was to reduce branching.  The logical continuation
would be to do the same for the filtering points that always provide
a memory pointer and retire the single entry point.

o Hooks now provide two functions: one for mbufs and optional for
  memory pointers.
o pfil_hook_args() has a new member and pfil_add_hook() has a
  requirement to zero out uninitialized data. Bump PFIL_VERSION.
o As it was before, a hook function for a memory pointer may realloc
  into an mbuf.  Such mbuf would be returned via a pointer that must
  be provided in argument.
o The only hook that supports memory pointers is ipfw:default-link.
  It is rewritten to provide two functions.
o All remaining uses of pfil_run_hooks() are converted to
  pfil_mem_in().
o Transparent union of pfil_packet_t and tricks to fix pointer
  alignment are retired. Internal pfil_realloc() reduces down to
  m_devget() and thus is retired, too.

Reviewed by:		mjg, ocochard
Differential revision:	https://reviews.freebsd.org/D37977
2023-02-14 10:02:49 -08:00
Przemyslaw Lewandowski
9147969bc2
iflib: Add null check to iflib_stop()
Ever since gtaskqueue_drain() was added to iflib_stop(), a kernel panic
occurs when the ice(4) driver is in recovery mode. Queues are not
initialized in this mode, so gt_taskqueue is not initialized, and
gtaskqueue_drain() will panic.

Fix this by only doing a drain if an RX queue's gt_taskqueue is
initialized.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D37892
2023-01-24 15:45:15 -08:00
Justin Hibbits
2c2b37ad25 ifnet/API: Move struct ifnet definition to a <net/if_private.h>
Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail.  Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.

Reviewed by:	glebius
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046
2023-01-24 14:36:30 -05:00
Justin Hibbits
402810d32e Convert iflib(4) and iflib-based drivers to the DrvAPI
Summary:
Convert iflib(4) and the following drivers:
* axgbe
* em
* ice
* ixl
* vmxnet

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	kbowling, #iflib
Differential Revision: https://reviews.freebsd.org/D37768
2022-12-21 09:20:06 -05:00
Eric Joyner
9c95013905
iflib: Introduce v2 of TX Queue Select Functionality
For v2, iflib will parse packet headers before queueing a packet.

This commit also adds a new field in the structure that holds parsed
header information from packets; it stores the IP ToS/traffic class
field found in the IPv4/IPv6 header.

To help, it will only partially parse header packets before queueing
them by using a new header parsing function that does less than the
current parsing header function; for our purposes we only need up to the
minimal IP header in order to get the IP ToS infromation and don't need
to pull up more data.

For now, v1 and v2 co-exist in this patch; v1 still offers a
less-invasive method where none of the packet is parsed in iflib before
queueing.

This also bumps the sys/param.h version.

Signed-off-by:	Eric Joyner <erj@FreeBSD.org>
Tested by:	IntelNetworking
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision: 	https://reviews.freebsd.org/D34742
2022-10-17 14:59:55 -07:00
Dimitry Andric
0294e95da4 Fix unused variable warning in iflib.c
With clang 15, the following -Werror warning is produced:

    sys/net/iflib.c:993:8: error: variable 'n' set but not used [-Werror,-Wunused-but-set-variable]
            u_int n;
                  ^

The 'n' variable appears to have been a debugging aid that has never
been used for anything, so remove it.

MFC after:	3 days
2022-07-21 21:19:39 +02:00
John Baldwin
d08cb45362 iflib: Use empty inline functions for prefetch*() on non-x86.
This avoids warnings about unused variables in expressions passed to
prefetch*().
2022-04-08 17:25:14 -07:00
Eric Joyner
213e91399b
iflib: Allow drivers to determine which queue to TX on
Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.

Since this includes a kernel ABI change, bump the __FreeBSD_version
as well.

(This motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	kbowling@, stallamr@netapp.com
Tested by:	jeffrey.e.pieper@intel.com
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D31485
2022-01-24 18:22:02 -08:00
Vincenzo Maffione
e0e1240528 netmap: fix LOR in iflib_netmap_register
In iflib_device_register(), the CTX_LOCK is acquired first and then
IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg()
we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK
is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping
the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional
IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive.

MFC after:	1 month
2022-01-14 21:09:04 +00:00
Alexander Motin
618d49f5ca Revert "iflib: Relax timer period from 0.5 to 0.5-0.75s."
I've noticed relations between iflib_timer() vs ixl_admin_timer().
Both scheduled at the same 2Hz rate, but the second is rescheduling
the first each time, so if the first get any slower, it won't be
executed at all.  Revert this until deeper investigation.

This reverts commit 90bc1cf657.
2022-01-10 09:40:38 -05:00
Alexander Motin
90bc1cf657 iflib: Relax timer period from 0.5 to 0.5-0.75s.
While there switch it from hardclock ticks to milliseconds.

MFC after:	2 weeks
2022-01-09 20:32:50 -05:00
Stefan Eßer
e2650af157 Make CPU_SET macros compliant with other implementations
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.

Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.

The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).

The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.

This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.

One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.

Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.

The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D33451
2021-12-30 12:20:32 +01:00
Vincenzo Maffione
4561c4f0ca net: iflib: sync isc_capenable to if_capenable
On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled.
When this happens, apply the same change to isc_capenable, which
is the iflib private copy of if_capenable (for a subset of the
IFCAP_* bits). In this way the iflib drivers can check the bits
using isc_capenable rather than if_capenable. This is convenient
because the latter access requires an additional indirection
through the ifp, and it is also less likely to be in cache.

PR:		260068
Reviewed by:	kbowling, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33156
2021-12-28 10:55:21 +00:00
Andriy Gapon
1bfdb812c7 iflib_stop: drain rx tasks to prevent any data races
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists.  So, iflib_stop has to ensure that the
rx task threads are not active.

This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.

The crash has been seen on VMWare guests with vmxnet3 driver.

My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.

But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.

PR:		259458
Reviewed by:	markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D32926
2021-11-19 10:00:38 +02:00
Stephan de Wit
66fa12d8fb iflib: emulate counters in netmap mode
When iflib devices are in netmap mode the driver
counters are no longer updated making it look from
userspace tools that traffic has stopped.

Reported by:	Franco Fichtner <franco@opnsense.org>
Reviewed by:	vmaffione, iflib (erj, gallatin)
Obtained from:	OPNsense
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31550
2021-08-18 00:17:43 -07:00
Mateusz Guzik
a56888534d iflib: use m_gethdr_raw
Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31081
2021-07-07 11:05:46 +00:00
Mateusz Guzik
bad5f0b6c2 iflib: switch bare zone_mbuf use to m_free_raw
Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30961
2021-07-02 08:30:22 +00:00
Marcin Wojtas
58632fa7a3 iflib: Add a new quirk
ENETC NIC found in LS1028A has a bug where clearing TX pidx/cidx
causes the ring to hang after being re-enabled.
Add a new flag, if set iflib will preserve the indices during restart.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: gallatin, erj
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30728
2021-06-24 13:00:56 +02:00
Marcin Wojtas
cd945dc08a iflib: Take iri_pad into account when processing small frames
Drivers can specify padding of received frames with iri_pad field.
This can be used to enforce ip alignment by hardware.
Iflib ignored that padding when processing small frames,
which rendered this feature inoperable.
I found it while writing a driver for a NIC that can ip align
received packets. Note that this doesn't change behavior of existing
drivers as they all set iri_pad to 0.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: gallatin
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30009
2021-04-30 12:46:17 +02:00
Patrick Kelsey
ca7005f189 iflib: Improve mapping of TX/RX queues to CPUs
iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache.  The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs.  When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs.  See the comment on get_cpuid_for_queue() for the
entire matrix.

The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset       (existing)
dev.<device>.<unit>.iflib.separate_txrx     (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)

The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu

When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.

Reviewed by:	kbowling
Tested by:	olivier, pkelsey
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D24094
2021-04-26 01:06:34 -04:00
Andrew Gallatin
3183d0b680 iflib: initialize LRO unconditionally
Changes to the LRO code have exposed a bug in iflib where devices
which are not capable of doing LRO are still calling
tcp_lro_flush_all(), even when they have not initialized the LRO
context. This used to be mostly harmless, but the LRO code now sets
the VNET based on the ifp in the lro context and will try to access it
through a NULL ifp resulting in a panic at boot.

To fix this, we unconditionally initializes LRO so that we have a
valid LRO context when calling tcp_lro_flush_all(). One alternative is
to check the device capabilities before calling tcp_lro_flush_all() or
adding a new state flag in the ctx. However, it seems unwise to add an
extra, mostly useless test for higher performance devices when we can
just initialize LRO for all devices.

Reviewed by: erj, hselasky, markj, olivier
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29928
2021-04-23 05:55:20 -04:00
Vincenzo Maffione
361e950180 iflib: add support for netmap offsets
Follow-up change to a6d768d845.
This change adds iflib support for netmap offsets, enabling
applications to use offsets on any driver backed by iflib.
2021-04-05 07:54:47 +00:00
you@x
21d0c01226 netmap: iflib: add nm_config callback
This per-driver callback is invoked by netmap when it wants
to align the number of TX/RX netmap rings and/or the number of
TX/RX netmap slots to the actual state configured in the hardware.
The alignment happens when netmap mode is switched on (with no
active netmap file descriptors for that netmap port), or when
collecting netmap port information.

MFC after:	1 week
2021-03-29 09:31:18 +00:00
Marcin Wojtas
09c3f04ff3 iflib: add support for admin completion queues
For interfaces with admin completion queues, introduce a new devmethod
IFDI_ADMIN_COMPLETION_HANDLE and a corresponding flag IFLIB_HAS_ADMINCQ.

This provides an option for handling any admin cq logic, which cannot be
run from an interrupt context.

Said method is called from within iflib's admin task, making it safe to
sleep.

Reviewed by: mmacy
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28708
2021-03-03 00:40:47 +01:00
Marcin Wojtas
ef567155d3 Fix powerpc build after 6dd69f0064
Commit 6dd69f0064 ("iflib: introduce isc_dma_width")
failed to build on powerpc due to implicit type conversion
error. Fix that.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2021-02-25 02:35:41 +01:00
Marcin Wojtas
6dd69f0064 iflib: introduce isc_dma_width
Some DMA controllers are unable to address the full host memory space
and are instead limited to a subset of address range (e.g. 48-bit).

Allow the driver to specify the maximum allowed DMA addressing width
(in bits) for the NIC hardware, by introducing a new field in
if_softc_ctx.

If said field is omitted (set to 0), the lowaddr of DMA window bounds
defaults to BUS_SPACE_MAXADDR.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28706
2021-02-25 00:25:39 +01:00
Mark Johnston
b6999635b1 iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice.  This was introduced when
pfil hooks were added to the iflib receive path.  We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.

PR:		253583
Reviewed by:	gallatin, erj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28900
2021-02-24 10:08:53 -05:00
John Baldwin
2ccf971ace iflib: Cast the result of iflib_netmap_txq_init() to void.
This fixes a warning from GCC for kernels without netmap since the
return value is never used.

Reviewed by:	vmaffione, erj
Differential Revision:	https://reviews.freebsd.org/D28598
2021-02-19 12:52:53 -08:00
Allan Jude
922cf8ac43 Use iflib_if_init_locked() during media change instead of iflib_init_locked().
iflib_init_locked() assumes that iflib_stop() has been called, however,
it is not called for media changes.
iflib_if_init_locked() calls stop then init, so fixes the problem.

PR:	253473
MFC after:	3 days
Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D28667
2021-02-16 19:02:00 +00:00
Sai Rajesh Tallamraju
38bfc6dee3 iflib: Free resources in a consistent order during detach
Memory and PCI resources are freed with no particular order.  This could
cause use-after-frees when detaching following a failed attach.  For
instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
iflib_tqg_detach() attempts to access this array. Similarly, adapter
queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
access adapter queues to free PCI resources.

MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D27634
2021-02-01 11:15:54 -05:00
Gleb Smirnoff
3f43ada98c Catch up with 6edfd179c8: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address.  Then, this was extended to array of
such pointers.  Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change.  The new name should be generic enough to avoid further
renaming.
2021-01-29 11:46:24 -08:00
Vincenzo Maffione
f80efe5016 iflib: netmap: move per-packet operation out of fragments loop
MFC after:	1 week
2021-01-24 21:38:59 +00:00
Vincenzo Maffione
aceaccab65 iflib: netmap: add support for NS_MOREFRAG
The NS_MOREFRAG flag can be set in a netmap slot to represent a
multi-fragment packet. Only the last fragment of a packet does
not have the flag set. On TX rings, the flag may be set by the
userspace application. The kernel will look at the flag and use it
to properly set up the NIC TX descriptors.
On RX rings, the kernel may set the flag if the packet received
was split across multiple netmap buffers. The userspace application
should look at the flag to know when the packet is complete.

Submitted by:	rajesh1.kumar_amd.com
Reviewed by:	vmaffione
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27799
2021-01-24 21:20:59 +00:00
Andrew Gallatin
0c864213ef iflib: Fix a NULL pointer deref
rxd_frag_to_sd() have pf_rv parameter as NULL with the current
code. This patch fixes the NULL pointer dereference in that
case thus avoiding a possible panic.

Submitted by: rajesh1.kumar at amd.com
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D28115
2021-01-21 09:47:06 -05:00
Vincenzo Maffione
55f0ad5fde netmap: restore hwofs and support it in iflib
Restore the hwofs functionality temporarily disabled by
7ba6ecf216 to prevent issues with iflib.
This patch brings the necessary changes to iflib to
enable howfs to allow interface restarts without
disrupting netmap applications actively using its
rings.
After this change, it becomes possible for multiple
non-cooperating netmap applications to use non-overlapping
subsets of the available netmap rings without clashing
with each other.

PR:		252453
MFC after:	1 week
2021-01-10 22:51:15 +00:00
Vincenzo Maffione
8aa8484cbf iflib: fix build failure in case DEV_NETMAP is not defined
This addresses the build failure introduced by
3d65fd97e8.

MFC with: 3d65fd97e8
2021-01-10 14:43:58 +00:00
Vincenzo Maffione
4ba9ad0dc3 iflib: add assert to prevent out-of-bounds array access
The iflib_queues_alloc() allocates isc_nrxqs iflib_dma_info structs
for each rxqset, and links each struct to a different free list.
As a result, it must be isc_nrxqs >= isc_nfl (plus the completion
queue, if present).
Add an assertion to make this constraint explicit.

MFC after:	2 weeks
2021-01-10 13:59:20 +00:00
Vincenzo Maffione
3d65fd97e8 netmap: iflib: enable/disable krings on any interface reinit
Since 1d238b07d5, krings are disabled before
a reinit cycle triggered by iflib_netmap_register.
However, this operation is actually necessary also for
any interface reinit triggered by other causes (i.e.,
ifconfig commands).
We achieve this goal by moving the krings enable/disable
operation inside iflib_stop() and iflib_init_locked().

Once here, this change also removes some redundant operations
from iflib_netmap_register(), that are already performed by
iflib_stop().

PR:		252453
MFC after:	1 week
2021-01-10 12:04:08 +00:00
Vincenzo Maffione
3189ba6167 netmap: iflib: fix asserts in netmap_fl_refill()
When netmap_fl_refill() is called at initialization time (e.g.,
during netmap_iflib_register()), nic_i must be 0, since the
free list is reinitialized. At the end of the refill cycle, nic_i
must still be zero, because exactly N descriptors (N is the ring size)
are refilled.
This patch therefore fixes the assertions to check on nic_i rather
than on nm_i. The current netmap_reset() may in fact cause nm_i
to be != 0 while the device is resetting: this may happen when
multiple non-cooperating processes open different subsets of the
available netmap rings.

PR:	    252518
MFC after:  1 week
2021-01-09 21:35:07 +00:00
Vincenzo Maffione
1d238b07d5 netmap: iflib: stop krings during interface reset
When different processes open separate subsets of the
available rings of a same netmap interface, a device
reset may be performed while one of the processes
is actively using some rings (e.g., caused by another
process executing a nmport_open()).
With this patch, such situation will cause the
active process to get a POLLERR, so that it can
have a chance to detect the situation.
We also guarantee that no process is running a txsync
or rxsync (ioctl or poll) while an iflib device reset
is in progress.

PR:	    252453
MFC after:  1 week
2021-01-09 21:01:46 +00:00
Matt Macy
81be655266 iflib: ensure that tx interrupts enabled and cleanups
Doing a 'dd' over iscsi will reliably cause stalls. Tx
cleaning _should_ reliably happen as data is sent.
However, currently if the transmit queue fills it will
wait until the iflib timer (hz/2) runs.

This change causes the the tx taskq thread to be run
if there are completed descriptors.

While here:

- make timer interrupt delay a sysctl

- simplify txd_db_check handling

- comment on INTR types

Background on the change:

Initially doorbell updates were minimized by only writing to the register
on every fourth packet. If txq_drain would return without writing to the
doorbell it scheduled a callout on the next tick to do the doorbell write
to ensure that the write otherwise happened "soon". At that time a sysctl
was added for users to avoid the potential added latency by simply writing
to the doorbell register on every packet. This worked perfectly well for
e1000 and ixgbe ... and appeared to work well on ixl. However, as it
turned out there was a race to this approach that would lockup the ixl MAC.
It was possible for a lower producer index to be written after a higher one.
On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial
response was to add a lock around doorbell writes - fixing the problem but
adding an unacceptable amount of lock contention.

The next iteration was to use transmit interrupts to drive delayed doorbell
writes. If there were no packets in the queue all doorbell writes would be
immediate as the queue started to fill up we could delay doorbell writes
further and further. At the start of drain if we've cleaned any packets we
know we've moved the state machine along and we write the doorbell (an
obvious missing optimization was to skip that doorbell write if db_pending
is zero). This change required that tx interrupts be scheduled periodically
as opposed to just when the hardware txq was full. However, that just leads
to our next problem.

Initially dedicated msix vectors were used for both tx and rx. However, it
was often possible to use up all available vectors before we set up all the
queues we wanted. By having rx and tx share a vector for a given queue we
could halve the number of vectors used by a given configuration. The problem
here is that with this change only e1000 passed the necessary value to have
the fast interrupt drive tx when appropriate.

Reported by: mav@
Tested by: mav@
Reviewed by:    gallatin@
MFC after:      1 month
Sponsored by:   iXsystems
Differential Revision:  https://reviews.freebsd.org/D27683
2021-01-07 14:07:35 -08:00
Mark Johnston
c065d4e5e9 iflib: Avoid leaking the freelist bitmaps upon driver detach
Submitted by:	Sai Rajesh Tallamraju <stallamr@netapp.com>
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D27342
2020-12-07 14:53:14 +00:00
Mark Johnston
102540192c iflib: Detach tasks upon device registration failure
In some error paths we would fail to detach from the iflib taskqueue
groups.  Also move the detach code into its own subroutine instead of
duplicating it.

Submitted by:	Sai Rajesh Tallamraju <stallamr@netapp.com>
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D27342
2020-12-07 14:52:57 +00:00
Mark Johnston
54bf96fb4f iflib: Free full mbuf chains when draining transmit queues
Submitted by:	Sai Rajesh Tallamraju <stallamr@netapp.com>
Reviewed by:	gallatin, hselasky
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D27179
2020-11-11 18:00:06 +00:00
Vincenzo Maffione
be7a6b3d84 iflib: fix typo bug introduced by r367093
Code was supposed to call callout_reset_sbt_on() rather than
callout_reset_sbt(). This resulted into passing a "cpu" value
to a "flag" argument. A recipe for subtle errors.

PR:	248652
Reported by:	sg@efficientip.com
MFC with: r367093
2020-10-28 21:06:17 +00:00
Vincenzo Maffione
17cec474c0 iflib: add per-tx-queue netmap timer
The way netmap TX is handled in iflib when TX interrupts are not
used (IFC_NETMAP_TX_IRQ not set) has some issues:
  - The netmap_tx_irq() function gets called by iflib_timer(), which
    gets scheduled with tick granularity (hz). This is not frequent
    enough for 10Gbps NICs and beyond (e.g., ixgbe or ixl). The end
    result is that the transmitting netmap application is not woken
    up fast enough to saturate the link with small packets.
  - The iflib_timer() functions also calls isc_txd_credits_update()
    to ask for more TX completion updates. However, this violates
    the netmap requirement that only txsync can access the TX queue
    for datapath operations. Only netmap_tx_irq() may be called out
    of the txsync context.

This change introduces per-tx-queue netmap timers, using microsecond
granularity to ensure that netmap_tx_irq() can be called often enough
to allow for maximum packet rate. The timer routine simply calls
netmap_tx_irq() to wake up the netmap application. The latter will
wake up and call txsync to collect TX completion updates.

This change brings back line rate speed with small packets for ixgbe.
For the time being, timer expiration is hardcoded to 90 microseconds,
in order to avoid introducing a new sysctl.
We may eventually implement an adaptive expiration period or use another
deferred work mechanism in place of timers.

Also, fix the timers usage to make sure that each queue is serviced
by a different CPU.

PR:	248652
Reported by:	sg@efficientip.com
MFC after:	2 weeks
2020-10-27 21:53:33 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Vincenzo Maffione
35d8a463e8 iflib: leave only 1 receive descriptor unused
The pidx argument of isc_rxd_flush() indicates which is the last valid
receive descriptor to be used by the NIC. However, current code has
multiple issues:
  - Intel drivers write pidx to their RDT register, which means that
    NICs will only use the descriptors up to pidx-1 (modulo ring size N),
    and won't actually use the one pointed by pidx. This does not break
    reception, but it is anyway confusing and suboptimal (the NIC will
    actually see only N-2 descriptors as available, rather than N-1).
    Other drivers (if_vmx, if_bnxt, if_mgb) adhere to this semantic).
  - The semantic used by Intel (RDT is one descriptor past the last
     valid one) is used by most (if not all) NICs, and it is also used
     on the TX side (also in iflib). Since iflib is not currently
     using this semantic for RX, it must decrement fl->ifl_pidx
     (modulo N) before calling isc_rxd_flush(), and then the
     per-driver callback implementation must increment the index
     again (to match the real semantic). This is confusing and suboptimal.
  -  The iflib refill function is also called at initialization.
     However, in case the ring size is smaller than 128 (e.g. if_mgb),
     the refill function will actually prepare all the receive
     descriptors (N), without leaving one unused, as most of NICs assume
     (e.g. to avoid RDT to overrun RDH). I can speculate that the code
     looks like this right now because this issue showed up during
     testing (e.g. with if_mgb), and it was easy to workaround by
     decrementing pidx before isc_rxd_flush().

The goal of this change is to simplify the code (removing a bunch
of instructions from the RX fast path), and to make the semantic of
isc_rxd_flush() consistent across drivers. To achieve this, we:
  - change the semantics of the pidx argument to the usual one (that
    is the index one past the last valid one), so that both iflib and
    drivers avoid the decrement/increment dance.
  - fix the initialization code to prepare at most N-1 descriptors.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26191
2020-09-01 20:41:47 +00:00