Commit Graph

205 Commits

Author SHA1 Message Date
Mark Johnston
814fa34dfb Increase the iflib txq callout mutex name length to 32 bytes.
With a length of 16, the name ("<if name>:TX(<qid>):callout") typically
gets truncated.

PR:		245712
Reported by:	ghuckriede@blackberry.com
MFC after:	1 week
2020-04-30 15:39:04 +00:00
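The truncation described in 814fa34dfb can be reproduced in userspace. The sketch below formats a name of the shape quoted in the commit message into a 16-byte and a 32-byte buffer. The helper fmt_callout_name() and the interface name "ix0" are illustrative assumptions, not iflib code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<if name>:TX(<qid>):callout" into buf.
 * Returns 1 if the name did not fit and was truncated, 0 otherwise.
 * Hypothetical helper; iflib does not expose a function by this name.
 */
int
fmt_callout_name(char *buf, size_t len, const char *ifname, int qid)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s:TX(%d):callout", ifname, qid);
	/* snprintf() returns the length the untruncated string would have. */
	return (n < 0 || (size_t)n >= len);
}
```

Even a three-character interface name with a single-digit queue id needs 18 bytes including the terminating NUL, so a 16-byte buffer cuts the name short.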
Eric Joyner
45818bf1a0 iflib: Stop interface before (un)registering VLAN
This patch is intended to solve a specific problem that iavf(4)
encounters, but what it does can be extended to solve other issues.

To summarize the iavf(4) issue, if the PF driver configures VLAN
anti-spoof, then the VF driver needs to make sure no untagged traffic is
sent if a VLAN is configured, and vice-versa. This can be an issue when
a VLAN is being registered or unregistered, e.g. when a packet may be on
the ring with a VLAN in it, but the VLANs are being unregistered. This
can cause that tagged packet to go out and cause an MDD event.

To fix this, include a new interface-dependent function that drivers can
implement named IFDI_NEEDS_RESTART(). Right now, this function is called
in iflib_vlan_unregister/register() to determine whether the interface
needs to be stopped and started when a VLAN is registered or
unregistered. The default return value of IFDI_NEEDS_RESTART() is true,
so this fixes the MDD problem that iavf(4) encounters, since the
interface rings are flushed during a stop/init.

A future change to iavf(4) will implement that function just in case the
default value changes, and to make it explicit that this interface reset
is required when a VLAN is added or removed.

Reviewed by:	gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22086
2020-04-27 22:02:44 +00:00
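The "default is true" behaviour that 45818bf1a0 describes for IFDI_NEEDS_RESTART() can be modelled as an optional driver hook. This is a minimal sketch under stated assumptions: the enum, the function-pointer type, and both functions are hypothetical names, and iflib itself dispatches through its own driver interface rather than a raw function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event identifiers, for illustration only. */
enum restart_event { EV_VLAN_REGISTER, EV_VLAN_UNREGISTER };

/* Optional per-driver hook; NULL means the driver did not implement it. */
typedef bool (*needs_restart_fn)(enum restart_event);

/* With no driver hook, fall back to restarting the interface. */
bool
needs_restart(needs_restart_fn hook, enum restart_event ev)
{
	if (hook == NULL)
		return (true);
	return (hook(ev));
}

/* Example hook for a driver that can change VLANs without a restart. */
bool
no_restart_on_vlan(enum restart_event ev)
{
	(void)ev;
	return (false);
}
```

A driver that needs the ring flush (as the commit says iavf(4) does) either leaves the hook unset or returns true explicitly.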
Mark Johnston
59d50fe5ef Simplify taskqgroup initialization.
taskqgroup initialization was broken into two steps:

1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
2. initialize taskqueues, start taskqueue threads, enqueue "binder"
   tasks to bind threads to specific CPUs, at SI_SUB_SMP.

Step 2 tries to handle the case where tasks have already been attached
to a queue, by migrating them to their intended queue.  In particular,
tasks can't be enqueued before step 2 has completed.  This breaks NFS
mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
is not defined, since mountroot happens before SI_SUB_SMP in this case.

Simplify initialization: do all initialization except for CPU binding at
SI_SUB_TASKQ.  This means that until CPU binding is completed, group
tasks may be executed on a CPU other than that to which they were bound,
but this should not be a problem for existing users of the taskqgroup
KPIs.

Reported by:	sbruno
Tested by:	bdragon, sbruno
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24188
2020-03-30 14:22:52 +00:00
Ed Maste
ed6611cc8c iflib: simplify MPASS assertion
Submitted by:	andrew
2020-03-24 17:54:34 +00:00
Ed Maste
68af0153a7 iflib: split compound assertion
ThunderX cluster systems are panicking on boot with a failed assertion
MPASS(gtask != NULL && gtask->gt_taskqueue != NULL).  Split the
assertion so that it's clear which part is failing.
2020-03-24 17:25:56 +00:00
Patrick Kelsey
876996910a Remove extraneous code from iflib
ifsd_cidx is never used, and the line removed from rxd_frag_to_sd() is
just dead code.

Reviewed by:	erj, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23951
2020-03-14 20:13:42 +00:00
Patrick Kelsey
3caff1885f Remove refill budget from iflib
Reviewed by:	gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23948
2020-03-14 19:58:50 +00:00
Patrick Kelsey
b38136097a Allow iflib drivers to specify the buffer size used for each receive queue
Reviewed by:	erj, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23947
2020-03-14 19:56:46 +00:00
Patrick Kelsey
e503049067 Remove freelist contiguous-indexes assertion from rxd_frag_to_sd()
The vmx driver is an example of an iflib driver that might report
packets using non-contiguous descriptors (with unused descriptors
either between received packets or between the fragments of a received
packet), so this assertion needs to be removed.

For such drivers, the freelist producer and consumer indexes don't
relate directly to driver ring slots (the driver deals directly with
freelist buffer indexes supplied by iflib during refill, and reports
them with each fragment during packet reception), but do continue to
be used by iflib for accounting, such as determining the number of
ring slots that are refillable.

PR:		243126, 243392, 240628
Reported by:	avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
Reviewed by:	gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23946
2020-03-14 19:55:05 +00:00
Patrick Kelsey
4f2beb721b Fix iflib zero-length fragment handling
The dmamap for zero-length fragments should not be unloaded, as doing
so breaks the cluster-reuse logic in _iflib_fl_refill().

All zero-length fragments are now handled by the assemble_segments()
path so that the cluster-reuse logic there does not have to be
replicated in the small-single-fragment-packet path of
iflib_rxd_pkt_get().

Packets consisting entirely of zero-length fragments (which result in
a NULL mbuf pointer) are now properly tolerated.  This allows drivers
(such as the vmx driver) to pass such packets to iflib when a
descriptor error occurs during packet reception, the advantage being
that the refill of descriptors associated with the error packet is
handled via the existing iflib machinery without having to duplicate
parts of that machinery in the driver to handle that error case.

Reviewed by:	avg, erj, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23945
2020-03-14 19:51:55 +00:00
Patrick Kelsey
9e9b738ac5 Fix iflib freelist state corruption
This fixes a bug in iflib freelist management that breaks the required
correspondence between freelist indexes and driver ring slots.

PR:		243126, 243392, 240628
Reported by:	avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
Reviewed by:	avg, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23943
2020-03-14 19:43:44 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Gleb Smirnoff
e87c494015 Although most of the NIC drivers are epoch ready, due to peer pressure
switch over to opt-in instead of opt-out for epoch.

Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.

For now this causes recursive epoch entry in more than 90% of network
drivers, but it guarantees a safe transition.

Mark several tested drivers as IFF_KNOWSEPOCH.

Reviewed by:		hselasky, jeff, bz, gallatin
Differential Revision:	https://reviews.freebsd.org/D23674
2020-02-24 21:07:30 +00:00
Hans Petter Selasky
f98977b521 Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

This patch extends r357772.

Tested by:	yp@mm.st
Sponsored by:	Mellanox Technologies
2020-02-12 09:19:47 +00:00
Hans Petter Selasky
fb1a29b45e Make sure the so-called end of receive interrupts don't starve in iflib.
When the receive ring cannot be filled with mbufs, due to lack of memory,
no more interrupts may be generated to fill the receive ring later on.
Make sure to have a watchdog, to try refilling the receive ring from time
to time, hopefully when more mbufs are available.

Differential Revision:	https://reviews.freebsd.org/D23315
MFC after:	1 week
Reviewed by:	gallatin@
Sponsored by:	Mellanox Technologies
2020-02-12 08:30:07 +00:00
Gleb Smirnoff
6c3e93cb5a Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D23518
2020-02-11 18:57:07 +00:00
Gleb Smirnoff
0b8df657a4 Enter network epoch in iflib rxeof task.
In upcoming changes ether_input() is going to be changed not
to enter the network epoch.  Entering it will become the responsibility
of the network interrupt handler; in the case of iflib, its taskqueue.
2020-01-23 01:27:58 +00:00
Eric Joyner
f6afed726b iflib: Prevent watchdog from resetting idle queues
While changing link state in iflib_link_state_change(), queues are
marked as IFLIB_QUEUE_IDLE to disable the watchdog. Currently, the
iflib_timer() watchdog does not check the previous queue status before
marking a queue as IFLIB_QUEUE_HUNG.

This patch adds a check of the queue status before marking it as hung.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

PR:		239240
Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reported by:	ultima@
Reviewed by:	gallatin@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21712
2020-01-02 23:35:06 +00:00
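The check added by f6afed726b amounts to one extra state test in the watchdog. The sketch below models it with a stand-in enum and a hypothetical watchdog_tick() function; the tick counter and limit are assumptions made for the example and do not mirror iflib_timer() itself.

```c
#include <assert.h>

/* Stand-ins for the IFLIB_QUEUE_* states named in the commit message. */
enum queue_status { QUEUE_IDLE, QUEUE_WORKING, QUEUE_HUNG };

/*
 * Decide the next queue state on a watchdog tick.
 * An idle queue is left alone, so a link-down queue is never
 * reported as hung just because it made no progress.
 */
enum queue_status
watchdog_tick(enum queue_status cur, int ticks_without_progress, int limit)
{
	assert(limit > 0);
	if (cur == QUEUE_IDLE)
		return (cur);
	if (ticks_without_progress >= limit)
		return (QUEUE_HUNG);
	return (cur);
}
```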
Eric Joyner
db8e8f1ede iflib: properly release memory allocated for DMA
DMA memory allocations using the bus_dma.h interface are not properly
released in all cases for both Tx and Rx. This causes ~448 bytes of
M_DEVBUF allocations to be leaked.

First, the DMA maps for Rx are not properly destroyed. A slight attempt
is made in iflib_fl_bufs_free to destroy the maps if we're detaching.
However, this function may not be reliably called during detach. Indeed,
there is a comment "asking" if this should be moved out.

Fix this by moving the bus_dmamap_destroy call into iflib_rx_sds_free,
where we already sync and unload the DMA.

Second, the DMA tag associated with the ifr_ifdi descriptor DMA is not
released properly anywhere. Add a call to iflib_dma_free in
iflib_rx_structures_free.

Third, use of NULL as a canary value on the map pointer returned by
bus_dmamap_create is not valid. On some platforms, notably x86, this
value may be NULL. In this case, we fail to properly release the related
resources.

Remove the NULL checks on map values in both iflib_fl_bufs_free and
iflib_txsd_destroy.

With all of these fixes applied, the leaks to M_DEVBUF are squelched,
and iflib drivers now seem to properly clean up when detaching.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22203
2019-11-04 23:06:57 +00:00
Eric Joyner
244e7cffa5 iflib: cleanup memory leaks on driver detach
From Jake:
The iflib stack failed to release all of the memory allocated under
M_IFLIB during device detach.

Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx
DMA info were not being released.

Release this memory so that iflib won't leak memory when a device
detaches.

Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we
need to call this only after iflib_dma_free in iflib_tx_structures_free.

Additionally, also ensure that we destroy the callout mutex associated
with each Tx queue when we free it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22157
2019-10-30 20:45:12 +00:00
Eric Joyner
1558015e3e iflib: call ether_ifdetach and netmap_detach before stop
From Jake:
Calling ether_ifdetach after iflib_stop leads to a potential race where
a stale ifp pointer can remain in the route entry list for IPv6 traffic.
This will potentially cause a page fault or other system instability if
the ifp pointer is accessed.

Move both iflib_netmap_detach and ether_ifdetach to be called prior to
iflib_stop. This avoids the race above, and helps ensure that other ifp
references are removed before stopping the interface.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22071
2019-10-23 23:20:49 +00:00
Conrad Meyer
7790c8c199 Split out a more generic debugnet(4) from netdump(4)
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport.  It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4).  Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c.  UDP-layer logic and up, to
the extent that it is protocol-independent, remains in debugnet.c.  The
separation is not perfect and future improvement is welcome.  Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry.  I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking.  Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time.  If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark.  Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by:	markj (earlier version)
Some discussion with:	emaste, jhb
Objection from:	marius
Differential Revision:	https://reviews.freebsd.org/D21421
2019-10-17 16:23:03 +00:00
Mark Johnston
4166913371 Add IFLIB_SINGLE_IRQ_RX_ONLY.
As of r347221 the iflib legacy interrupt mode setup assumes that drivers
perform both receive and transmit processing from the interrupt handler.
This assumption is invalid in the vmxnet3 driver, so introduce the
IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the
interrupt handler.

PR:		239118
Reported and tested by:	Juraj Lutter <otis@sk.freebsd.org>
Obtained from:	marius
Reviewed by:	gallatin
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D21831
2019-09-30 15:59:07 +00:00
Andrew Gallatin
6554362c66 kTLS support for TLS 1.3
TLS 1.3 requires a few changes because 1.3 pretends to be 1.2
with a record type of application data. The "real" record type is
then included at the end of the user-supplied plaintext
data. This required adding a field to the mbuf_ext_pgs struct to
save the record type, and passing the real record type to the
sw_encrypt() ktls backend functions.

Reviewed by:	jhb, hselasky
Sponsored by:	Netflix
Differential Revision:	D21801
2019-09-27 19:17:40 +00:00
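The record layout that 6554362c66 refers to can be sketched as follows: the real content type is appended to the plaintext before encryption, and the outer record header always carries the application-data type. The content type values (21, 22, 23) come from the TLS specification; the function and macro names are hypothetical and are not the kTLS or mbuf_ext_pgs interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* TLS content types (values per the TLS specification). */
#define TLS13_TYPE_ALERT	21
#define TLS13_TYPE_HANDSHAKE	22
#define TLS13_TYPE_APP_DATA	23

/*
 * Build the plaintext that gets encrypted: payload followed by the
 * real record type. Returns the number of bytes written, or 0 if the
 * output buffer is too small.
 */
size_t
tls13_inner_plaintext(uint8_t *out, size_t outlen, const uint8_t *payload,
    size_t len, uint8_t real_type)
{
	assert(out != NULL && payload != NULL);
	if (len >= outlen)
		return (0);
	memcpy(out, payload, len);
	out[len] = real_type;
	return (len + 1);
}

/* The type written into the outer header of an encrypted record. */
uint8_t
tls13_outer_type(void)
{
	return (TLS13_TYPE_APP_DATA);
}
```

Because the outer header no longer identifies the record, the real type has to be carried alongside the data, which is why the commit adds a field for it.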
Eric Joyner
53b5b9b049 iflib: Remove redundant VLAN events deregistration
From Piotr:
r351152 introduced the iflib_deregister() function, which calls
EVENTHANDLER_DEREGISTER() to unregister VLAN events. This patch removes
the duplicate EVENTHANDLER_DEREGISTER() calls in
iflib_device_deregister(), as that function now calls
iflib_deregister(). This avoids deregistering the same event twice.

This patch also adds a check in iflib_vlan_register() to prevent
registering a VLAN during detach.

Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>,
erj <erj@FreeBSD.org> and Jacob Keller <jacob.e.keller@intel.com>.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by:	gallatin@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21711
2019-09-24 17:03:31 +00:00
Eric Joyner
566144142e iflib: add iflib_deregister to help cleanup on exit
Commit message by Jake:
The iflib_register function exists to allocate and setup some common
structures used by both iflib_device_register and iflib_pseudo_register.

There is no associated cleanup function used to undo the steps taken in
this function.

Both iflib_device_deregister and iflib_pseudo_deregister have some of
the necessary steps scattered in their flow. However, most of the
necessary cleanup is not done during the error path of
iflib_device_register and iflib_pseudo_register.

Some examples of missed cleanup include:

- the ifp pointer is not free'd during error cleanup
- the STATE and CTX locks are not destroyed during error cleanup
- the vlan event handlers are not removed during error cleanup
- media added to the ifmedia structure is not removed
- the kobject reference is never deleted

Additionally, when initializing the kobject, the class reference counter is
increased even though kobj_init already increases it. This results in
the class never being free'd again because the reference count would
never hit zero even after all driver instances are unloaded.

To aid in proper cleanup, implement an iflib_deregister function that
goes through the reverse steps taken by iflib_register.

Call this function during the error cleanup for iflib_device_register
and iflib_pseudo_register. Additionally call the function in the
iflib_device_deregister and iflib_pseudo_deregister functions near the
end of their flow. This helps reduce code duplication and ensures that
proper steps are taken to cleanup allocations and references in both the
regular and error cleanup flows.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21005
2019-08-16 23:33:44 +00:00
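The pattern 566144142e describes, a single teardown function that reverses the setup steps and is shared by the error path and the normal detach path, can be sketched in userspace. Everything here is a toy model: struct ctx, its fields, and the fail_at parameter are assumptions made for the example, not iflib structures.

```c
#include <assert.h>
#include <stdlib.h>

struct ctx {
	void *ifp;			/* stands in for the ifnet allocation */
	int locks_init;			/* stands in for the STATE/CTX locks */
	int handlers_registered;	/* stands in for the VLAN event handlers */
};

/* Undo setup in reverse order; safe on a partially set up ctx. */
void
ctx_deregister(struct ctx *c)
{
	assert(c != NULL);
	c->handlers_registered = 0;
	c->locks_init = 0;
	free(c->ifp);
	c->ifp = NULL;
}

/* fail_at > 0 injects a failure before step fail_at (test aid only). */
int
ctx_register(struct ctx *c, int fail_at)
{
	c->ifp = NULL;
	c->locks_init = 0;
	c->handlers_registered = 0;

	if (fail_at == 1 || (c->ifp = malloc(64)) == NULL)
		goto fail;
	if (fail_at == 2)
		goto fail;
	c->locks_init = 1;
	if (fail_at == 3)
		goto fail;
	c->handlers_registered = 1;
	return (0);
fail:
	/* One cleanup routine serves every error exit. */
	ctx_deregister(c);
	return (-1);
}
```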
Eric Joyner
197c679824 iflib: Prevent kernel panic caused by loading driver with a specific interrupt configuration
If a device has only 1 MSI-X interrupt available and does not support either
MSI or legacy interrupts, iflib_device_register() will fail, leak memory and
MSI resources, and the driver will not load. Worse, if another iflib-using
driver tries to unload afterwards, a kernel panic will occur because the
previous failed iflib driver load did not properly call "taskqgroup_detach()"
during its cleanup.

This patch is a band-aid for this situation -- don't try allocating MSI or legacy
interrupts if a single MSI-X interrupt was allocated, but fail to load instead.
As well, during the cleanup, properly call taskqgroup_detach() on the admin
task to prevent panics when other iflib drivers unload.

This whole interrupt allocation process actually needs re-doing to properly
support devices with only a single MSI-X interrupt, devices that only support
MSI-X, non-PCI devices, and multiple non-MSIX interrupts, as well.

Signed-off-by: Eric Joyner <erj@freebsd.org>

Reviewed by:	marius@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20747
2019-08-01 17:37:25 +00:00
Eric Joyner
6a3f243b04 iflib: remove kobject class reference increment
Commit message from Jake:
In iflib_register, the context is initialized as a kobject using the
device driver's "driver" kobject class. As part of this, the function
mistakenly increments the ref counter.

The ref counter is incremented twice, once in the code directly, and
once again by kobj_class_compile. However, there is no associated
decrement in the detach path. Because of this, the ref counter will
never go back down to zero, and thus the kobject method table will never
be released.

Remove this unnecessary reference count increment.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	jhb@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21125
2019-08-01 17:28:36 +00:00
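Why an unmatched increment keeps the method table alive forever, as 6a3f243b04 explains, is easy to see in a toy reference counter. The struct and functions below are hypothetical and only model the counting; they are not the kobj(9) implementation.

```c
#include <assert.h>

struct kclass {
	int refs;	/* number of outstanding references */
	int compiled;	/* 1 while the method table is allocated */
};

/* Take a reference; the first one "compiles" the class. */
void
class_get(struct kclass *c)
{
	if (c->refs == 0)
		c->compiled = 1;
	c->refs++;
}

/* Drop a reference; the last one releases the method table. */
void
class_put(struct kclass *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->compiled = 0;
}
```

One extra class_get() without a matching class_put() leaves refs at 1 after detach, so the table is never released.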
Eric Joyner
7f3f6aad3e iflib: fix dangling device softc pointer
Commit text by Jake:
If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register
function will free the ctx pointer. However, it does not reset the
device softc pointer to NULL.

This will result in memory corruption as a future access to the now
invalid pointer will corrupt memory that is later allocated on top of
the same memory location.

The iflib_device_deregister function correctly resets the softc pointer
by using device_set_softc().

This clears up the invalid dangling pointer and prevents memory
corruption that could lead to a panic or undefined behavior if the
device's driver failed to attach.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21003
2019-07-24 21:43:41 +00:00
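The fix in 7f3f6aad3e follows a general rule: when an error path frees an object, every published pointer to it must be cleared as well. A minimal userspace sketch, with a hypothetical struct device and attach callbacks standing in for the newbus device and IFDI_ATTACH_PRE:

```c
#include <assert.h>
#include <stdlib.h>

struct device {
	void *softc;	/* published pointer to the driver context */
};

/* Hypothetical pre-attach callbacks used to exercise both paths. */
int attach_pre_ok(void)   { return (0); }
int attach_pre_fail(void) { return (-1); }

int
device_register(struct device *dev, int (*attach_pre)(void))
{
	void *ctx;

	assert(dev != NULL && attach_pre != NULL);
	ctx = calloc(1, 128);
	if (ctx == NULL)
		return (-1);
	dev->softc = ctx;
	if (attach_pre() != 0) {
		free(ctx);
		/* Clear the published pointer so nothing can reuse it. */
		dev->softc = NULL;
		return (-1);
	}
	return (0);
}
```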
Marius Strobl
c2c5d1e787 o In iflib_txq_drain():
  - Remove desc_used, which is only ever written to.
  - Remove a dead store to reclaimed.
  - Don't recycle avail.
  - Sort variables according to style(9).
  These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
  ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
  asserts upfront that this method has been assigned and functions
  like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
  and _task_fn_tx() were already unconditionally relying on the
  method being callable.
2019-06-26 15:28:21 +00:00
Marko Zec
188adcb7e4 V_ip6_forwarding and V_ipforwarding have been defined in ip6_var.h /
ip_var.h since at least 2008, so make use of those definitions here.

MFC after:	3 days
2019-06-19 08:49:24 +00:00
Marko Zec
6aee0bfa85 Evaluating htons() at compile time is more efficient than doing ntohs()
at runtime.  This change removes a dependency on a barrel shifter pass
before branch resolution, while reducing the instruction stream size
by 9 bytes on amd64.

MFC after:	3 days
2019-06-19 08:39:19 +00:00
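The two comparison styles contrasted in 6aee0bfa85 give the same answer; they differ only in which side gets byte-swapped. In the sketch below the function names and the ETHERTYPE_IPV4_HOST macro are assumptions for illustration. Whether the htons() of a constant is actually folded at compile time depends on the platform's implementation and the compiler.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* IPv4 EtherType in host byte order (hypothetical local name). */
#define ETHERTYPE_IPV4_HOST	0x0800

/* Swap the packet field at runtime, then compare with the constant. */
int
is_ipv4_runtime_swap(uint16_t wire_type)
{
	return (ntohs(wire_type) == ETHERTYPE_IPV4_HOST);
}

/* Swap the constant instead; the packet field is compared as-is. */
int
is_ipv4_const_swap(uint16_t wire_type)
{
	return (wire_type == htons(ETHERTYPE_IPV4_HOST));
}
```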
Marius Strobl
d49e83eac3 - Replace unused and only ever written to members of public iflib(9)
  structs with placeholders (in the latter case, IFLIB_MAX_TX_BYTES
  etc. are also only ever used for these write-only members if at all,
  so both these macros and members can just go). Using these spares
  may render it possible to merge certain iflib(9) fixes to stable/12.
  Otherwise, changes extending struct if_irq or struct if_shared_ctx
  in any way would break KBI as instances of these are allocated by
  the driver front-ends (by contrast, struct if_pkt_info as well as
  struct if_softc_ctx instances are provided by iflib(9) and, thus,
  may grow at least at the end without breaking KBI).
- Make the pvi_name in struct pci_vendor_info const char * as device
  identifiers in hardware lookup tables aren't to be expected to ever
  change at runtime.
- Similarly, make the pci_vendor_info_t of struct if_shared_ctx which
  is used to point to the struct pci_vendor_info arrays provided by
  the driver front-ends const.
- Remove the ETH_ADDR_LEN macro from iflib.h; this was duplicating
  ETHER_ADDR_LEN of <net/ethernet.h> with iflib(9) actually only
  consuming the latter macro.
- Make the name argument of iflib_io_tqg_attach(9) const, matching
  the taskqgroup_attach_cpu(9) this function wraps as well as e. g.
  iflib_config_gtask_init(9).
- Remove the orphaned iflib_qset_lock_get() prototype.
- Remove some extraneous empty lines.
2019-06-15 11:07:41 +00:00
Eric Joyner
668d6dbb4c iflib: provide probe wrapper for vendor drivers
From Jake:
Vendor drivers that exist out-of-tree generally should return
BUS_PROBE_VENDOR from their device probe functions. This helps ensure
that a vendor replacement driver will supersede the in-kernel driver for
a given device.

Currently, if a vendor wants to implement a driver based on iflib, it
will always report BUS_PROBE_DEFAULT.

Add a wrapper function, iflib_device_probe_vendor() which can be used in
place of iflib_device_probe(). This function will just return
BUS_PROBE_VENDOR whenever iflib_device_probe() would return
BUS_PROBE_DEFAULT.

While vendor drivers can already implement such a wrapper themselves,
providing it in the iflib.h header makes it easier for the vendor driver
to do the right thing.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, marius@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20221
2019-05-29 22:24:10 +00:00
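The wrapper described in 668d6dbb4c only rewrites one return value. The sketch below shows the idea in userspace; PROBE_VENDOR and PROBE_DEFAULT are local stand-ins for BUS_PROBE_VENDOR and BUS_PROBE_DEFAULT (the values are assumed to match sys/bus.h), and the probe callbacks are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local stand-ins for the newbus probe priorities (assumed values). */
#define PROBE_VENDOR	(-10)
#define PROBE_DEFAULT	(-20)

/* Hypothetical base probes: one matches the device, one does not. */
int probe_match(void *dev)   { (void)dev; return (PROBE_DEFAULT); }
int probe_nomatch(void *dev) { (void)dev; return (ENXIO); }

/*
 * Call the base probe and promote a default-priority match to vendor
 * priority; any other result (such as an error) passes through unchanged.
 */
int
probe_vendor(int (*probe)(void *), void *dev)
{
	int rc;

	assert(probe != NULL);
	rc = probe(dev);
	return (rc == PROBE_DEFAULT ? PROBE_VENDOR : rc);
}
```

Among successful probes the highest value wins, so returning the vendor priority lets an out-of-tree driver take precedence over the in-kernel one.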
Eric Joyner
afb7737237 iflib: use default ntxd and nrxd when user value is not power of 2
From Jake:
A user may set a sysctl to override the default number of Tx or Rx
descriptors. However, certain calculations in the iflib core expect the
number of descriptors to be a power of 2.

Update _iflib_assert to verify that all of the shared context parameters
for the number of descriptors are powers of 2.

Modify iflib_reset_qvalues to check that the provided isc_nrxd value is
a power of 2. If it's not, print a warning message and then use the
default value.

An alternative might be to try rounding the number down instead.
However, this creates problems in case the rounded down value is below
the minimum value that the driver would support.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	marius@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19880
2019-05-10 00:41:42 +00:00
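The validation described in afb7737237 can be sketched as a power-of-2 test plus a fallback to the default. The function names are hypothetical and the warning text is invented for the example; treating 0 as "not set by the user" is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A power of 2 has exactly one bit set. */
bool
is_pow2(uint32_t n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}

/*
 * Return the descriptor count to use: the user's value if it is a
 * power of 2, otherwise the driver default (with a warning).
 */
uint32_t
pick_ndesc(uint32_t requested, uint32_t dflt)
{
	assert(is_pow2(dflt));
	if (requested == 0)
		return (dflt);
	if (!is_pow2(requested)) {
		fprintf(stderr, "descriptor count %u is not a power of 2, "
		    "using default %u\n", (unsigned)requested, (unsigned)dflt);
		return (dflt);
	}
	return (requested);
}
```

Falling back to the default rather than rounding down sidesteps the problem the commit mentions, where a rounded value could drop below the driver's minimum.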
Marius Strobl
007b804fc7 Allow to build without INET and INET6 again after r347221.
Submitted by:	cam
2019-05-08 09:03:43 +00:00
Marius Strobl
3d10e9ed62 o Use iflib_fast_intr_rxtx() also for "legacy" interrupts, i. e. INTx and
  MSI. Unlike iflib_fast_intr_ctx(), the former will also enqueue
  _task_fn_tx() in addition to _task_fn_rx() if appropriate, bringing TCP
  TX throughput of EM-class devices on par with the MSI-X case and, thus,
  close to wirespeed/pre-iflib(4) times again. [1]
  Note that independently of the interrupt type, the UDP performance with
  these MACs still is abysmal and nowhere near to where it was before the
  conversion of em(4) to iflib(4).
o In iflib_init_locked(), announce which free list failed to set up.
o In _task_fn_tx() when running netmap(4), issue ifdi_intr_enable instead
  of the ifdi_tx_queue_intr_enable method in case of a "legacy" interrupt
  as the latter is valid with MSI-X only.
o Instead of adding the missing - and apparently convoluted enough that a
  DBG_COUNTER_INC was put into a wrong spot in _task_fn_rx() - checks for
  ifdi_{r,t}x_queue_intr_enable being available in the MSI-X case also to
  iflib_fast_intr_rxtx(), factor these out to iflib_device_register() and
  make the checks fail gracefully rather than panic. This avoids invoking
  the checks at runtime over and over again in iflib_fast_intr_rxtx() and
  _task_fn_{r,t}x() - even if it's just in case of INVARIANTS - and makes
  these functions more readable.
o In iflib_rx_structures_setup(), only initialize LRO resources if device
  and driver have LRO capability in order to not waste memory. Also, free
  the LRO resources again if setting them up fails for one of the queues.
  However, don't bother invoking iflib_rx_sds_free() in that case because
  iflib_rx_structures_setup() doesn't call iflib_rxsd_alloc() either (and
  iflib_{device,pseudo}_register() will issue iflib_rx_sds_free() in case
  of failure via iflib_rx_structures_free(), but there definitely is some
  asymmetry left to be fixed, though).
o Similarly, free LRO resources again in iflib_rx_structures_free().
o In iflib_irq_set_affinity(), handle get_core_offset() errors gracefully
  instead of panicking (but only in case of INVARIANTS). This is a follow-
  up to r344132, as such driver bugs shouldn't be fatal.
o Likewise, handle unknown iflib_intr_type_t in iflib_irq_alloc_generic()
  gracefully, too.
o Bring yet more sanity to iflib_msix_init():
  - If the device doesn't provide enough MSI-X vectors or not all vectors
    can be allocated so the expected number of queues in addition to admin
    interrupts can't be supported, try MSI next (and then INTx) as proper
    MSI-X vector distribution can't be assured in such cases. In essence,
    this change brings r254008 forward to iflib(4). Also, this is the fix
    alluded to in the commit message of r343934.
  - If the MSI-X allocation has failed, don't prematurely announce MSI is
    going to be used as the latter in fact may not be available either.
  - When falling back to MSI, only release the MSI-X table resource again
    if it was allocated in iflib_msix_init(), i. e. isn't supplied by the
    driver, in the first place.
o In mp_ndesc_handler(), handle unknown type arguments gracefully, too.

PR:		235031 (likely) [1]
Reviewed by:	shurd
Differential Revision:	https://reviews.freebsd.org/D20175
2019-05-07 08:28:35 +00:00
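The fallback order that 3d10e9ed62 gives for iflib_msix_init() (MSI-X, then MSI, then INTx) can be reduced to a small decision function. This is a simplified sketch: the enum, the parameters, and the single "vectors granted" count are assumptions, and the real code also deals with resource allocation and release.

```c
#include <assert.h>

enum intr_mode { INTR_MSIX, INTR_MSI, INTR_LEGACY };

/*
 * Use MSI-X only if every queue plus the admin interrupt(s) got a
 * vector; otherwise try MSI, and fall back to INTx last.
 */
enum intr_mode
pick_intr_mode(int msix_granted, int queues, int admin, int msi_available)
{
	assert(queues > 0 && admin >= 0);
	if (msix_granted >= queues + admin)
		return (INTR_MSIX);
	if (msi_available)
		return (INTR_MSI);
	return (INTR_LEGACY);
}
```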
Marius Strobl
1722eeac95 - Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx.
- Remove the only ever written to ift_db_mtx_name member of struct iflib_txq.
- Remove the unused or only ever written to ifr_size, ifr_cq_pidx, ifr_cq_gen
  and ifr_lro_enabled members of struct iflib_rxq.
- Consistently spell DMA, RX and TX uppercase in comments, messages etc.
  instead of mixing with some lowercase variants.
- Consistently use if_t instead of a mix of if_t and struct ifnet pointers.
- Bring the function comments of _iflib_fl_refill(), iflib_rx_sds_free() and
  iflib_fl_setup() in line with reality.
- Judging by problem reports, people are wondering what on earth messages like:
  "TX(0) desc avail = 1024, pidx = 0"
  are trying to indicate. Thus, extend this string to be more like that of
  non-iflib(4) Ethernet MAC drivers, notifying about a watchdog timeout due
  to which the interface will be reset.
- Take advantage of the M_HAS_VLANTAG macro.
- Use false/true rather than FALSE/TRUE for variables of type bool.
- Use FALLTHROUGH as advocated by style(9).
2019-05-06 20:56:41 +00:00
Matt Macy
e2621d9657 Allow iflib drivers to pass a pointer to their own ifmedia structure.
Tested by: emaste@

Differential Revision:	https://reviews.freebsd.org/D19946
2019-05-03 20:05:31 +00:00
Ed Maste
ce3da455e9 iflib: remove assertion that isc_capabilities is nonzero
It's atypical, but not invalid, for a driver to pass no capabilities.

Submitted by:	Gerald Aryeetey <aryeeteygerald_rogers.com>
Reviewed by:	shurd
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20142
2019-05-02 19:13:31 +00:00
Stephen Hurd
f154ece02e iflib: Better control over queue core assignment
By default, cores are now assigned to queues in a sequential
manner rather than all NICs starting at the first core. On a four-core
system with two NICs each using two queue pairs, the nic:queue -> core
mapping has changed from this:

0:0 -> 0, 0:1 -> 1
1:0 -> 0, 1:1 -> 1

To this:

0:0 -> 0, 0:1 -> 1
1:0 -> 2, 1:1 -> 3

Additionally, a device can now be configured to use separate cores for TX
and RX queues.

Two new tunables have been added, dev.X.Y.iflib.separate_txrx and
dev.X.Y.iflib.core_offset. If core_offset is set, the NIC is not part
of the auto-assigned sequence.

Reviewed by:	marius
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D20029
2019-04-25 21:24:56 +00:00
Andrew Gallatin
6d49b41ee8 iflib: Add pfil hooks
As with mlx5en, the idea is to drop unwanted traffic as early
in receive as possible, before mbufs are allocated and anything
is passed up the stack.  This can save considerable CPU time
when a machine is under a flooding style DOS attack.

The major change here is to remove the unneeded abstraction where
callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and
are responsible for NULL'ing that mbuf themselves. Now this happens
directly in rxd_frag_to_sd(), and it returns an mbuf. This allows us
to use the decision (and potentially mbuf) returned by the pfil
hooks. The driver can now recycle mbufs to avoid re-allocation when
packets are dropped.

Reviewed by:	marius  (shurd and erj also provided feedback)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19645
2019-04-24 13:32:04 +00:00
Kyle Evans
1fd8c72c0a iflib: Use new ether_gen_addr, restricting addresses to that subset
Differential Revision:	https://reviews.freebsd.org/D19587
2019-04-17 17:19:54 +00:00
Eric Joyner
225eae1bb7 iflib: return ENETDOWN when the network device is down
From Jake:
iflib_if_transmit returns ENOBUFS when the device is down, or when the
link isn't active.

This was changed in r308792 from return (0), so that the function
correctly reports an error that it was unable to transmit.

However, using ENOBUFS can cause some network applications to produce
the following or similar errors:

"ping: sendto: No buffer space available"

This is a bit confusing as the real cause of the issue is that the
network device is down.

Replace the ENOBUFS return with ENETDOWN to indicate more clearly that
the send failed because the network device is offline.
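
A minimal sketch of the changed check, assuming a simplified interface
state; struct sketch_if and sketch_transmit() are hypothetical stand-ins
and not the real iflib_if_transmit code:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the interface state; not the real struct ifnet. */
struct sketch_if {
	bool running;		/* interface is up and running */
	bool link_active;	/* link state is up */
};

static int
sketch_transmit(const struct sketch_if *ifp)
{
	/* Device down or link inactive: report that, not a buffer shortage. */
	if (!ifp->running || !ifp->link_active)
		return (ENETDOWN);	/* this used to be ENOBUFS */
	return (0);			/* the packet would be enqueued here */
}
```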

This will cause the error message to be reported as

"ping: sendto: Network is down"

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, sbruno@, bz@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19652
2019-03-28 20:46:45 +00:00
Eric Joyner
aac9c817af iflib: hold the CTX lock in iflib_pseudo_register
From Jake:
The iflib_device_register function takes the CTX lock before calling
IFDI_ATTACH_PRE, and releases it upon finishing the registration.

Mirror this process in iflib_pseudo_register, so that we always hold the
CTX lock during the attach process when registering a pseudo interface
or a regular interface.

This was caught by code inspection while attempting to analyze where the
CTX lock was held.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19604
2019-03-28 20:43:47 +00:00
Eric Joyner
10a1e981d4 iflib: mark isc_driver_version as constant
From Jake:
The iflib core never modifies the isc_driver_version string. Allow
drivers to safely assign pointers to constant buffers by marking this
parameter const.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19577
2019-03-19 23:44:26 +00:00
Eric Joyner
1b9d93948a iflib: expose the Rx mbuf buffer size to drivers
From Jake:
iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based
on the isc_max_frame_size value that drivers set up. This calculation is
repeated by drivers when programming their hardware with the size of
each Rx buffer.

This can lead to a mismatch where the iflib mbuf size is different from
the buffer size that the driver programmed into the hardware, which can
lead to unexpected results.

If iflib ever wants to support mbuf sizes larger than one page, every
driver must be updated to account for the new possible buffer sizes.

Fix this by calculating the mbuf size prior to calling IFDI_INIT, and
adding the iflib_get_rx_mbuf_sz function which will expose this value to
drivers, so that they do not repeat the same calculation.
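
The idea can be sketched as computing the size once and letting the driver
read it back. Everything below is a hypothetical model: the struct, the
sketch_ function names, and the two sizes (2048 and 4096 bytes) are
assumptions chosen for illustration, not the values or code iflib uses.

```c
#include <stdint.h>

/* Assumed sizes for this sketch only. */
#define SKETCH_CLUSTER_SZ	2048u	/* stand-in for a standard cluster */
#define SKETCH_PAGE_SZ		4096u	/* stand-in for a page-sized cluster */

/* Hypothetical context holding the shared value. */
struct sketch_ctx {
	uint32_t max_frame_size;	/* set by the driver */
	uint32_t rx_mbuf_sz;		/* computed once by the framework */
};

/* Framework side: runs before the driver's init routine is called. */
static void
sketch_calc_rx_mbuf_sz(struct sketch_ctx *ctx)
{
	if (ctx->max_frame_size <= SKETCH_CLUSTER_SZ)
		ctx->rx_mbuf_sz = SKETCH_CLUSTER_SZ;
	else
		ctx->rx_mbuf_sz = SKETCH_PAGE_SZ;
}

/* Driver side: read the value instead of repeating the calculation. */
static uint32_t
sketch_get_rx_mbuf_sz(const struct sketch_ctx *ctx)
{
	return (ctx->rx_mbuf_sz);
}
```

Because the driver only reads the stored value, a later change to the size
selection would not require touching each driver.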

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19489
2019-03-19 17:59:56 +00:00
Eric Joyner
3e8d1bae5f iflib: prevent possible infinite loop in iflib_encap
From Jake:
iflib_encap calls bus_dmamap_load_mbuf_sg. Upon it returning EFBIG, an
m_collapse and an m_defrag are attempted to shrink the mbuf cluster to
fit within the DMA segment limitations.

However, if we call m_defrag, and then bus_dmamap_load_mbuf_sg returns
EFBIG on the now defragmented mbuf, we will continuously re-call
bus_dmamap_load_mbuf_sg over and over.

This happens because m_head isn't NULL, and remap is >1, so we don't try
to m_collapse or m_defrag again. The only way we exit the loop is if
m_head is NULL. However, m_head can't be modified by the call to
bus_dmamap_load_mbuf_sg, because we don't pass it as a double pointer.

I believe this will be an incredibly rare occurrence, because it is
unlikely that bus_dmamap_load_mbuf_sg will actually fail on the second
defragment with an EFBIG error. However, it still seems like
a possibility that we should account for.

Fix the exit check to ensure that if remap is >1, we will also exit,
even if m_head is not NULL.
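
The bounded retry can be modelled as below. This is a simplified sketch,
not the iflib_encap code: the load callback stands in for
bus_dmamap_load_mbuf_sg, the m_collapse and m_defrag steps are reduced to
comments, and encap_retry_sketch() and always_efbig() are hypothetical
names.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the DMA load call; returns 0 or an errno value. */
typedef int (*load_fn)(void *arg);

static int
encap_retry_sketch(load_fn load, void *arg, int *attempts)
{
	int err, remap;

	remap = 0;
	*attempts = 0;
	for (;;) {
		err = load(arg);
		(*attempts)++;
		if (err != EFBIG)
			return (err);
		/* The fix: both remedies were already tried, so give up. */
		if (remap > 1)
			return (EFBIG);
		/* remap == 0: collapse would go here; remap == 1: defrag. */
		remap++;
	}
}

/* Worst case for the sketch: the load never fits. */
static int
always_efbig(void *arg)
{
	(void)arg;
	return (EFBIG);
}
```

In this model, a load that always fails with EFBIG is attempted three
times (initial try, after the collapse step, after the defrag step) and
then the loop exits with EFBIG instead of spinning.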

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19468
2019-03-19 17:49:03 +00:00
Eric Joyner
bc408c7d61 Remove references to CONTIGMALLOC_WORKS in iflib and em
From Jake:
"The iflib_fl_setup() function tries to pick various buffer sizes based
on the max_frame_size value defined by the parent driver. However, this
code was wrapped under CONTIGMALLOC_WORKS, which was never actually
defined anywhere.

This same code pattern was used in if_em.c, likely trying to match
what iflib uses.

Since CONTIGMALLOC_WORKS is not defined, remove this dead code from
iflib_fl_setup and if_em.c

Given that various iflib drivers appear to be using a similar
calculation, it might be worth making this buffer size a value that the
driver can peek at in the future."

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19199
2019-03-05 19:12:51 +00:00
Stephen Hurd
ca62461bc6 iflib: Improve return values of interrupt handlers.
iflib was returning FILTER_HANDLED in cases where FILTER_STRAY was more
correct. This potentially caused issues with shared legacy interrupts.

Driver filters returning FILTER_STRAY are now properly handled.
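
One way to picture the pass-through of a driver filter's result is the
sketch below. It is an assumption-laden model rather than the iflib
handler: the SKETCH_ constants stand in for the kernel's FILTER_ values,
and sketch_fast_intr() and the two example filters are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for the kernel's filter return flags. */
enum sketch_filter {
	SKETCH_STRAY = 1,		/* interrupt was not ours */
	SKETCH_HANDLED = 2,		/* fully handled in the filter */
	SKETCH_SCHEDULE_THREAD = 4	/* defer the work to a task */
};

typedef int (*driver_filter_fn)(void *arg);

static int
sketch_fast_intr(driver_filter_fn filter, void *arg, int *task_enqueued)
{
	int result;

	*task_enqueued = 0;
	if (filter != NULL) {
		result = filter(arg);
		/* Pass STRAY or HANDLED through instead of forcing HANDLED. */
		if ((result & SKETCH_SCHEDULE_THREAD) == 0)
			return (result);
	}
	*task_enqueued = 1;	/* the deferred task would be queued here */
	return (SKETCH_HANDLED);
}

/* Example driver filters used to exercise the sketch. */
static int
stray_filter(void *arg)
{
	(void)arg;
	return (SKETCH_STRAY);
}

static int
schedule_filter(void *arg)
{
	(void)arg;
	return (SKETCH_SCHEDULE_THREAD);
}
```

Reporting a stray interrupt matters on a shared legacy line, where the
other handlers on that line still need a chance to claim it.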

Submitted by:	Augustin Cavalier <waddlesplash@gmail.com>
Reviewed by:	marius, gallatin
Obtained from:	Haiku (a84bb9, 4947d1)
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D19201
2019-02-15 18:51:43 +00:00