Commit Graph

195 Commits

Author SHA1 Message Date
Stephen Hurd
5c1d8c4b73 Split out flag manipulation from general context manipulation in iflib
To avoid blocking on the context lock in the swi thread and risk potential
deadlocks, this change protects lighter weight updates that only need to
be consistent with each other with their own lock.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14967
2018-04-10 19:48:24 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Mark Johnston
18628b74bd Clamp IFLIB_RX_COPY_THRESH to MHLEN in iflib_rxd_pkt_get().
If one has added fields to struct mbuf such that MHLEN is smaller than
this threshold (128), iflib_rxd_pkt_get() may otherwise overrun the
internal mbuf buffer while copying.

Reviewed by:	mmacy
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14843
2018-03-25 23:23:19 +00:00
Stephen Hurd
226fb85d19 iflib: stop timer callout when stopping
iflib_timer has been seen running after the interface had been removed.
This change prevents that.

Submitted by:	matt.macy@joyent.com
2018-03-02 18:48:07 +00:00
Navdeep Parhar
7cb7c6e37a Catch up with the removal of nktr_slot_flags from upstream netmap. No
functional impact intended.

Submitted by:	Vincenzo Maffione <v.maffione@gmail.com>
2018-02-20 21:42:45 +00:00
Stephen Hurd
a4e5960730 IFLIB: do not remove dmamap on buffer unload
Dmamap is created only on IFC attach. If we remove it on
buffer release, we won't be able to do ifconfig down&up. Only destroy
when in detach.

Reported by:	wma
Reviewed by:	wma
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14060
2018-02-20 18:33:45 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
443133416b net*: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:21:51 +00:00
Stephen Hurd
9c58cafaa3 Don't pass rids to taskqgroup_attach()
As everywhere else, we want to pass rman_get_start(irq->ii_res).  This
caused set affinity errors when not using MSI-X vectors (legacy and MSI
interrupts).

Reported by:	sbruno
Sponsored by:	Limelight Networks
2017-12-27 20:42:30 +00:00
Stephen Hurd
ca03863c85 Remove assertion that's not true for !EARLY_AP_STARTUP
gtask->gt_taskqueue is NULL when EARLY_AP_STARTUP is not enabled.
Remove assertion to allow this config to work.

Reported by:	oleg
Sponsored by:	Limelight Networks
2017-12-27 19:14:15 +00:00
Stephen Hurd
de13095409 Fix indentation.
Sponsored by:	Limelight Networks
2017-12-27 19:12:32 +00:00
Konstantin Belousov
97755e83f5 Fix build for kernels with SCHED_4BSD.
Sponsored by:	The FreeBSD Foundation
2017-12-21 23:05:13 +00:00
Stephen Hurd
25ac1dd5c7 Don't call tcp_lro_rx() unless hardware verified TCP/UDP csum
It seems that tcp_lro_rx() doesn't verify TCP checksums, so
if there are bad checksums in the packets caused by invalid data, the
invalid data will pass through without errors.

This was noticed with the igb driver and a specific internet host:
fetch http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz -o test.bin && sha256 test.bin
Would result in a different value sometimes.

This ends up making LRO require RXCSUM to be enabled, and RXCSUM to
support TCP and UDP checksums.

PR:		224346
Reported by:	gjb
Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13561
2017-12-21 01:22:36 +00:00
Li-Wen Hsu
40cf51c438 Add missing ;
Approved by:	kevlo
2017-12-20 06:08:16 +00:00
Stephen Hurd
b103855e18 Support attaching tx queues to cpus
This will attempt to use a different thread/core on the same L2
cache when possible, or use the same cpu as the rx thread when not.
If SMP isn't enabled, don't go looking for cores to use. This is mostly
useful when using shared TX/RX queues.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12446
2017-12-20 01:03:34 +00:00
Stephen Hurd
96fc97c81f Update Matthew Macy contact info
Email address has changed, uses consistent name (Matthew, not Matt)

Reported by:	Matthew Macy <mmacy@mattmacy.io>
Differential Revision:	https://reviews.freebsd.org/D13537
2017-12-19 17:59:00 +00:00
Stephen Hurd
06c47d48dc Increment encap_pad_mbuf_fail when m_dup() fails in padding
Previously, the counter was only incremented when m_append() failed.  Since
the function can also fail on m_dup() now, increment the counter there as
well.

Sponsored by:	Limelight Networks
2017-12-11 20:01:28 +00:00
Stephen Hurd
04993890dc Free mbuf chain when m_dup fails
Fix memory leak where mbuf chain wasn't free()d if iflib_ether_pad()
has a failure in m_dup().

Reported by:	"Ryan Stone" <rysto32@gmail.com>
Sponsored by:	Limelight Networks
2017-12-08 19:50:06 +00:00
Stephen Hurd
a15fbbb8fe Handle read-only mbufs in iflib ether pad function
If ethernet padding is enabled, and a read-only mbuf is passed,
it would modify the mbuf using m_append(). Instead, call m_dup() and
append to the new packet.

Reported by:	Pyun YongHyeon
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13414
2017-12-08 18:43:31 +00:00
Stephen Hurd
d14c853ba3 iflib: Support to padding Ethernet frames to a min size
Some bnxt devices do not correctly send frames smaller than
52 bytes (without CRC), so add a quirk that will pad frames to an
arbitrary size before passing off to the encap routine.

Reported by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13269
2017-12-05 21:00:31 +00:00
Stephen Hurd
fe1bcada9b Avoid calling CURVNET_[SET|RESTORE] for each packet
The LRO possible test was calling CURVNET_SET once for IPv4 or IPv6 for
each packet in a chain. Only call it once per chain instead.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	cem, ae
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13368
2017-12-05 20:43:24 +00:00
Stephen Hurd
a027c8e958 Add support for SIOCGIFXMEDIA to iflib
SIOCGIFXMEDIA is required for extended ethernet media types,
but iflib did not support it.

Reported by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13312
2017-12-01 17:58:20 +00:00
Stephen Hurd
772593dbd9 Fix comment introduced in r326369
The code uses the set of all CPUs, it doesn't zero out the set.

Sponsored by:	Limelight Networks
2017-11-29 18:21:17 +00:00
Stephen Hurd
e516b5353b Ensure that ctx->ifc_cpus is always initialized
If a device didn't support MSI-X, ctx->ifc_cpus would not be initialized,
but the IRQ allocation routines still uses the value.  Move the
initialization to common code.

Sponsored by:	Limelight Networks
2017-11-29 18:14:57 +00:00
Stephen Hurd
7274b2f6be Fix off-by-one error in bit_nclear() usage
bit_nclear() takes the bit numbers for the start and end bits, not the start
and a count.  This was resulting in memory corruption past the end of the
bitstr_t.

Sponsored by:	Limelight Networks
2017-11-20 21:57:04 +00:00
Stephen Hurd
d27352644a Fix default numbers of iflib queue sets
The intent appears to be having one RX/TX queue set per core,
but since scctx->isc_n[tr]xqsets is set to max before calling
iflib_msix_init(), both end up being set to total number of cores.

Use ctx->ifc_sysctl_n[rt]xqs as the selected value and
scctx->isc_n[rt]xqsets as the max. This should result in what appears
to be the intended behaviour

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D13096
2017-11-16 18:52:58 +00:00
Sean Bruno
abec47242a Fix NOINET/NOINET6 build during compilation of iflib.
Reported by:	kib
2017-11-06 19:54:25 +00:00
Stephen Hurd
35e4e998d8 Only chain non-LRO mbufs when LRO is not possible
Preserve packet order between tcp_lro_rx() and if_input() to avoid
creating extra corner cases. If no packets can be LROed, combine them
into one chain for submission via if_input(). If any packet can
potentially be LROed however, retain old behaviour and call if_input()
for each packet.

This should keep the 12% improvement for small packet forwarding intact,
but mostly avoids impacting the LRO case.

Reviewed by:	cem, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12876
2017-11-06 16:23:21 +00:00
Stephen Hurd
0b6c52b69f Preserve TSO checksum flags
r323941 incorrectly disabled TSO flags based on MTU.

Reported by:	Yuri Pankov <yuripv@gmx.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12880
2017-10-31 19:03:35 +00:00
Stephen Hurd
a1b799ca5b Fix PR221990 - Assertion at iflib.c:1947
ifl_pidx and ifl_credits are going out of sync in _iflib_fl_refill() as they
use different update log.  Use the same update logic for both, and add a
final call to isc_rxd_refill() to handle early exits from the loop.

PR:		221990
Reported by:	pho
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12798
2017-10-31 17:50:42 +00:00
Stephen Hurd
10e0d93811 Fix build with nodevice netmap
iru_init() was declared and used outside the DEV_NETMAP
conditional blocks, but was implemented inside one. Move the
implementation out of the DEV_NETMAP block to allow building with
netmap disabled.

Reported by:	Andrew Turner <andrew@fubar.geek.nz>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12842
2017-10-31 02:49:28 +00:00
Stephen Hurd
09b57b7f40 bnxt: HW_LRO Rx Pkt with > 32 fragments caused Crash (iflib)
Broadcom NIC with HW_LRO setting max_agg_segs >= 6 can generate Rx pkt with
64 (2^6) fragments, modify IFLIB_MAX_RX_SEGS to 64 to avoid memory
corruption / Crash.

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12774
2017-10-30 21:20:33 +00:00
Stephen Hurd
2d873474b2 Fix PR222744 - netmap errors with iflib em driver
Fix error when refilling netmap buffers that resulted in the first
buffer of the successive passes through ifl_bus_addrs[] leaving the
first value unset (tmp_pidx started at 1, not zero after the first time
through the loop).

Leave the one unused buffer required by some NICs visible in the netmap
ring rather than hidden. There will always be a buffer in use by the
kernel now when an iflib driver is used via netmap.

Always get the netmap slot index via netmap_idx_n2k() to account for
nkr_hwofs in a consistent way.

Split shared functionality into new functions.
iru_init(): shared by _iflib_fl_refill() and netmap_fl_refill()
netmap_fl_refill(): shared by iflib_netmap_rxsync() and
iflib_netmap_rxq_init()

PR:		222744
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12769
2017-10-30 21:14:31 +00:00
Stephen Hurd
0fdea53954 Avoid enabling MSI-X if MSI-X is disabled globally
It was reported on the community call that with
hw.pci.enable_msix=0, iflib would enable MSI-X on the device and attempt
to use it, which caused issues. Test the sysctl explicitly and do not
enable MSI-X if it's disabled globally.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12805
2017-10-30 21:08:12 +00:00
Stephen Hurd
3429c02f82 Some cache related optimizations
1. prefetch 128 bytes of mbufs.
2. Re-order filling the pkt_info so cache stalls happen at the end
3. Define empty prefetch2cachelines() macro when the function isn't present.

Provides small performance improvments on some hardware

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12447
2017-10-23 20:50:08 +00:00
Stephen Hurd
1c0054d261 Fix "taskqgroup_attach: setaffinity failed: 3" with iflib drivers
Improved logging added in r323879 exposed an error during
attach. We need the irq, not the rid to work correctly. em uses
shared irqs, so it will use the same irq for TX as RX. bnxt does
not use shared irqs, or TX irqs at all, so there's no need to set
the TX irq affinity.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12496
2017-10-05 14:43:30 +00:00
Stephen Hurd
1225d9da9f Have ifmp_ring_enqueue() abdicate instead of switch to a consumer
Move TX out of the enqueue() path. As a result, we need
to have ifmp_ring_check_drainage() pick up from the abdicate state.

We also need to either enqueue the TX task, or check drainage
after calling ifmp_ring_enqueue() to ensure it's sent.

This change results in a 30% small packet forwarding improvement.

Reviewed by:	olivier, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12439
2017-09-23 16:46:30 +00:00
Stephen Hurd
f4d2154e0c Make the rx budget a tunable
This allows tuning the rx budget for special load profiles
as well as more easily testing to determine sane defaults.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12445
2017-09-23 01:37:01 +00:00
Stephen Hurd
20f63282f8 Chain mbufs before passing to if_input()
Build a list of mbufs to pass to if_input() after LRO. Results in
12% small packet forwarding rate improvement.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12444
2017-09-23 01:35:14 +00:00
Stephen Hurd
c5cf217261 Some small packet performance improvements
If the packet is smaller than MTU, disable the TSO flags.
Move TCP header parsing inside the IS_TSO?() test.
Add a new IFLIB_NEED_ZERO_CSUM flag to indicate the checksums need to be zeroed before TX.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12442
2017-09-23 01:33:20 +00:00
Stephen Hurd
d0d0ad0ae2 Fix iflib netmap RX
RXQ setup for netmap was broken because netmap_rxq_init was getting called
before IFDI_INIT - thus we ended up with ring tail pointer being reset to zero.

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12140
2017-09-20 20:40:49 +00:00
Stephen Hurd
ab2e3f7958 Revert r323516 (iflib rollup)
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
2017-09-16 02:41:38 +00:00
Stephen Hurd
d300df0182 Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependant
patches being developed out of tree.  The reset of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1)  More checks when INVARIANTS are enabled for eariler problem
    detection
2)  Group Task Queue cleanups
    - Fix use of duplicate shortdesc for gtaskqueue malloc type.
      Some interfaces such as memguard(9) use the short description to
      identify malloc types, so duplicates should be avoided.
3)  Allow gtaskqueues to use ithreads in addition to taskqueues
    - In some cases, this can improve performance
4)  Better logging when taskqgroup_attach*() fails to set interrupt
    affinity.
5)  Do not start gtaskqueues until they're needed
6)  Have mp_ring enqueue function enter the ABDICATED rather than BUSY
    state.  This moves the TX to the gtaskq and allows processing to
    continue faster as well as make TX batching more likely.
7)  Add an ift_txd_errata function to struct if_txrx.  This allows
    drivers to inspect/modify mbufs before transmission.
8)  Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
    checksums zeroed for checksum offload to work.  This avoids modifying
    packet data in the TX path when possible.
9)  Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cachlines from each mbuf instead of one up to 128B.  We
    often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
    bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
    This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
    This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
    rather call if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
    before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
    IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
    especially when RX and TX share an interrupt.  RX will attempt to
    take the first threads on a core, and TX will attempt to take
    successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
    cpus an interrupt has affinity with.  This allows TX queues to
    ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
    - timer_int - interval at which to run per-queue timers in ticks
    - force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
    - rx_budget allows tuning the batch size on the RX path
    - watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
    IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
    when waiting for firmware
    - After interrupts are enabled, convert all waits to sleeps
    - Eliminates e1000 software/firmware synchronization busy waits after
      startup
25) e1000: Remove special case for budget=1 in em_txrx.c
    - Premature optimization which may actually be incorrect with
      multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
    RX and TX.
    - Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
    Much easier to understand separate functions and "if (is_igb)" than
    previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12235
2017-09-13 01:18:42 +00:00
Gleb Smirnoff
2cc3b2eec5 Do not abuse flag that is clearly marked as unused.
This creates conflicts with FreeBSD variations that may use it.  The
usage of the flag M_TOOBIG is limited to iflib queue, thus using
one of M_PROTO flags is fine.  There is no need to grab global flag.

Silence from:	kmacy, sbruno (2 weeks)
2017-08-31 23:19:18 +00:00
Sean Bruno
a969350226 Revert r323008 and its conversion of e1000/iflib to using SX locks.
This seems to be missing something on the 82574L causing NFS root mounts
to hang.

Reported by:	kib
2017-08-30 18:56:24 +00:00
Sean Bruno
e17e5b4134 Continuation of lock cleanup in e1000.
Post-cold sleep instead of DELAY when waiting for firmware.

Convert softc mutex to an SX lock.  Change all waits to sleeps
once interrupts are enabled (and it is safe to sleep).

Submitted by:	Matt Macy <matt@mattmacy.io>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12101
2017-08-30 00:20:43 +00:00
Sean Bruno
21e10b16e0 iflib: call device's if_init function during vlan initialization.
Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd
Sponsored by:	Broadcom
Differential Revision:	https://reviews.freebsd.org/D12098
2017-08-23 21:49:56 +00:00
Sean Bruno
5c5ca36ca2 Don't leak mbufs if clusers exceeds the number of segments. This would
leak mbufs over time causing crashes.

PR:		221202
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	gergely.czuczy@harmless.hu
Sponsored by:	Limelight Networks
2017-08-10 03:43:23 +00:00
Sean Bruno
18a660b344 Export IFCAP_HWSTATS so that we don't experience double stats counting
on iflib enabled devices.

PR:		220198
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	Ben Woods <woodsb02@freebsd.org>
Sponsored by:	Limelight Networks
2017-08-10 03:11:05 +00:00
Sean Bruno
9d35858f48 Slight restructure of iflib_busdma_load_mbuf_sg() to fix accounting
when m_collapse() fails.

Submitted by:	krzystof.galazka@intel.com
Reviewed by:	Jeb Cramer <cramerj@intel.com>
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D11476
2017-07-27 22:53:47 +00:00
Dimitry Andric
9d0a88de9b Fix printf format warning in iflib.c
Clang 5.0.0 got better warnings about printf format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:

    sys/net/iflib.c:1517:8: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                              ^~~~~~~~~~~~~~~~~~~~
    sys/net/iflib.c:1517:41: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
                                              sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
                                                                               ^~~~~~~~~~~~~~~~~~~~~~~

Fix this by casting bus_size_t arguments to uintmax_t, and using %ju
instead.

Reviewed by:	emaste
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D11679
2017-07-20 20:28:31 +00:00
Sean Bruno
dcbc025ff1 Don't cache mbuf pointers if the number of descriptors is greater than
the number of buffers.

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Sponsored by:	Limelight Networks
2017-07-19 21:18:04 +00:00
Sean Bruno
25d528119f iflib - flib_busdma_load_mbuf_sg used isc_tx_maxsize as max semgent size.
Submitted by:	krzysztof.galazka@intel.com
Differential Revision:	https://reviews.freebsd.org/D11403
2017-07-03 19:23:45 +00:00
Sean Bruno
87890dbaf6 bnxt(4) Enable LRO support, redux
iflib - reset fl-ifl_fragidx to 0 on iflib_fl_bufs_free().  This caused the
panic in em/igb when adding it to a bridge device.

iflib - Handle out of order packet delivery from hardware in support of LRO

Out of order updates to rxd's is fixed in r315217. However, it is not
completely fixed.  While refilling the buffers, iflib is not considering
the out of order descriptors. Hence, it is refilling sequentially.
"idx" variable in _iflib_fl_refill routine is incremented sequentially.
By doing refilling sequentially, it will override the SGEs that
are *IN USE* by other connections.  Fix is to maintain a bitmap of
rx descriptors and differentiate the used one with unused one and
refill only at the unused indices.  This patch also fixes a
few bugs in bnxt, related to the same feature.

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	venkatkumar.duvvuru@broadcom.com shurd
Differential Revision:	https://reviews.freebsd.org/D10681
2017-07-03 18:23:35 +00:00
Sean Bruno
fa5416a819 Revert r319989 "bnxt(4) Enable LRO support"
This generates startup LORs and panics when adding elements to bridge
devices. I will document further in https://reviews.freebsd.org/D10681

PR:	220073
Submitted by:	dchagin
Reported by:	db
2017-06-17 17:42:52 +00:00
Sean Bruno
51a621f7c0 bnxt(4) Enable LRO support
iflib - Handle out of order packet delivery from hardware in support of LRO

Out of order updates to rxd's is fixed in r315217. However, it is not
completely fixed.  While refilling the buffers, iflib is not considering
the out of order descriptors. Hence, it is refilling sequentially.
"idx" variable in _iflib_fl_refill routine is incremented sequentially.
By doing refilling sequentially, it will override the SGEs that
are *IN USE* by other connections.  Fix is to maintain a bitmap of
rx descriptors and differentiate the used one with unused one and
refill only at the unused indices.  This patch also fixes a
few bugs in bnxt, related to the same feature.

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd@
Differential Revision:	https://reviews.freebsd.org/D10681
2017-06-15 21:06:03 +00:00
Sean Bruno
3600bd1e38 Revert r319921 which seems to cause NFS booting assertion panics in
various configurations.

Reported by:	pho@
2017-06-15 17:46:20 +00:00
Sean Bruno
f7587db036 Add new sysctl to allow changing of timing of the txq timers.
Add new sysctl to override use of busdma in the driver.

Submitted by:	Drew Gallitin <gallatin@netflix.com>
2017-06-13 23:16:38 +00:00
Sean Bruno
aa8fa07cd8 Plug mbuf leak in the busdma path of iflib.
Submitted by:	Michael Tuexen <tuexen@freebsd.org>
Reported by:	Drew Gallitin <gallatin@netflix.com>
2017-06-13 19:32:23 +00:00
Bjoern A. Zeeb
1d898b914e Adjust a comment.
MFC after:	3 days
2017-05-09 08:29:55 +00:00
Sean Bruno
60596476cf Move pause frame counter out of struct if_ctx and into struct if_softc_ctx_t
so that we can use it in iflib to detect pause frames.

The igb(4) driver definitely used to use this in its old timer function and
I see no reason to restrict it to that driver only.

Sponsored by:	Limelight Networks
2017-04-07 00:33:03 +00:00
Sean Bruno
ea351d3f14 Allow MSIX to be turned off by tuneable per interface, per driver.
Sponsored by:	Limelight Networks
2017-04-04 21:03:34 +00:00
Sean Bruno
2b2fc97356 Don't call init functions directly from the timer/watchdog function.
Enqueue this in the admin task now that it can process it.

Submitted by:	Matt Macy <mmacy@nextbsd.org>
Sponsored by:	Limelight Networks
2017-03-30 16:54:01 +00:00
Sean Bruno
5c1ff25517 Assert IFF_DRV_OACTIVE in iflib_timer() when the "hung" case is detected
so that iflib's admin task can still process the reset directive and restore
functionality.

Sponsored by:	Limelight Networks
2017-03-30 16:03:51 +00:00
Sean Bruno
5e88838850 Change casting to a uintptr_t to be compatible with non-x86 architectures.
Submitted by:	Matt Macy <mmacy@nextbsd.org>
Reported by:	rpokala
Sponsored by:	Limelight Networks
2017-03-14 22:25:07 +00:00
Sean Bruno
0a1b74a3d1 Fixup LINT by using uint64_t type as we do on all other calls to PNMB()
Found with Jenkins.

Reported by:	lwshu
Sponsored by:	Limelight Networks
2017-03-14 15:08:56 +00:00
Sean Bruno
95246abb21 IFLIB updates
- unconditionally enable BUS_DMA on non-x86 architectures
- speed up rxd zeroing via customized function
- support out of order updates to rxd's
- add prefetching to hardware descriptor rings
- only prefetch on 10G or faster hardware
- add seperate tx queue intr function
- preliminary rework of NETMAP interfaces, WIP

Submitted by:	Matt Macy <mmacy@nextbsd.org>
Sponsored by:	Limelight Networks
2017-03-13 22:53:06 +00:00
Sean Bruno
d945ed6472 Make gtaskqueue compatible with drm-next such that they can be used with the
linuxkpi tasklets.

Submitted by:	mmacy@nextbsd.org
Reported by:	hps
2017-03-01 18:37:35 +00:00
Pedro F. Giffuni
e099b90b80 sys: Replace zero with NULL for pointers.
Found with:	devel/coccinelle
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D9694
2017-02-22 02:35:59 +00:00
Sean Bruno
67af525c55 Delete duplicate break. 2017-02-04 18:25:09 +00:00
Sean Bruno
835809f99f Fix i386 compile failure by moving needed closing parenthesis out of
conditional block.

Submitted by:   hiren
Reported by:    cy
2017-01-28 15:44:14 +00:00
Sean Bruno
e035717e57 IFLIB updates:
We found routing performance dropped significantly when configuring
FreeBSD as a router, we are applying the following changes in order to
resolve those issues and hopefully perform better.
 - don't prefetch the flags array, we usually don't need it
 - prefetch the next cache line of each of the software descriptor arrays as
   well as the first cache line of each of the next four packets' mbufs and
   clusters
 - reduce max copy size to 63 bytes
 - convert rx soft descriptors from array of structures to a structure of arrays
 - update copyrights

Submitted by:	Matt Macy <mmacy@nextbsd.org>
2017-01-27 23:08:06 +00:00
Sean Bruno
96eeabefbe Replace customized busmaster code with standardized setup call.
Reported by:	jhb
2017-01-27 22:30:27 +00:00
Sean Bruno
69b7fc3e67 Minor style annoyance.
Submitted by:	bde
2017-01-26 13:50:09 +00:00
Sean Bruno
f7ae9a84e3 Add error checking to the pci_find_cap(, PCIY_MSIX,) call that is returns
success and a good value.  Only then try to use it and set the MSIX_ENABLE
bit.

With the current em(4) driver we have observed failures in this case in a
specific environment when pci_find_cap() would not return the assumed
value, which meant we ended up writing to PCI register 2 (PCI_DEVICE_ID)
which is read-only.

PR:		216456
Submitted by:	bz
2017-01-25 14:37:05 +00:00
Sean Bruno
bd84f70044 iflib:
Add internal tracking of smp startup status to reliably figure out
     what methods are to be used to get gtaskqueue up and running.

e1000:
     Calculating this pointer gives undefined behaviour when (last == -1)
     (it is before the buffer).  The pointer is always followed.  Panics
     occurred when it points to an unmapped page.  Otherwise, the pointed-to
     garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the
     broken case the loop was usually null and the function just returned, and
     this was acidentally correct.

Submitted by:	bde
Reported by:	Matt Macy <mmacy@nextbsd.org>
2017-01-24 16:05:42 +00:00
Sean Bruno
36fa5d5b64 Revert 312696 due to build tests. 2017-01-24 15:55:52 +00:00
Sean Bruno
562a3182f6 iflib:
Add internal tracking of smp startup status to reliably figure out
   what methods are to be used to get gtaskqueue up and running.

e1000:
   Calculating this pointer gives undefined behaviour when (last == -1)
   (it is before the buffer).  The pointer is always followed.  Panics
   occurred when it points to an unmapped page.  Otherwise, the pointed-to
   garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the
   broken case the loop was usually null and the function just returned, and
   this was acidentally correct.

Submitted by:	bde
Reviewed by:	Matt Macy <mmacy@nextbsd.org>
2017-01-24 14:48:32 +00:00
Sean Bruno
4ecb427a49 Fix hangs in a uniprocessor configuration (qemu, virtualbox, real hw).
sys/net/iflib.c:
  Add ctx to filter_info and don't skpi interrupt early on unless we're on an
  SMP system

sys/kern/subr_gtaskqueue.c:
  Skip smp check if we're running UP

Submitted by:	Matt Macy <mmacy@nextbsd.org>
Reported by:	emaste bde
2017-01-15 00:50:10 +00:00
Sean Bruno
5b51fcfc7a Remove unused mtx_held() macro. 2017-01-09 23:41:10 +00:00
Sean Bruno
1248952a50 2017 IFLIB updates in preparation for commits to e1000 and ixgbe.
- iflib - add checksum in place support (mmacy)
- iflib - initialize IP for TSO (going to be needed for e1000) (mmacy)
- iflib - move isc_txrx from shared context to softc context (mmacy)
- iflib - Normalize checks in TXQ drainage. (shurd)
- iflib - Fix queue capping checks (mmacy)
- iflib - Fix invalid assert, em can need 2 sentinels (mmacy)
- iflib - let the driver determine what capabilities are set and what
          tx csum flags are used (mmacy)
- add INVARIANTS debugging hooks to gtaskqueue enqueue (mmacy)
- update bnxt(4) to support the changes to iflib (shurd)

Some other various, sundry updates.  Slightly more verbose changelog:

Submitted by:	mmacy@nextbsd.org
Reviewed by:	shurd
mFC after:
Sponsored by:	LimeLight Networks and Dell EMC Isilon
2017-01-02 00:56:33 +00:00
Sean Bruno
da69b8f9d1 iflib updates and fixes:
-    reset gen on down
-    initialize admin task statically
-    drain mp_ring on down
-    don't drop context lock on stop
-    reset error stats on down
-    fix typo in min_latency sysctl
-    return ENOBUFS from if_transmit if the driver isn't running or the link is down

Submitted by:	mmacy@nextbsd.org
Reviewed by:	shurd
MFC after:	2 days
Sponsored by:	Isilon and Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D8558
2016-11-18 04:19:21 +00:00
Sean Bruno
2fe66646bd Set default capabilities at attach.
ref: 6425f45e5f

Submitted by:	mmacy@nextbsd.org
2016-10-18 14:02:45 +00:00
Sean Bruno
add6f7d069 When deciding whether or not to call tqg_attach_cpu(), reference rid
directly.

ref: c9b47b468b

Submitted by:	mmacy@nextbsd.org
2016-10-18 13:29:30 +00:00
Sean Bruno
8b2a1db901 Toggle v4/v6 rxcsum together
Only re-init if driver is running

ref: 106518e874

Submitted by:	mmacy@nextbsd.org
2016-10-18 13:22:44 +00:00
Sean Bruno
aa3c5dd8a8 Fix misusage of CPU_FFS when binding queues to cpus
ref: 922d0bdf22

Submitted by:	mmacy@nextbsd.org
2016-10-18 13:12:19 +00:00
Stephen Hurd
23ac9029f9 Update iflib to support more NIC designs
- Move group task queue into kern/subr_gtaskqueue.c
- Change intr_enable to return an int so it can be detected if it's not
  implemented
- Allow different TX/RX queues per set to be different sizes
- Don't split up TX mbufs before transmit
- Allow a completion queue for TX as well as RX
- Pass the RX budget to isc_rxd_available() to allow an earlier return
  and avoid multiple calls

Submitted by:	shurd
Reviewed by:	gallatin
Approved by:	scottl
Differential Revision:	https://reviews.freebsd.org/D7393
2016-08-12 21:29:44 +00:00
John Baldwin
f454e7ebf5 Add __printflike() to bus_describe_intr() to enable -Wformat checks.
Fix a few places that were passing a raw string as the format to use
a "%s" format string instead.

MFC after:	2 months
2016-08-04 18:29:16 +00:00
Conrad Meyer
91d546a05d iflib: Fix typo in 'iflib_rx_miss_bufs' sysctl name
It looks like these sysctls were copy-pasted from netmap.  Most were changed
from 'ixl_' prefix to 'iflib_', but this one was missed.

Fix the "can't re-use a leaf (ixl_rx_miss_bufs)!" warning.

Reported by:	dim@ and others
Sponsored by:	EMC / Isilon Storage Division
2016-07-08 17:04:21 +00:00
Nathan Whitehorn
96c85efb4b Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.

PR:		kern/210106
Reviewed by:	jhb
Approved by:	re (glebius)
2016-07-06 14:09:49 +00:00
Conrad Meyer
0d0338afc9 iflib: Improve cleanup on iflib_queues_alloc error path
Fix some memory leaks.  Some may remain.

Reported by:	Coverity
Discussed with:	mmacy
CIDs:		1356036, 1356037, 1356038
Sponsored by:	EMC / Isilon Storage Division
2016-06-07 20:26:00 +00:00
Conrad Meyer
16fb86ab35 iflib: Fix potential leak in iflib_if_transmit
Due to an accidental mismatch between allocation and release in the slow path
of iflib_if_transmit, if a caller passed 9-16 mbufs to the routine, the mbuf
array would be leaked.

Fix the mismatch by removing the magic numbers in favor of nitems() on the
stack array.  According to mmacy, this leak is unlikely.

Reported by:	Coverity
Discussed with:	mmacy
CID:		1356040
Sponsored by:	EMC / Isilon Storage Division
2016-06-07 19:49:08 +00:00
Scott Long
c7762913ac Remove assertions that don't make sense for the data type. 2016-05-18 15:44:45 +00:00
Bjoern A. Zeeb
aaeb188af3 Make compile without INET or without IP support in the kernel by hiding
variables and lro function calls behind approriate #ifdefs.

Also move the #includes for "opt_*" to the place where they should be.
2016-05-18 14:18:03 +00:00
Scott Long
4c7070db25 Import the 'iflib' API library for network drivers. From the author:
"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional.  The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation.  ixl and ixgbe driver conversions will be committed
shortly.  We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by:   mmacy@nextbsd.org
Reviewed by:    gallatin
Differential Revision:  D5211
2016-05-18 04:35:58 +00:00