This feature masks TSO capability when a link comes up at 10 or 100mbit
due to errata on the chips. This behavior matches previous versions of
FreeBSD as well as NetBSD and Linux.
A tunable, hw.em.unsupported_tso may be set if the admin desires to
disabling automasking and configure TSO settings manually.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41170
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
The ifp (struct ifnet) backpointer in the e1000 private ifnet
data is not used anymore since the iflib transition.
Remove it so that developers are not tempted to use it and
get a NULL pointer dereference.
Reviewed by: markj, kbowling, erj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33157
This is useful for diagnosing problems. In particular, the errata
sheets identify the EEPROM version for many fixes.
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32333
Rename the 'struct adapter' to 'struct e1000_sc' to avoid type ambiguity
in things like kgdb.
Reviewed by: jhb, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D32129
* Fix 82574 Link Status Changes, carrying the OTHER mask bit around as
needed.
* Move igb-class LSC re-arming out of FAST back into the handler.
* Clarify spurious/other interrupt re-arms in FAST.
In MSI-X mode, 82574 and igb-class devices use an interrupt filter to
handle Link Status Changes. We want to do LSC re-arms in the handler
to take advantage of autoclear (EIAC) single shot behavior.
82574 uses 'Other' in ICR and IMS for LSC interrupt types when in MSI-X
mode, so we need to set and re-arm the 'Other' bit during attach and
after ICR reads in the FAST handler if not an LSC or after handling on
LSC due to autoclearing.
This work was primarily done to address the referenced PR, but inspired
some clarification and improvement for igb-class devices once the
intentions of previous bug fix attempts became clearer.
PR: 211219
Reported by: Alexey <aserp3@gmail.com>
Tested by: kbowling (I210 lagg), markj (I210)
Approved by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29943
Current way, hardcoded value plus heuristic is not conform to the PCI(e)
specification and it fails on systems where MSI-X bar is not initialized by
BIOS/ACPI (many arm or arm64 systems for example).
Instead, use the standard PCI(e) capability for determining of
MSIX table bar address.
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27265
This re-adds the opt_rss.h header to the driver and includes some
RSS-specific headers when RSS is defined.
PR: 249191
Submitted by: Milosz Kaniewski <milosz.kaniewski@gmail.com>
MFC after: 3 days
A couple of drivers and one place in if.c use ETH_ADDR_LEN, even though
net/ethernet.h provides an equivalent ETHER_ADDR_LEN definition.
Cleanup all of the locations which refer to ETH_ADDR_LEN to use the
standard ETHER_ADDR_LEN instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, jpaetzel@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21239
r241119 that's performed globally by device_attach(9).
- As for the EM-class of devices, em(4) supports multiple queues
and MSI-X respectively only with 82574 devices. However, since
the conversion to iflib(4), em(4) relies on the interrupt type
fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to
figure out the interrupt type to use for the EM-class (as well
as the IGB-class) of MACs. Moreover, despite the datasheet for
82583V not mentioning any support of MSI-X, there actually are
82583V devices out there that report a varying number of MSI-X
messages as supported. The interrupt type fallback of iflib(4)
is causing two failure modes depending on the actual number of
MSI-X messages supported for such instances of 82583V:
1) With only one MSI-X message supported, none is left for the
RX/TX queues as that one message gets assigned to the admin
interrupt. Worse, later on - which will be addressed with a
separate fix - iflib(4) interprets that one messages as MSI
or INTx to be set up, but fails to actually do so as it has
previously called pci_alloc_msix(9). [1, 2]
2) With more message supported, their distribution is okay but
then em_if_msix_intr_assign() doesn't work for 82583V, with
the interface being left in a non-working state, too. [3]
Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X
with 82574 only, and at most MSI for the remainder of EM-class
devices.
While at it, remove "try_second_bar" as it's polarity inverted
and not actually needed.
- Remove code from em_if_timer() that effectively is a NOP since
the conversion to iflib(4) ("trigger" is no longer read).
While at it, let the comment for em_if_timer() reflect reality
after said conversion.
- Implement an ifdi_watchdog_reset method which only updates the
em(4) "watchdog_events" counter but doesn't perform any reset,
so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't
provide a counterpart) reflects reality and these timeouts add
to IFCOUNTER_OERRORS again after the iflib(4) conversion.
- Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since
the iflib(4) conversion, associated counters are disconnected,
but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed"
respectively as equivalents.
- Move the description preceding lem_smartspeed() to the correct
spot before em_reset() and bring back appropriate comments for
{igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in
the iflib(4) conversion.
- Adapt some other function descriptions and INIT_DEBUGOUT() use
to match reality after the iflib(4) conversion.
- Put the debugging message of em_enable_vectors_82574() (missed
in r343578) under bootverbose, too.
PR: 219428 [1], 235246 [2], 235147 [3]
Reviewed by: erj (previous version)
Differential Revision: https://reviews.freebsd.org/D19108
bus_teardown_intr(9) before pci_release_msi(9).
- Ensure that iflib(4) and associated drivers pass correct RIDs to
bus_release_resource(9) by obtaining the RIDs via rman_get_rid(9)
on the corresponding resources instead of using the RIDs initially
passed to bus_alloc_resource_any(9) as the latter function may
change those RIDs. Solely em(4) for the ioport resource (but not
others) and bnxt(4) were using the correct RIDs by caching the ones
returned by bus_alloc_resource_any(9).
- Change the logic of iflib_msix_init() around to only map the MSI-X
BAR if MSI-X is actually supported, i. e. pci_msix_count(9) returns
> 0. Otherwise the "Unable to map MSIX table " message triggers for
devices that simply don't support MSI-X and the user may think that
something is wrong while in fact everything works as expected.
- Put some (mostly redundant) debug messages emitted by iflib(4)
and em(4) during attachment under bootverbose. The non-verbose
output of em(4) seen during attachment now is close to the one
prior to the conversion to iflib(4).
- Replace various variants of spelling "MSI-X" (several in messages)
with "MSI-X" as used in the PCI specifications.
- Remove some trailing whitespace from messages emitted by iflib(4)
and change them to consistently start with uppercase.
- Remove some obsolete comments about releasing interrupts from
drivers and correct a few others.
Reviewed by: erj, Jacob Keller, shurd
Differential Revision: https://reviews.freebsd.org/D18980
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs
was committed in r295133, CSUM_TSO always got disabled unconditionally
by em(4) on the first invocation of em_init_locked(). However, even with
that problem fixed, it turned out that for at least e. g. 82579 not all
necessary TSO workarounds are in place, still causing MAC hangs even at
Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled
in r323292 (r323293 for stable/10) for the EM-class by default, allowing
users to turn it on if it happens to work with their particular EM MAC
in a Gigabit-only environment.
In head, the TSO workaround for speeds other than Gigabit was lost with
the conversion to iflib(9) in r311849 (possibly along with another one
or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4
got enabled by default again, causing device hangs. Therefore, change the
default for this hardware class back to have TSO4 off, allowing users
to turn it on manually if it happens to work in their environment as
we do in stable/{10,11}. An alternative would be to add a whitelist of
EM-class devices where TSO4 actually is reliable with the workarounds in
place, but given that the advantage of TSO at Gigabit speed is rather
limited - especially with the overhead of these workarounds -, that's
really not worth it. [1]
This change includes the addition of an isc_capabilities to struct
if_softc_ctx so iflib(9) can also handle interface capabilities that
shouldn't be enabled by default which is used to handle the default-off
capabilities of e1000 as suggested by shurd@ and moving their handling
from em_setup_interface() to em_if_attach_pre() accordingly.
- Although 82543 support TSO4 in theory, the former lem(4) didn't have
support for TSO4, presumably because TSO4 is even more broken in the
LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class
devices was enabled as part of the conversion to iflib(9) in r311849,
causing device hangs. So revert back to the pre-r311849 behavior of
not supporting TSO4 for LEM-class at all, which includes not creating
a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2]
- In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET
(65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO
DMA must have a maxsize of the maximum TSO size plus the size of a
VLAN header for software VLAN tagging. The iflib(9) converted em(4),
thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE
in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET
in em_setup_interface() (apparently, left-over from pre-iflib(9)
times). So remove the later and correct iflib(9) to correctly cap
the maximum TSO size reported to the stack at IP_MAXPACKET. While at
it, let iflib(9) use if_sethwtsomax*().
This change includes the addition of isc_tso_max{seg,}size DMA engine
constraints for the TSO DMA tag to struct if_shared_ctx and letting
iflib_txsd_alloc() automatically adjust the maxsize of that tag in case
IFCAP_VLAN_MTU is supported as requested by shurd@.
- Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet
header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9)
so adjustment is automatically done in case IFCAP_VLAN_MTU is supported.
As a consequence, this adjustment now is also done in case of bnxt(4)
which missed it previously.
- Move the reduction of the maximum TSO segment count reported to the
stack by the number of m_pullup(9) calls (which in the worst case,
can add another mbuf and, thus, the requirement for another DMA
segment each) in the transmit path for performance reasons from
em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now
done in iflib_parse_header() rather than in the no longer existing
em_xmit(). Moreover, this optimization applies to all drivers using
iflib(9) and not just em(4); all in-tree iflib(9) consumers still
have enough room to handle full size TSO packets. Also, reduce the
adjustment to the maximum number of m_pullup(9)'s now performed in
iflib_parse_header().
- Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9)
in r311849 and r335338 respectively, these drivers didn't enable
IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed
through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on
by default but also lagg(4) was fixed in that regard in r203548. So
just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling
in {em,ixl,ixlv}_setup_interface().
- Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface()
which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre()
now.
- Remove some redundant/dead setting of scctx->isc_tx_csum_flags in
em_if_attach_pre().
- Remove some IFCAP_* duplicated either directly or indirectly (e. g.
via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS.
- Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as
iflib(9) adds that capability unconditionally.
- Remove some unused macros from em(4).
- Bump __FreeBSD_version as some of the above changes require the modules
of drivers using iflib(9) to be recompiled.
Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1]
Reviewed by: sbruno (earlier version), erj
PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2]
Differential Revision: https://reviews.freebsd.org/D15720
"Under my tutelage Nicole did 85% of the work. At the time it seemed
simplest for a number of reasons to put my copyright on it. I now consider
that to have been a mistake."
Submitted by: Matthew Macy <mmacy@mattmacy.io>
Reviewed by: shurd
Approved by: shurd
Differential Revision: https://reviews.freebsd.org/D14766
Email address has changed, uses consistent name (Matthew, not Matt)
Reported by: Matthew Macy <mmacy@mattmacy.io>
Differential Revision: https://reviews.freebsd.org/D13537
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.
I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/
This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependant
patches being developed out of tree. The reset of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.
Here is a summary of changes included in this patch:
1) More checks when INVARIANTS are enabled for eariler problem
detection
2) Group Task Queue cleanups
- Fix use of duplicate shortdesc for gtaskqueue malloc type.
Some interfaces such as memguard(9) use the short description to
identify malloc types, so duplicates should be avoided.
3) Allow gtaskqueues to use ithreads in addition to taskqueues
- In some cases, this can improve performance
4) Better logging when taskqgroup_attach*() fails to set interrupt
affinity.
5) Do not start gtaskqueues until they're needed
6) Have mp_ring enqueue function enter the ABDICATED rather than BUSY
state. This moves the TX to the gtaskq and allows processing to
continue faster as well as make TX batching more likely.
7) Add an ift_txd_errata function to struct if_txrx. This allows
drivers to inspect/modify mbufs before transmission.
8) Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
checksums zeroed for checksum offload to work. This avoids modifying
packet data in the TX path when possible.
9) Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cachlines from each mbuf instead of one up to 128B. We
often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
rather call if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
especially when RX and TX share an interrupt. RX will attempt to
take the first threads on a core, and TX will attempt to take
successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
cpus an interrupt has affinity with. This allows TX queues to
ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
- timer_int - interval at which to run per-queue timers in ticks
- force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
- rx_budget allows tuning the batch size on the RX path
- watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
when waiting for firmware
- After interrupts are enabled, convert all waits to sleeps
- Eliminates e1000 software/firmware synchronization busy waits after
startup
25) e1000: Remove special case for budget=1 in em_txrx.c
- Premature optimization which may actually be incorrect with
multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
RX and TX.
- Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
Much easier to understand separate functions and "if (is_igb)" than
previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"
#blamebruno
Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12235
recieve descriptors for the igb(4) class of devices. This will
allow a better definition for maximum going forward. Some igb(4)
devices support more than the default 4K.
Reported by: Jason (j@nitrology.com)
Sponsored by: Limelight Networks
- restore newer code for vf, i350, i210, i211
- restore dmac init code for i354 and i350
- restore WUC/WUFC update
- check for igb mac type before attempting trying to assert
a media changed event.
- handle link events for igb(4) and em(4) devices differently
and appropriately for their respective model types.
Submitted by: Matt Macy <mmacy@mattmacy.io>
Sponsored by: Limelight Networks
- unconditionally enable BUS_DMA on non-x86 architectures
- speed up rxd zeroing via customized function
- support out of order updates to rxd's
- add prefetching to hardware descriptor rings
- only prefetch on 10G or faster hardware
- add seperate tx queue intr function
- preliminary rework of NETMAP interfaces, WIP
Submitted by: Matt Macy <mmacy@nextbsd.org>
Sponsored by: Limelight Networks
- em(4) igb(4) and lem(4)
- deprecate the igb device from kernel configurations
- create a symbolic link in /boot/kernel from if_em.ko to if_igb.ko
Devices tested:
- 82574L
- I218-LM
- 82546GB
- 82579LM
- I350
- I217
Please report problems to freebsd-net@freebsd.org
Partial review from jhb and suggestions on how to *not* brick folks who
originally would have lost their igbX device.
Submitted by: mmacy@nextbsd.org
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Limelight Networks and Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8299
Several files use the internal name of `struct device` instead of
`device_t` which is part of the public API. This patch changes all
`struct device *` to `device_t`.
The remaining occurrences of `struct device` are those referring to the
Linux or OpenBSD version of the structure, or the code is not built on
FreeBSD and it's unclear what to do.
Submitted by: Matthew Macy <mmacy@nextbsd.org> (previous version)
Approved by: emaste, jhibbits, sbruno
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7447
- At Intel it is believed that most of their products support "only"
40 DMA segments so lower {EM,IGB}_MAX_SCATTER accordingly. Actually,
40 is more than plenty to handle full size TSO packets so it doesn't
make sense to further distinguish between MAC variants that really
can do 64 DMA segments. Moreover, capping at 40 DMA segments limits
the stack usage of {em,igb}_xmit() that - given the rare use of more
than these - previously hardly was justifiable, while still being
sufficient to avoid the problems seen with em(4) and EM_MAX_SCATTER
set to 32.
- In igb(4), pass the actually supported TSO parameters up the stack.
Previously, the defaults set in if_attach_internal() were applied,
i. e. a maximum of 35 TSO segments, which made supporting more than
these in the driver pointless. However, this might explain why no
problems were seen with IGB_MAX_SCATTER at 64.
- In em(4), take the 5 m_pullup(9) invocations performed by em_xmit()
in the TSO case into account when reporting TSO parameters upwards.
In the worst case, each of these calls will add another mbuf and,
thus, the requirement for an additional DMA segment. So for best
performance, it doesn't make sense to advertize a maximum of TSO
segments that typically will require defragmentation in em_xmit().
Again, this leaves enough room to handle full size TSO packets.
- Drop TSO macros from if_lem.h given that corresponding MACS don't
support TSO in the first place.
Reviewed by: erj, sbruno, jeffrey.e.pieper_intel.com
Approved by: erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D5238
Major changes:
- Add i219/i219(2) hardware support. (Found on Skylake generation and newer
chipsets.)
- Further to the last Skylake support diff, this one also includes support for
the Lewisburg chipset (i219(3)).
- Add a workaround to an igb hardware errata.
All 1G server products need to have IPv6 extension header parsing turned off.
This should be listed in the specification updates for current 1G server
products, e.g. for i350 it's errata #37 in this document:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/ethernet-controller-i350-spec-update.pdf
- Avoton (i354) PHY errata workaround added
And a bunch of minor fixes, as well as #defines for things that the current
em(4)/igb(4) drivers don't implement.
Differential Revision: https://reviews.freebsd.org/D3162
Reviewed by: sbruno, marius, gnn
Approved by: gnn
MFC after: 2 weeks
Sponsored by: Intel Corporation
alignment guarantees provided by m_defrag(9), use m_collapse(9)
instead for performance reasons.
While at it, sanitize the statistics softc members, i. e. retire
unused ones and add SYSCTL nodes missing for actually used ones.
Differential Revision: https://reviews.freebsd.org/D4717
e1000/e1000e split in linux.
Split rxbuffer and txbuffer apart to support the new RX descriptor format
structures. Move rxbuffer manipulation to em_setup_rxdesc() to unify the
new behavior changes.
Add a RSSKEYLEN macro for help in generating the RSSKEY data structures
in the card.
Change em_receive_checksum() to process the new rxdescriptor format
status bit.
MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D3447
doesn't get overrun by things like NFS that can and do shove more than 32 segs when
being used with em(4) and TSO4.
Update tso handling code in em_xmit() with update from jhb@ in email thread:
https://lists.freebsd.org/pipermail/freebsd-net/2014-July/039306.html
set ifp->if_hw_tsomax, ifp->if_hw_tsomaxsegcount & ifp->if_hw_tsomaxsegsize
to appropriate values.
Define a TSO workaround "magic" number of 4 that is used to avoid an
alignment issue in hardware.
Change a couple of integer values that were used as booleans to actual
bool types.
Ensure that em_enable_intr() enables the appropriate mask of interrupts
and not just a hardcoded define of values.
PR: 200221 199174 195078
Differential Revision: https://reviews.freebsd.org/D3192
Reviewed by: erj jhb hiren
MFC after: 2 weeks
Sponsored by: Limelight Networks
up to 2 rx/tx queues for the 82574.
Program the 82574 to enable 5 msix vectors, assign 1 to each rx queue,
1 to each tx queue and 1 to the link handler.
Inspired by DragonFlyBSD, enable some RSS logic for handling tx queue
handling/processing.
Move multiqueue handler functions so that they line up better in a diff
review to if_igb.c
Always enqueue tx work to be done in em_mq_start, if unable to acquire
the TX lock, then this will be processed in the background later by the
taskqueue. Remove mbuf argument from em_start_mq_locked() as the work
is always enqueued. (stolen from igb)
Setup TARC, TXDCTL and RXDCTL registers for better performance and stability
in multiqueue and singlequeue implementations. Handle Intel errata 3 and
generic multiqueue behavior with the initialization of TARC(0) and TARC(1)
Bind interrupt threads to cpus in order. (stolen from igb)
Add 2 new DDB functions, one to display the queue(s) and their settings and
one to reset the adapter. Primarily used for debugging.
In the multiqueue configuration, bump RXD and TXD ring size to max for the
adapter (4096). Setup an RDTR of 64 and an RADV of 128 in multiqueue configuration
to cut down on the number of interrupts. RADV was arbitrarily set to 2x RDTR
and can be adjusted as needed.
Cleanup the display in top a bit to make it clearer where the taskqueue threads
are running and what they should be doing.
Ensure that both queues are processed by em_local_timer() by writing them both
to the IMS register to generate soft interrupts.
Ensure that an soft interrupt is generated when em_msix_link() is run so that
any races between assertion of the link/status interrupt and a rx/tx interrupt
are handled.
Document existing tuneables: hw.em.eee_setting, hw.em.msix, hw.em.smart_pwr_down, hw.em.sbp
Document use of hw.em.num_queues and the new kernel option EM_MULTIQUEUE
Thanks to Intel for their continued support of FreeBSD.
Reviewed by: erj jfv hiren gnn wblock
Obtained from: Intel Corporation
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D1994
applying them to em(4).
Rely on iterations through the local timer, and the tx queue state to
determine if an actual hang has occurred. Any time a descriptor is used
(packet sent), the tx queue is flagged as busy. Then when txeof runs, it
either clears the flag when all is clean, or resets it to 1 if ANY are
cleaned, if nothing is cleaned it increments the flag.
Local timer simply checks to see if busy ever reaches MAX (10, which
is compile time configurable), and then sets it as HUNG, at that point
there is one more timer cycle in which to have any cleans, if not a
watchdog reset will occur.
Differential Revision: https://reviews.freebsd.org/D2019
Submitted by: jfv
Reviewed by: hiren
Obtained from: Intel Corporation
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Limelight Networks
irrespective of the setting of lem_rx_process_limit, while
giving a chance to the taskqueue scheduler to act after
each chunk.
This makes lem_rxeof similar to the one in if_em.c and if_igb.c .
if_lem.c and if_em.c: add a sysctl to manually configure the
'itr' moderation register.
Approved by: Jack Vogel
There have still been intermittent problems with apparent TX
hangs for some customers. These have been problematic to reproduce
but I believe these changes will address them. Testing on a number
of fronts have been positive.
EM: there is an important 'chicken bit' fix for 82574 in the shared
code this is supported in the core here.
- The TX path has been tightened up to improve performance. In
particular UDP with jumbo frames was having problems, and the
changes here have improved that.
- OACTIVE has been used more carefully on the theory that some
hangs may be due to a problem in this interaction
- Problems with the RX init code, the "lazy" allocation and
ring initialization has been found to cause problems in some
newer client systems, and as it really is not that big a win
(its not in a hot path) it seems best to remove it.
- HWTSO was broken when VLAN HWTAGGING or HWFILTER is used, I
found this was due to an error in setting up the descriptors
in em_xmit.
IGB:
- TX is also improved here. With multiqueue I realized its very
important to handle OACTIVE only under the CORE lock so there
are no races between the queues.
- Flow Control handling was broken in a couple ways, I have changed
and I hope improved that in this delta.
- UDP also had a problem in the TX path here, it was change to
improve that.
- On some hardware, with the driver static, a weird stray interrupt
seems to sometimes fire and cause a panic in the RX mbuf refresh
code. This is addressed by setting interrupts late in the init
path, and also to set all interrupts bits off at the start of that.
to calculate the outstanding descriptors that need to be
refreshed at any time, and use THAT in rxeof to determine
if refreshing needs to be done. Also change the local_timer
to simply fire off the appropriate interrupt rather than
schedule a tasklet, its simpler.
MFC in two weeks