o Create one more spare DMA map for Rx handler to recover from
bus_dmamap_load_mbuf_sg(9) failure.
o Make sure to update status bit in Rx descriptors even if we failed
to allocate a new buffer. Previously it resulted in stuck condition
and em_handle_rxtx task took up all available CPU cycles.
o Don't blindly unload DMA map. Reuse loaded DMA map if received
packet has errors. This would speed up Rx processing a bit under
heavy load as it does not need to reload DMA map in case of error.
(bus_dmamap_load_mbuf_sg(9) is the most expensive call in driver
context.)
o Update if_iqdrops counter if it can't allocate a mbuf cluster.
With this change it's now possible to see queue dropped packets
with netstat(1).
o Update mbuf_cluster_failed counter if fixup code failed to
allocate mbuf header.
o Return ENOBUFS instead of ENOMEM in case of Rx fixup failure.
o Make adapter->lmp NULL in case of Rx fixup failure. Strictly
specking it's not necessary for correct operation but it makes
the intention clear.
o Remove now unused dropped_pkts member in softc.
With these changes em(4) should survive mbuf cluster allocation
failure on Rx path.
Reviewed by: pdeuskar, glebius (with improvements)
80003 NICs and NICs found on ICH8 mobos, and improves support for
already known chips.
Details:
- if_em.c. Merged manually, viewing diff between new vendor
driver and previous one. This was an easy task, because
most changes between 5.1.5 and 6.0.5 are bugfixes taken
from FreeBSD.
- if_em_hw.h. Dropped in from vendor, and then restored
revisions 1.16, 1.17, 1.18.
- if_em_hw.c. Dropped in from vendor, and then restored
revision 1.15.
- if_em_osdep.h. Added new required macros from vendor file
and add a hack against define namespace mangling in
if_em_hw.h. Intel made another hack, but I prefer mine.
renegotiation, we only initialize the hardware only when it is
absolutely required. Process SIOCGIFADDR ioctl in em(4) when we know
an IPv4 address is added. Handling SIOCGIFADDR in a driver is
layering violation but it seems that there is no easy way without
rewritting hardware initialization code to reduce settle time after
reset.
This should fix a long standing bug which didn't send ARP packet when
interface address is changed or an alias address is added. Another
effect of this fix is it doesn't need additional delays anymore when
adding an alias address to the interface.
While I'm here add a new if_flags into softc which remembers current
prgroammed interface flags and make use of it when we have to program
promiscuous mode.
Tested by: Atanas <atanas AT asd DOT aplus DOT net>
Analyzed by: rwatson
Discussed with: -stable
- Only update the rx ring consumer pointer after running through the rx loop,
not with each iteration through the loop.
- If possible, use a fast interupt handler instead of an ithread handler. Use
the interrupt handler to check and squelch the interrupt, then schedule a
taskqueue to do the actual work. This has three benefits:
- Eliminates the 'interrupt aliasing' problem found in many chipsets by
allowing the driver to mask the interrupt in the NIC instead of the
OS masking the interrupt in the APIC.
- Allows the driver to control the amount of work done in the interrupt
handler. This results in what I call 'adaptive polling', where you get
the latency benefits of a quick response to interrupts with the
interrupt mitigation and work partitioning of polling. Polling is still
an option in the driver, but I consider it orthogonal to this work.
- Don't hold the driver lock in the RX handler. The handler and all data
associated is effectively serialized already. This eliminates the cost of
dropping and reaquiring the lock for every receieved packet. The result
is much lower contention for the driver lock, resulting in lower CPU usage
and lower latency for interactive workloads.
The amount of work done in the taskqueue is controlled by the sysctl
dev.em.N.rx_processing_limit
and tunable
hw.em.rx_process_limit
Setting these to -1 effectively removes the limit.
The fast interrupt and taskqueue can be disabled by defining NO_EM_FASTINTR.
This work has been shown to increase fast-forwarding from ~570 kpps to
~750 kpps (note that the same NIC hardware seems unable to transmit more than
800 kpps, so this increase appears to be limited almost solely by the
hardware). Gains have been shown in other workloads, ranging from better
performance to elimination of over-saturation livelocks.
Thanks to Andre Opperman for his time and resources from his network
performance project in performing much of the testing. Thanks to Gleb
Smirnoff and Danny Braniss for their help in testing also.
- don't force busdma to pre-allocate bounce pages for parent tag.
- use system supplied roundup2 macro instead of rolling its own version.
- TX/RX decriptor length should be multiple of 128. There is no
no need to expand the size with the multiple of 4096.
- don't create/destroy DMA maps in TX/RX handlers. Use pre-allocated
DMA maps. Since creating DMA maps on sparc64 is time consuming
operations(resource mananger overhead), this change should boost
performance on sparc64. I could get > 2x speedup on Ultra60.
- TX/RX descriptors could be aligned on 128 boundary. Aligning them
on PAGE_SIZE is waste of resource.
- don't blindly create TX DMA tag with size of MCLBYTES * 8. The size
is only valid under jumbo frame environments. Instead of using the
hardcoded value, re-compute necessary size on the fly.
- RX side bus_dmamap_load_mbuf_sg(9) support.
- remove unused macro EM_ROUNDUP and constant EM_MMBA.
Reviewed by: scottl
Tested by: glebius
for a notebook with em(4) adapter.
- Introduce tunables em.hw.txd and em.hw.rxd, which allow administrator
to configure number of transmit and receive descriptors.
- Check em.hw.txd and em.hw.rxd against hardware limits [*] and require
them to be multiple of 128.
[*] According to comments in if_em.h the 82540EM/82541ER chips can handle
more than 256 descriptors. Since we don't have this hardware to test,
we decided to mimic NetBSD wm(4) driver, that limits these chips to
256 descriptors.
In collaboration with: yongari
overruns and number of watchdog timeouts.
- Do not log(9) RX overrun events, since this pessimizes
things under load [1].
- Do not increase if->if_oerrors in em_watchdog(), since
this leads to counter slipping back, when if->if_oerrors
is recalculated in em_update_stats_counters(). Instead
increase watchdog counter in em_watchdog() and take it
into account in em_update_stats_counters().
Submitted by: ade [1]
- disable jumbo frame support on strict alignment architectures due
to the limitation of hardware. The driver needs a fix-up code for
RX side. The fix will show up in near future.
- fix endian issue for 82544 on PCI-X bus. I couldn't test this as
I don't have the NIC/hardware.
- prefer PCIR_BAR to hardcoded EM_MMBA.
- Properly checks for for 64bit BAR [1]
- replace inl/outl with bus_space(9) [1]
- fix endian issue on VLAN handling.
- reorder header files and remove unnecessary one.
Reviewed by: cognet
No response from: pdeuskar, tackerman
Obtained from: OpenBSD [1]
- Destroy mutex in case of attach failure. [1]
- Lock properly em_watchdog(). [1]
- Lock properly em_sysctl_int_delay(). [1]
- Remove unused global adapter linked list.
- Remove unused dma_size field from struct em_dma_alloc.
- Do not touch interface statistics, that must be edited
only by upper layers. [1]
Submitted by: yongari [1]
o Do not mask the RX overrun interrupt.
o Rewrite em_intr():
- Axe EM_MAX_INTR.
- Cycle acknowledging interrupts and processing
packets until zero interrupt cause register is
read.
- If RX overrun comes in log this fact. [ NetBSD also
resets adapter in this case, but my tests showed that
this is not needed and only pessimizes behavior under
heavy load. ]
- Since almost all functions is rewritten, style the
remaining lines.
This fixes em(4) interfaces wedging under high load.
In collaboration with: wpaul, cognet
Obtained from: NetBSD
replacement and has additional features which make it superior.
Discussed on: -arch
Reviewed by: thompsa
X-MFC-after: never (RELENG_6 as transition period)
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
- Changed from using explicit devices id to using descriptive labels.
- Added support for 82573 and 82546 Quad adapters.
- Corrected support for 82547EI and 82541ER (mac_type was not assigned)
- Removed #ifdef DBG_STATS and extraneous code.
if_em_hw.c/if_em_hw.h
- Added support for 82573 and 82546 Quad adapters.
- Brought forward Intel's most current mac and phy changes.
promiscuous mode introduced in 1.45, which programs the em card not
to strip or prepend tags when in promiscuous mode without also
modifying behavior to manually prepend a vlan header in the event
that the card isn't doing it on transmit. Due to a feature of card
operation, if the global VLAN prepend/strip register isn't set,
setting the VLAN tag flag on individual packet descriptors will
cause the packet to be transmitted using ISL encapsulation rather
than 802.1Q VLAN encapsulation.
This fix causes em_encap() to prepend the header by tracking whether
the card is configured to temporarily disable prepending/stripping
due to promiscuous mode. As a result, entering promiscuous mode on
the parent em interface no longer causes vlans to appear to "wedge"
or transmit ISL-encapsulated frames, which typically will not be
configured/spoken by the other endpoints on the VLAN trunk. This
bug may also exist in other drivers, and the additional vlan
encapsulation logic should be abstracted and centralized in
if_vlan.c if so.
RELENG_5_3 candidate.
MFC after: 1 week
Tested by: pjd, rwatson
Reported by: astesin at ukrtelecom dot net
Reported by: Mike Tancsa <mike at sentex dot net>
Reported by: Iasen Kostov <tbyte at OTEL dot net>
Removed support for Intel 82541ER
Added fix for 82547 which corrects an issue with Jumbo frames larger than 10k.
Added fix for vlan tagged frames not being properly bridged.
Corrected TBI workaround.
Corrected incorrect LED operation issues
Submitted by: tackerman (Tony Ackerman)
MFC after: 2 weeks
- In the receive routine handle the case where last descriptor could have
less than 4 bytes of data.
- Handle race between detach/ioctl routine.
MFC after: 3 days
o correct recursive locking when polling and in em_82547_move_tail
o destroy mutex on detach
o add EM_LOCK_ASSERT and similar macros for creating+deleteing the mtx
Submitted by: Daniel Eischen <eischen@vigrid.com>
Bug Fixes:
- Allow users to use LAA
- Remember promiscuous mode settings while bridging
- Allow gratuitous arp's to be sent
PR: 52966/54488
MFC after: 1 week
recompiling the driver. See the comments near the top of "if_em.h"
for descriptions of these delays. Four new loader tunables control
the system-wide default values:
hw.em.tx_int_delay
hw.em.rx_int_delay
hw.em.tx_abs_int_delay
hw.em.rx_abs_int_delay
The tunables are specified in microseconds. The valid range is
0-67108 usec., and 0 means that the timer is disabled.
There are also four new sysctls (actually, a set of four for each
"em" device in the system) to query and change the interrupt delays
after the system is up:
hw.em0.tx_int_delay
hw.em0.rx_int_delay
hw.em0.tx_abs_int_delay (not present for 82542/3/4 adapters)
hw.em0.rx_abs_int_delay (not present for 82542/3/4 adapters)
It seems to be OK to change these values even while the adapter is
passing traffic.
Approved by: Prafulla Deuskar <pdeuskar@FreeBSD.ORG>
MFC after: 4 weeks
Add sysctl's to display statistics/debug_info
Set WAIT_FOR_AUTONEG_DEFAULT to zero by default
Increment packet in/out statistics inline instead of every two seconds.
MFC after: 3 days
- Use htole* macros where appropriate so that the driver could work on non-x86 architectures
- Use m_getcl() instead of MGETHDR/MCLGET macros
Submitted by: sam (Sam Leffler)
- Added support for ITR (interrupt throttle register). This feature is available on
adapters based on 82545 and above
- Fixed problem with vlan support when traffic has priority bits set. (kern/45907)
PR: kern/45907
MFC after: 1 week
o don't strip the Ethernet header from inbound packets; pass packets
up the stack intact (required significant changes to some drivers)
o reference common definitions in net/ethernet.h (e.g. ETHER_ALIGN)
o track ether_ifattach/ether_ifdetach API changes
o track bpf changes (use BPF_TAP and BPF_MTAP)
o track vlan changes (ifnet capabilities, revised processing scheme, etc.)
o use if_input to pass packets "up"
o call ether_ioctl for default handling of ioctls
Reviewed by: many
Approved by: re
- Set RDTR to zero by default instead of 28.
- Fixed a problem with TX hangs with jumbo frames when number of fragments in the mbuf chain
is large.
- Added support for 82540EP based cards.
MFC after: 3 days
We are having panics with the driver under stress with jumbo frames.
Unfortunately we didnot catch it during our regular test cycle.
I am going to MFC the backout immediately.
descriptors. This simplifies code for jumbo frames.
- Cleaned up coding conventions to make code more unix-like.
- Cleaned up code in if_em_fxhw.c and if_em_phy.c.
Added relevant comments.
MFC after: 1 week