m_pkthdr.ether_vlan. The presence of the M_VLANTAG flag on the mbuf
signifies the presence and validity of its content.
Drivers that support hardware VLAN tag stripping fill in the received
VLAN tag (containing both vlan and priority information) into the
ether_vtag mbuf packet header field:
m->m_pkthdr.ether_vtag = vlan_id; /* ntohs()? */
m->m_flags |= M_VLANTAG;
to mark the packet m with the specified VLAN tag.
On output the driver should check the mbuf for the M_VLANTAG flag to
see if a VLAN tag is present and valid:
if (m->m_flags & M_VLANTAG) {
... = m->m_pkthdr.ether_vtag; /* htons()? */
... pass tag to hardware ...
}
VLAN tags are stored in host byte order. Byte swapping may be necessary.
(Note: This driver conversion was mechanic and did not add or remove any
byte swapping in the drivers.)
Remove zone_mtag_vlan UMA zone and MTAG_VLAN definition. No more tag
memory allocation have to be done.
Reviewed by: thompsa, yar
Sponsored by: TCP/IP Optimization Fundraise 2005
required by arches like sparc64 (not yet implemented) and sun4v where there
are seperate IOMMU's for each PCI bus... For all other arches, it will
end up returning NULL, which makes it a no-op...
Convert a few drivers (the ones we've been working w/ on sun4v) to the
new convection... Eventually all drivers will need to replace the parent
tag of NULL, w/ bus_get_dma_tag(dev), though dev is usually different for
each driver, and will require hand inspection...
Reviewed by: scottl (earlier version)
register. This really shouldn't be using pci_enable_io() directly as
bus_alloc_resource() does it already, but the cached copy of the
command word needs to be correct so the enable/disable mwi functions
work properly.
- Use pci bus accessors to read revision ID and subvendor IDs.
Reviewed by: jvogel
conditions. The cause of missing Tx completion interrupts comes from
Tx interrupt moderation mechanism(delayed interrupts) or chipset bug.
If Tx interrupt moderation mechanism is the cause of false watchdog
timeout error we should have to fix all device drivers that have Tx
interrupt moderation capability. We may need more investigation
for this issue. Anyway, the fix is the same for both cases.
This should fix occasional watchdog timeout errors seen on a few
systems.
Reported by: -net, Patrick M. Hausen < hausen AT punkt DOT de >
Tested by: Patrick M. Hausen < hausen AT punkt DOT de >
Previously em(4) requeued the failed mbuf chains from
bus_dmamap_load_mbuf_sg(9) failure to resend it later. However,
bus_dmamap_load_mbuf_sg(9) may never complete its request as the
fragmented frames can have more than EM_MAX_SCATTER segments.
To handle the above EFBIG case, defragment the frame with m_defrag(9)
and free the mbuf chain if it can't deframent the chain due to
resource shortage.
Reviewed by glebius (with improvements)
o Create one more spare DMA map for Rx handler to recover from
bus_dmamap_load_mbuf_sg(9) failure.
o Make sure to update status bit in Rx descriptors even if we failed
to allocate a new buffer. Previously it resulted in stuck condition
and em_handle_rxtx task took up all available CPU cycles.
o Don't blindly unload DMA map. Reuse loaded DMA map if received
packet has errors. This would speed up Rx processing a bit under
heavy load as it does not need to reload DMA map in case of error.
(bus_dmamap_load_mbuf_sg(9) is the most expensive call in driver
context.)
o Update if_iqdrops counter if it can't allocate a mbuf cluster.
With this change it's now possible to see queue dropped packets
with netstat(1).
o Update mbuf_cluster_failed counter if fixup code failed to
allocate mbuf header.
o Return ENOBUFS instead of ENOMEM in case of Rx fixup failure.
o Make adapter->lmp NULL in case of Rx fixup failure. Strictly
specking it's not necessary for correct operation but it makes
the intention clear.
o Remove now unused dropped_pkts member in softc.
With these changes em(4) should survive mbuf cluster allocation
failure on Rx path.
Reviewed by: pdeuskar, glebius (with improvements)
MCLBYTES - ETHER_ALIGN. Previously it applied the alignment fixup code
for oversized frames which would result in reduced performance on
strict alignment archs.
82571EB quad port copper NIC and has few minor fixes.
Details:
- if_em.c. Merged manually, viewing diff between new vendor
driver and previous one.
- if_em_hw.c. Dropped in from vendor, and then restored
revision 1.15.
80003 NICs and NICs found on ICH8 mobos, and improves support for
already known chips.
Details:
- if_em.c. Merged manually, viewing diff between new vendor
driver and previous one. This was an easy task, because
most changes between 5.1.5 and 6.0.5 are bugfixes taken
from FreeBSD.
- if_em_hw.h. Dropped in from vendor, and then restored
revisions 1.16, 1.17, 1.18.
- if_em_hw.c. Dropped in from vendor, and then restored
revision 1.15.
- if_em_osdep.h. Added new required macros from vendor file
and add a hack against define namespace mangling in
if_em_hw.h. Intel made another hack, but I prefer mine.
by remembering a map used in bus_dmamap_load_mbuf_sg(9). I have
no idea how it could ever worked before.
This fixes a warning generated by a diagnostic check in sun4v
iommu driver.
Reported by: jb
Tested by: jb(sun4v)
renegotiation, we only initialize the hardware only when it is
absolutely required. Process SIOCGIFADDR ioctl in em(4) when we know
an IPv4 address is added. Handling SIOCGIFADDR in a driver is
layering violation but it seems that there is no easy way without
rewritting hardware initialization code to reduce settle time after
reset.
This should fix a long standing bug which didn't send ARP packet when
interface address is changed or an alias address is added. Another
effect of this fix is it doesn't need additional delays anymore when
adding an alias address to the interface.
While I'm here add a new if_flags into softc which remembers current
prgroammed interface flags and make use of it when we have to program
promiscuous mode.
Tested by: Atanas <atanas AT asd DOT aplus DOT net>
Analyzed by: rwatson
Discussed with: -stable
taskqueued interrupt mode is going to be quite complex. Since
the polling mode is considered legacy feature for em(4) driver,
the decision is made to make polling and new interrupt handler
mutually exclusive, selected at compile time.
If kernel is compiled with DEVICE_POLLING, the fast taskqueued
interrupt handler code is disabled and the em_poll() and legacy
em_intr() functions are enabled. Otherwise, legacy functions
are disabled and only em_intr_fast() code is compiled.
Discussed with: scottl
new chips and improves support for already supported ones.
Some details, important for future merges:
- if_em.c merged manually, viewing diff between new vendor
driver and previous one.
- if_em_hw.h dropped in from vendor, and then restored revisions
1.16, 1.17, 1.18.
- if_em_hw.c dropped in from vendor, and then two liner change made,
that restores support for two rare chips.
- In em_attach() remove em_check_for_link(). Not needed here, since
already done in em_hardware_init().
- In em_attach() replace the printing block with call to
em_update_link_status().
- Remove modification of sc->link_state from em_hardware_init() and
from em_media_status(). This makes em_update_link_status() a
single point of change. Call em_update_link_status() where needed.
- Restart watchdog if we *did* processed any descriptors. [1]
- Log the watchdog event if the link is *up*. [2]
PR: kern/92948 [1]
Submitted by: Mihail Balikov <mihail.balikov interbgc.com> [1]
PR: kern/92895 [2]
Submitted by: Vladimir Ivanov <wawa yandex-team.ru> [2]
hacking:
- Remove all spaces at eol.
- Improve style(9) in most frequently edited functions.
- In em_encap() push variables for 82544 workaround in the block
where they are only used.
- In em_get_buf() remove unused variable.
taskqueue_start_threads(struct taskqueue **, int count, int pri,
const char *name, ...);
This allows the creation of 1 or more threads that will service a single
taskqueue. Also rework the taskqueue_create() API to remove the API change
that was introduced a while back. Creating a taskqueue doesn't rely on
the presence of a process structure, and the proc mechanics are much better
encapsulated in taskqueue_start_threads(). Also clean up the
taskqueue_terminate() and taskqueue_free() functions to safely drain
pending tasks and remove all associated threads.
The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed
to use the new API, but drivers compiled against the old definitions will
still work. Thus, recompiling drivers is not a strict requirement.
the the interface has been configured. I'm not sure how this could ever
have worked before, but it should be fixed now. Also break out the interrupt
degresitration function into it's own step.
- Only update the rx ring consumer pointer after running through the rx loop,
not with each iteration through the loop.
- If possible, use a fast interupt handler instead of an ithread handler. Use
the interrupt handler to check and squelch the interrupt, then schedule a
taskqueue to do the actual work. This has three benefits:
- Eliminates the 'interrupt aliasing' problem found in many chipsets by
allowing the driver to mask the interrupt in the NIC instead of the
OS masking the interrupt in the APIC.
- Allows the driver to control the amount of work done in the interrupt
handler. This results in what I call 'adaptive polling', where you get
the latency benefits of a quick response to interrupts with the
interrupt mitigation and work partitioning of polling. Polling is still
an option in the driver, but I consider it orthogonal to this work.
- Don't hold the driver lock in the RX handler. The handler and all data
associated is effectively serialized already. This eliminates the cost of
dropping and reaquiring the lock for every receieved packet. The result
is much lower contention for the driver lock, resulting in lower CPU usage
and lower latency for interactive workloads.
The amount of work done in the taskqueue is controlled by the sysctl
dev.em.N.rx_processing_limit
and tunable
hw.em.rx_process_limit
Setting these to -1 effectively removes the limit.
The fast interrupt and taskqueue can be disabled by defining NO_EM_FASTINTR.
This work has been shown to increase fast-forwarding from ~570 kpps to
~750 kpps (note that the same NIC hardware seems unable to transmit more than
800 kpps, so this increase appears to be limited almost solely by the
hardware). Gains have been shown in other workloads, ranging from better
performance to elimination of over-saturation livelocks.
Thanks to Andre Opperman for his time and resources from his network
performance project in performing much of the testing. Thanks to Gleb
Smirnoff and Danny Braniss for their help in testing also.