It seems RTL8169/RTL8168/RTL810xE has a kind of interrupt
moderation mechanism but it is not documented at all. The magic
value dramatically reduced number of interrupts without noticeable
performance drops so apply it to all RTL8169/RTL8169 controllers.
Vendor's FreeBSD driver also applies it to RTL810xE controllers but
their Linux driver explicitly cleared the register, so do not
enable interrupt moderation for RTL810xE controllers.
While I'm here sort 8169 specific registers.
Obtained from: RealTek FreeBSD driver
useful counters like rl_missed_pkts is 16 bits quantity which is
too small to hold meaningful information happened in a second. This
means driver should frequently read these counters in order not to
lose accuracy and that approach is too inefficient in driver's
view. Moreover it seems there is no way to trigger an interrupt to
detect counter near-full or wraparound event as well as lacking
clearing the MAC counters. Another limitation of reading the
counters from RealTek controllers is lack of interrupt firing at
the end of DMA cycle of MAC counter read request such that driver
have to poll the end of the DMA which is a time consuming process
as well as inefficient. The more severe issue of the MAC counter
read request is it takes too long to complete the DMA. All these
limitation made maintaining MAC counters in driver impractical. For
now, just provide simple sysctl interface to trigger reading the
MAC counters. These counters could be used to track down driver
issues. Users can read MAC counters maintained in controller with
the following command.
#sysctl dev.re.0.stats=1
While I'm here add check for validity of dma map and allocated
memory before unloading/freeing them.
Tested by: rmacklem
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
addresses; we now let mii_attach() only probe the PHY at the desired
address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
off from, partly even based on grabbing and using the softc of the
parent; we now pass these flags down from the NIC to the PHY drivers
via mii_attach(). This got us rid of all such hacks except those of
brgphy() in combination with bce(4) and bge(4), which is way beyond
what can be expressed with simple flags.
While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).
Reviewed by: jhb, yongari
Previously rl(4) continuously checked whether there are RX events
or TX completions in forever loop. This caused TX starvation under
high RX load as well as consuming too much CPU cycles in the
interrupt handler. If interrupt was shared with other devices which
may be always true due to USB devices in these days, rl(4) also
tried to process the interrupt. This means polling(4) was the only
way to mitigate the these issues.
To address these issues, rl(4) now disables interrupts when it
knows the interrupt is ours and limit the number of iteration of
the loop to 16. The interrupt would be enabled again before exiting
interrupt handler if the driver is still running. Because RX buffer
is 64KB in size, the number of iterations in the loop has nothing
to do with number of RX packets being processed. This change
ensures sending TX frames under high RX load.
RX handler drops a driver lock to pass received frames to upper
stack such that there is a window that user can down the interface.
So rl(4) now checks whether driver is still running before serving
RX or TX completion in the loop.
While I'm here, exit interrupt handler when driver initialized
controller.
With this change, now rl(4) can send frames under high RX load even
though the TX performance is still not good(rl(4) controllers can't
queue more than 4 frames at a time so low TX performance was one of
design issue of rl(4) controllers). It's much better than previous
TX starvation and you should not notice RX performance drop with
this change. Controller still shows poor performance under high
network load but for many cases it's now usable without resorting
to polling(4).
MFC after: 2 weeks
IFF_ALLMULTI/IFF_PROMISC as well as multicast filter configuration.
Rewrite RX filter logic to reduce number of register accesses and
make it handle promiscuous/allmulti toggling without controller
reinitialization.
Previously rl(4) counted on controller reinitialization to reprogram
promiscuous configuration but r211767 resulted in avoiding
controller reinitialization whenever promiscuous mode is toggled.
To address this, keep track of driver's view of interface state and
handle IFF_ALLMULTI/IFF_PROMISC changes without reinitializing
controller. This should fix a regression introduced in r211267.
While I'm here remove unnecessary variable reassignment in ioctl
handler.
PR: kern/151079
MFC after: 1 week
register mapping. I'm not sure whether it comes from the fact that
controllers live behind certain PCI brdge(PLX PCI 6152 33BC) and
the bridge has some issues in handling I/O space register mapping.
Unfortunately it's not possible to narrow down to an exact
controller that shows this issue because RealTek used the same PCI
device/revision id again. In theory, it's possible to check parent
PCI bridge device and change rl(4) to use memory space register
mapping if the parent PCI bridge is PLX PCI 6152. But I didn't try
to do that and we wouldn't get much benefit with added complexity.
Blindly switching to use memory space register mapping for rl(4)
may make most old controllers not to work. At least, I don't want
to take potential risk from such change. So use I/O space register
mapping by default but give users chance to override it via a
tunable. The tunable to use memory space register mapping would be
given by adding the following line to /boot/loader.conf file.
dev.rl.%d.prefer_iomap="0"
This change makes P811B quad-port work with this tunable.
Tested by: Nikola Kalpazanov ( n.kalpazanov <> gmail dot com )
MFC after: 1 week
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.
MFC after: 1 month
According to the specifications AMD/ATI SMBus controller is very
similar to SMBus controller found in PIIX4.
Some notable differences:
o different bit for enabling/signalling regular interrupt mode
o in practice seems to support only polling mode
Thus, intpm driver is modified to support polling-only mode
and to recognize SB700 PCI ID and differences.
Tested on: SB700 and PIIX4E platforms
Reviewed by: jhb
MFC after: 2 weeks
X-Perhaps-ToDo: rename the driver to reflect its function
and supported hardware
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Approved by: re (kib)
MFC after: 6 weeks
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.
In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).
Bump __FreeBSD_version in order to signal such situation.
Reviewed by: emaste
Sponsored by: Sandvine Incorporated
checksum offload frames. Software workaround used for broken
controllers(RTL8169, RTL8168, RTL8168B) seem to cause watchdog
timeouts on RTL8139C+.
Introduce a new flag RL_FLAG_AUTOPAD to mark automatic padding
feature of controller and set it for RTL8139C+ and controllers that
use new descriptor format. This fixes watchdog timeouts seen on
RTL8139C+.
Reported by: Dimitri Rodis < DimitriR <> integritasystems dot com >
Tested by: Dimitri Rodis < DimitriR <> integritasystems dot com >
1. fix nointr check in intsmb_start, matters only if ENABLE_ALART is
defined (by default, it is not);
2. drop unnecessary inspection/reporting of power-management io registers
base address;
3. in verbose mode report errors from SMBus host controller and their
mapping to smbus(4) errors;
Approved by: jhb (mentor)
event from mii(4) may not be delivered if valid link was already
established. To address the issue, check current link state after
driving MII_TICK. This should fix a regression introduced in
r184245.
PR: kern/129647
command whenever Tx completion interrupt is raised. The Tx poll
bit is cleared when all packets waiting to be transferred have been
processed. This means the second Tx poll command can be silently
ignored as the Tx poll bit could be still active while processing
of previous Tx poll command is in progress.
To address the issue re(4) used to invoke the Tx poll command in Tx
completion handler whenever it detects there are pending packets in
TxQ. However that still does not seem to completely eliminate
watchdog timeouts seen on RealTek PCIe controllers. To fix the
issue kick Tx poll command only after Tx completion interrupt is
raised as this would indicate Tx is now idle state such that it can
accept new Tx poll command again. While here apply this workaround
for PCIe based controllers as other controllers does not seem to
have this limitation.
Tested by: Victor Balada Diaz < victor <> bsdes DOT net >
out of sleep mode prior to accessing to PHY. This should fix device
attach failure seen on these controllers. Also enable the sleep
mode when device is put into sleep state.
PR: kern/123123, kern/123053
same LOM hardware with goofed-up EEPROM programming also needed reading the
Ethernet address from the chips registers as the EEPROM did not have a
sensible address programmed.
Patch developed by: pyun@
Funky hardware on loan: www.id-it.nl
MFC after: 2 weeks
established a valid link or not. In miibus_statchg handler add a
check for established link is valid one for the controller(e.g.
1000baseT is not a valid link for fastethernet controllers.)
o Added a flag RE_FLAG_FASTETHER to mark fastethernet controllers.
o Added additional check to know whether we've really encountered
watchdog timeouts or missed Tx completion interrupts. This change
may help to track down the cause of watchdog timeouts.
o In interrupt handler, removed a check for link state change
interrupt. Not all controllers have the bit and re(4) did not
rely on the event for a long time. In addition, re(4) didn't
request the interrupt in RL_IMR register.
Tested by: rpaulo
cable tuning. This has helped in some installations for hardware
deployed by a former employer. Made optional because the lists aren't
full of complaints about these cards... even when they were wildly
popular.
Reviewed by: attilio@, jhb@, trhodes@ (all an older version of the patch)
established a valid link or not.
In rl_start_locked, don't try to send packets unless we have valid
link. While I'm here add a check that verifies whether driver can
accept Tx requests by inspecting IFF_DRV_OACTIVE/IFF_DRV_RUNNING
flag.
- The hardware does not support DAC so limit DMA address space to
4GB.
- Removed BUS_DMA_ALLOC_NOW flag.
- Created separated Tx buffer and Rx buffer DMA tags. Previously
it used to single DMA tag and it was not possible to specify
different DMA restrictions.
- Apply 4 bytes alignment limitation of Tx buffer.
- Apply 8 bytes alignment limitation of Rx buffer.
- Tx side bus_dmamap_load_mbuf_sg(9) support.
- Preallocate Tx DMA maps as creating DMA maps take very long time
on architectures that require real DMA maps.
- Adjust guard buffer size to 1522 + 8 as it should include VLAN
and additional reserved bytes in Rx buffer.
- Plug memory leak in device detach. Previously wrong buffer
address was used to free allocated memory.
- Added rl_list_rx_init() to clear Rx buffer and cleared the
buffer.
- Don't destroy DMA maps in rl_txeof() as the DMA map should be
reused. There is no reason to destroy/recreate the DMA maps in
this driver.
- Removed rl_dma_map_rxbuf()/rl_dma_map_txbuf() callbacks.
- The hardware does not support descriptor based DMA on Tx side
and the Tx buffer address should be aligned on 4 bytes boundary
as well as manual padding for short frames. Because of this
hardware limitation rl(4) always used to invoke m_defrag(9) to
get a 4 bytes aligned single buffer. However m_defrag(9) takes
a lot of CPU cycles on slow machines and not all packets need
the help of m_defrag(9). Armed with the information, don't
invoke m_defrag(9) if the following conditions are true.
1. Buffer is not fragmented.
2. Buffer is aligned on 4 bytes boundary.
3. Manual padding is not necessary.
4. Or padding is necessary but upper stack passed a writable
buffer and the space needed for padding is satisfied.
This change combined with preallocated DMA maps greatly
increased Tx performance of driver on sparc64.
- Moved bus_dmamap_sync(9) in rl_start_locked() to rl_encap() and
corrected memory synchronization operation specifier of
bus_dmamap_sync(9).
- Removed bus_dmamap_unload(9) in rl_stop(). There is no need to
reload/unload Rx buffer as rl(4) always have to copy from the
buffer. It just needs proper bus_dmamap_sync(9) calls before
copying the received frame.
With this change rl(4) should work on systems with more than 4GB
memory.
PR: kern/128143