controller in question generates frames with bad IP checksum value
if packets contain IP options. For instance, packets generated by
ping(8) with record route option have wrong IP checksum value. The
controller correctly computes checksum for normal TCP/UDP packets
though.
There are two known RTL8168/8111C variants in market and the issue
I observed happened on RL_HWREV_8168C_SPIN2. I'm not sure
RL_HWREV_8168C also has the same issue but it would be better to
assume it has the same issue since they shall share same core.
RTL8102E which is supposed to be released at the time of
RTL8168/8111C announcement does not have the issue.
Tested by: Konstantin V. Krotov ( kkv <> insysnet dot ru )
the controller has a kind of embedded controller/memory and vendor
applies a large set of magic code via undocumented PHY registers in
device initialization stage. I guess it's a firmware image for the
embedded controller in RTL8105E since the code is too big compared
to other DSP fixups. However I have no idea what that magic code
does and what's purpose of the embedded controller. Fortunately
driver seems to still work without loading the firmware.
While I'm here change device description of RTL810xE controller.
H/W donated by: Realtek Semiconductor Corp.
capability. One of reason using interrupt taskqueue in re(4) was
to reduce number of TX/RX interrupts under load because re(4)
controllers have no good TX/RX interrupt moderation mechanism.
Basic TX interrupt moderation is done by hardware for most
controllers but RX interrupt moderation through undocumented
register showed poor RX performance so it was disabled in r215025.
Using taskqueue to handle RX interrupt greatly reduced number of
interrupts but re(4) consumed all available CPU cycles to run the
taskqueue under high TX/RX network load. This can happen even with
RTL810x fast ethernet controller and I believe this is not
acceptable for most systems.
To mitigate the issue, use one-shot timer register to moderate RX
interrupts. The timer register provides programmable one-shot timer
and can be used to suppress interrupt generation. The timer runs at
125MHZ on PCIe controllers so the minimum time allowed for the
timer is 8ns. Data sheet says the register is 32 bits but
experimentation shows only lower 13 bits are valid so maximum time
that can be programmed is 65.528us. This yields theoretical maximum
number of RX interrupts that could be generated per second is about
15260. Combined with TX completion interrupts re(4) shall generate
less than 20k interrupts. This number is still slightly high
compared to other intelligent ethernet controllers but system is
very responsive even under high network load.
Introduce sysctl variable dev.re.%d.int_rx_mod that controls amount
of time to delay RX interrupt processing in units of us. Value 0
completely disables RX interrupt moderation. To provide old
behavior for controllers that have MSI/MSI-X capability, introduce
a new tunable hw.re.intr_filter. If the tunable is set to non-zero
value, driver will use interrupt taskqueue. The default value of
the tunable is 0. This tunable has no effect on controllers that
has no MSI/MSI-X capability or if MSI/MSI-X is explicitly disabled
by administrator.
While I'm here cleanup interrupt setup/teardown since re(4) uses
single MSI/MSI-X message at this moment.
recent PCIe controllers(RTL8102E or later and RTL8168/8111C or
later) supports either 2 or 4 MSI-X messages. Unfortunately vendor
did not publicly release RSS related information yet. However
switching to MSI-X is one-step forward to support RSS.
RTL8111C generated corrupted frames where TCP option header was
broken. All other sample controllers I have did not show such
problem so it could be RTL8111C specific issue. Because there are
too many variants it's hard to tell how many controllers have such
issue. Just disable TSO by default but have user override it.
controllers. Experimentation with RTL8102E, RTL8103E and RTL8105E
showed dramatic decrement of TX completion interrupts under high TX
load(e.g. from 147k interrupts/second to 10k interrupts/second)
With this change, TX interrupt moderation is applied to all
controllers except RTL8139C+.
GbE controllers. It seems these controllers no longer support
multi-fragmented RX buffers such that driver have to allocate
physically contiguous buffers.
o Retire RL_FLAG_NOJUMBO flag and introduce RL_FLAG_JUMBOV2 to
mark controllers that use new jumbo frame scheme.
o Configure PCIe max read request size to 4096 for standard frames
and reduce it to 512 for jumbo frames.
o TSO/checksum offloading is not supported for jumbo frames on
these controllers. Reflect it to ioctl handler and driver
initialization.
o Remove unused rl_stats_no_timeout in softc.
o Embed a pointer to structure rl_hwrev into softc to keep track
of controller MTU limitation and remove rl_hwrev in softc since
that information is available through a pointer to structure
rl_hwrev.
Special thanks to Realtek for donating sample hardwares which made
this possible.
H/W donated by: Realtek Semiconductor Corp.
limit maximum RX buffer size to RE_RX_DESC_BUFLEN instead of
blindly configuring it to 16KB. Due to lack of documentation, re(4)
didn't allow jumbo frame on these controllers. However it seems
controller is confused with jumbo frame such that it can DMA the
received frame to wrong address instead of splitting it into
multiple RX buffers. Of course, this caused panic.
Since re(4) does not support jumbo frames on these controllers,
make controller drop frame that is longer than RE_RX_DESC_BUFLEN
sized frame. Fortunately RTL810x controllers, which do not support
jumbo frame, have no such issues but this change also limited
maximum RX buffer size allowed to RTL810x controllers. Allowing
16KB RX buffer for controllers that have no such capability is
meaningless.
MFC after: 3 days
and just show old (cached) values. Controller will not respond to
the command unless MAC is enabled so DUMP request for down
interface caused request timeout.
RealTek changed TX descriptor format for later controllers so these
controllers require MSS configuration in different location of TX
descriptor. TSO is enabled by default for controllers that use new
descriptor format.
For old controllers, TSO is still disabled by default due to broken
frames under certain conditions but users can enable it.
Special thanks to Hayes Wang at RealTek.
MFC after: 2 weeks
not provide any MAC configuration interface for resolved flow
control parameters. There is even no register that configures water
mark which will control generation of pause frames.
However enabling flow control surely enhanced performance a lot.
It seems RTL8169/RTL8168/RTL810xE has a kind of interrupt
moderation mechanism but it is not documented at all. The magic
value dramatically reduced number of interrupts without noticeable
performance drops so apply it to all RTL8169/RTL8169 controllers.
Vendor's FreeBSD driver also applies it to RTL810xE controllers but
their Linux driver explicitly cleared the register, so do not
enable interrupt moderation for RTL810xE controllers.
While I'm here sort 8169 specific registers.
Obtained from: RealTek FreeBSD driver
There were a couple of attempts in the past to reduce it since it
took more than 1ms. Because mii_tick() periodically polls link
status, waiting more than 1ms for each GMII register access was
overkill. Unfortunately all previous attempts were failed with
various ways on different controllers.
This time, add additional 20us dealy at the end of GMII register
access which seems to requirement of all RealTek controllers to
issue next GMII register access request. This is the same way what
Linux does.
useful counters like rl_missed_pkts is 16 bits quantity which is
too small to hold meaningful information happened in a second. This
means driver should frequently read these counters in order not to
lose accuracy and that approach is too inefficient in driver's
view. Moreover it seems there is no way to trigger an interrupt to
detect counter near-full or wraparound event as well as lacking
clearing the MAC counters. Another limitation of reading the
counters from RealTek controllers is lack of interrupt firing at
the end of DMA cycle of MAC counter read request such that driver
have to poll the end of the DMA which is a time consuming process
as well as inefficient. The more severe issue of the MAC counter
read request is it takes too long to complete the DMA. All these
limitation made maintaining MAC counters in driver impractical. For
now, just provide simple sysctl interface to trigger reading the
MAC counters. These counters could be used to track down driver
issues. Users can read MAC counters maintained in controller with
the following command.
#sysctl dev.re.0.stats=1
While I'm here add check for validity of dma map and allocated
memory before unloading/freeing them.
Tested by: rmacklem
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
addresses; we now let mii_attach() only probe the PHY at the desired
address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
off from, partly even based on grabbing and using the softc of the
parent; we now pass these flags down from the NIC to the PHY drivers
via mii_attach(). This got us rid of all such hacks except those of
brgphy() in combination with bce(4) and bge(4), which is way beyond
what can be expressed with simple flags.
While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).
Reviewed by: jhb, yongari
frame, make sure to update VLAN capabilities whenever jumbo frame
is configured.
While I'm here rearrange interface capabilities configuration. The
controller requires VLAN hardware tagging to make TSO work on VLANs
so explicitly check this requirement.
Tx DMA burst size 2048, I beleive PCIe maximum read request size
also should match to the value of Tx DMA burst size. With this
change I can get more than 800Mbps for TCP bulk transfers.
Previously I was not able to get more than 700Mbps. If I enable TSO
it now shows 927Mbps.
reacquiring driver lock in Rx handler. re(4) drops a driver lock
before passing received frame to upper stack and reacquire the
lock. During the time window ioctl calls could be executed and if
the ioctl was interface down request, driver will stop the
controller and free allocated mbufs. After that when driver comes
back to Rx handler again it does not know what was happend so it
could access free mbufs which in turn cause panic.
Reported by: Norbert Papke < npapk <> acm dot org >
Tested by: Norbert Papke < npapk <> acm dot org >
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Approved by: re (kib)
MFC after: 6 weeks
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.
In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).
Bump __FreeBSD_version in order to signal such situation.
Reviewed by: emaste
Sponsored by: Sandvine Incorporated
checksum offload frames. Software workaround used for broken
controllers(RTL8169, RTL8168, RTL8168B) seem to cause watchdog
timeouts on RTL8139C+.
Introduce a new flag RL_FLAG_AUTOPAD to mark automatic padding
feature of controller and set it for RTL8139C+ and controllers that
use new descriptor format. This fixes watchdog timeouts seen on
RTL8139C+.
Reported by: Dimitri Rodis < DimitriR <> integritasystems dot com >
Tested by: Dimitri Rodis < DimitriR <> integritasystems dot com >
It seems that RTL8168D and RTL8102EL requires additional settle
time to complete RL_PHYAR register write. Accessing RL_PHYAR
register right after the write causes errors for subsequent PHY
register accesses.
Tested by: george at luckytele dot com,
Steve Wills < STEVE at stevenwills dot com >
mapping. The tunable is OFF for all controllers except RTL8169SC
family. RTL8169SC seems to require more magic to use memory
register mapping. r187483 added a fix for RTL8169SCe controller but
it does not looke like fix other variants of RTL8169SC.
Tested by: Gavin Stone-Tolcher g.stone-tolcher <> its dot uq dot edu dot au
controllers that lose Tx completion interrupts under certain
conditions. With this change it's safe to use MSI on PCIe
controllers so enable MSI on these controllers.
- Always program RX configuration register from scratch instead of
doing read/modify/write.
- Rename re_setmulti() to re_set_rxmode() to be reflect reality.
- Simplify hash filter logic a little while I am here.
Reviewed by: yongari (early version)