a taskqueue.
This gives a 16% performance improvement under high load on slow systems,
especially when vr shares an interrupt with another device, which is
common with the Alix x86 boards.
Contrary to the other devices, I left the interrupt processing for loop
in because there was no significant difference in performance and this
should avoid enqueuing more taskqueues unnecessarily.
We also decided to move the vr_start_locked() call inside the for loop
because we found out that it helps performance since TCP ACKs now have a
chance to go out quicker.
Reviewed by: yongari (older version, same idea)
Discussed with: yongari, jhb
control for all vr(4) controllers that support it. It's known that
old vr(4) controllers(Rhine II) does not support TX pause but Rhine
III supports both TX and RX pause.
Make TX pause really work on Rhine III by letting controller know
available RX buffers.
While here, adjust XON/XOFF parameters to get better performance
with flow control.
using member variables in softc.
While I'm here change media after setting IFF_DRV_RUNNING. This
will remove unnecessary link state handling in vr_tick() if
controller established a link immediately.
register both status change and link state change callbacks.
Implement checking valid link in state change callback and poll
active link state in vr_tick(). This allows immediate detection of
lost link as well as protecting driver from frequent link flips during
link renegotiation. taskq implementation was removed because driver
now needs to poll link state in vr_tick().
While I'm here do not report current link state if interface is not
running.
Tested by: n_hibma
MFC after: 1 week
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
addresses; we now let mii_attach() only probe the PHY at the desired
address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
off from, partly even based on grabbing and using the softc of the
parent; we now pass these flags down from the NIC to the PHY drivers
via mii_attach(). This got us rid of all such hacks except those of
brgphy() in combination with bce(4) and bge(4), which is way beyond
what can be expressed with simple flags.
While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).
Reviewed by: jhb, yongari
vr(4) overhauling(r177050).
It seems that filtering multicast addresses with multicast CAM
entries require accessing 'CAM enable bit' for each CAM entry.
Subsequent accessing multicast CAM control register without
toggling the 'CAM enable bit' seem to no effects.
In order to fix that separate CAM setup from CAM mask configuration
and CAM entry modification. While I'm here add VLAN CAM filtering
feature which will be enabled in future(FreeBSD now can receive
VLAN id insertion/removal event from vlan(4) on the fly).
For VT6105M hardware, explicitly disable VLAN hardware tag
insertion/stripping and enable VLAN CAM filtering for VLAN id 0.
This shall make non-VLAN frames set VR_RXSTAT_VIDHIT bit in Rx
status word.
Added multicast/VLAN CAM address definition to header file.
PR: kern/125010, kern/125024
MFC after: 1 week
years. All datasheet I have indicates the bit 15 is the
VR_RXSTAT_RX_OK. The bit 14 is reserved for all Rhine family
except VT6105M. VT6105M uses that bit to indicate a VLAN frame
with matching CAM VLAN id.
Use the VR_RXSTAT_RX_OK instead of VR_RXSTAT_RXERR when vr(4)
checks the validity of received frame.
This should fix occasional dropping frames on VT6105M.
Tested by: Goran Lowkrantz ( goran.lowkrantz at ismobile dot com )
MFC after: 1 week
state change and reliable error recovery.
o Moved vr_softc structure and relevant macros to header file.
o Use PCIR_BAR macro to get BARs.
o Implemented suspend/resume methods.
o Implemented automatic Tx threshold configuration which will be
activated when it suffers from Tx underrun. Also Tx underrun
will try to restart only Tx path and resort to previous
full-reset(both Rx/Tx) operation if restarting Tx path have failed.
o Removed old bit-banging MII interface. Rhine provides simple and
efficient MII interface. While I'm here show PHY address and PHY
register number when its read/write operation was failed.
o Define VR_MII_TIMEOUT constant and use it in MII access routines.
o Always honor link up/down state reported by mii layers. The link
state information is used in vr_start() to determine whether we
got a valid link.
o Removed vr_setcfg() which is now handled in vr_link_task(), link
state taskqueue handler. When mii layer reports link state changes
the taskqueue handler reprograms MAC to reflect negotiated duplex
settings. Flow-control changes are not handled yet and it should
be revisited when mii layer knows the notion of flow-control.
o Added a new sysctl interface to get statistics of an instance of
the driver.(sysctl dev.vr.0.stats=1)
o Chip name was renamed to reflect the official name of the chips
described in VIA Rhine I/II/III datasheet.
REV_ID_3065_A -> REV_ID_VT6102_A
REV_ID_3065_B -> REV_ID_VT6102_B
REV_ID_3065_C -> REV_ID_VT6102_C
REV_ID_3106_J -> REV_ID_VT6105_A0
REV_ID_3106_S -> REV_ID_VT6105M_A0
The following chip revisions were added.
#define REV_ID_VT6105_B0 0x83
#define REV_ID_VT6105_LOM 0x8A
#define REV_ID_VT6107_A0 0x8C
#define REV_ID_VT6107_A1 0x8D
#define REV_ID_VT6105M_B1 0x94
o Always show chip revision number in device attach. This shall help
identifying revision specific issues.
o Check whether EEPROM reloading is complete by inspecting the state
of VR_EECSR_LOAD bit. This bit is self-cleared after the EEPROM
reloading. Previously vr(4) blindly spins for 200us which may/may
not enough to complete the EEPROM reload.
o Removed if_mtu setup. It's done in ether_ifattach().
o Use our own callout to drive watchdog timer.
o In vr_attach disable further interrupts after reset. For VT6102 or
newer hardwares, diable MII state change interrupt as well because
mii state handling is done by mii layer.
o Add more sane register initialization for VT6102 or newer chips.
- Have NIC report error instead of retrying forever.
- Let hardware detect MII coding error.
- Enable MODE10T mode.
- Enable memory-read-multiple for VT6107.
o PHY address for VT6105 or newer chips is located at fixed address 1.
For older chips the PHY address is stored in VR_PHYADDR register.
Armed with these information, there is no need to re-read
VR_PHYADDR register in miibus handler to get PHY address. This
saves one register access cycle for each MII access.
o Don't reprogram VR_PHYADDR register whenever access to a register
located at a PHY address is made. Rhine fmaily allows reprogramming
PHY address location via VR_PHYADDR register depending on
VR_MIISTAT_PHYOPT bit of VR_MIISTAT register. This used to lead
numerous phantom PHYs attached to miibus during phy probe phase and
driver used to limit allowable PHY address in mii register accessors
for certain chip revisions. This removes one more register access
cycle for each MII access.
o Correctly set VLAN header length.
o bus_dma(9) conversion.
- Limit DMA access to be in range of 32bit address space. Hardware
doesn't support DAC.
- Apply descriptor ring alignment requirements(16 bytes alignment)
- Apply Rx buffer address alignment requirements(4 bytes alignment)
- Apply Tx buffer address alignment requirements(4 bytes alignment)
for Rhine I chip. Rhine II or III has no Tx buffer address
alignment restrictions, though.
- Reduce number of allowable number of DMA segments to 8.
- Removed the atomic(9) used in descriptor ownership managements
as it's job of bus_dmamap_sync(9).
With these change vr(4) should work on all platforms.
o Rhine uses two separated 8bits command registers to control Tx/Rx
MAC. So don't access it as a single 16bit register.
o For non-strict alignment architectures vr(4) no longer require
time-consuming copy operation for received frames to align IP
header. This greatly improves Rx performance on i386/amd64
platforms. However the alignment is still necessary for
strict-alignment platforms(e.g. sparc64). The alignment is handled
in new fuction vr_fixup_rx().
o vr_rxeof() now rejects multiple-segmented(fragmented) frames as
vr(4) is not ready to handle this situation. Datasheet said nothing
about the reason when/why it happens.
o In vr_newbuf() don't set VR_RXSTAT_FIRSTFRAG/VR_RXSTAT_LASTFRAG
bits as it's set by hardware.
o Don't pass checksum offload information to upper layer for
fragmented frames. The hardware assisted checksum is valid only
when the frame is non-fragmented IP frames. Also mark the checksum
is valid for corrupted frames such that upper layers doesn't need
to recompute the checksum with software routine.
o Removed vr_rxeoc(). RxDMA doesn't seem to need to be idle before
sending VR_CMD_RX_GO command. Previously it used to stop RxDMA
first which in turn resulted in long delays in Rx error recovery.
o Rewrote Tx completion handler.
- Always check VR_TXSTAT_OWN bit in status word prior to
inspecting other status bits in the status word.
- Collision counter updates were corrected as VT3071 or newer
ones use different bits to notify collisions.
- Unlike other chip revisions, VT86C100A uses different bit to
indicate Tx underrun. For VT3071 or newer ones, check both
VR_TXSTAT_TBUFF and VR_TXSTAT_UDF bits to see whether Tx
underrun was happend. In case of Tx underrun requeue the failed
frame and restart stalled Tx SM. Also double Tx DMA threshold
size on each failure to mitigate future Tx underruns.
- Disarm watchdog timer only if we have no queued packets,
otherwise don't touch watchdog timer.
o Rewrote interrupt handler.
- status word in Tx/Rx descriptors indicates more detailed error
state required to recover from the specific error. There is no
need to rely on interrupt status word to recover from Tx/Rx
error except PCI bus error. Other event notifications like
statistics counter overflows or link state events will be
handled in main interrupt handler.
- Don't touch VR_IMR register if we are in suspend mode. Touching
the register may hang the hardware if we are in suspended state.
Previously it seems that touching VR_IMR register in interrupt
handler was to work-around panic occurred in system shutdown
stage on SMP systems. I think that work-around would hide
root-cause of the panic and I couldn't reproduce the panic
with multiple attempts on my box.
o While padding space to meet minimum frame size, zero the pad data
in order to avoid possibly leaking sensitive data.
o Rewrote vr_start_locked().
- Don't try to queue packets if number of available Tx descriptors
are short than that of required one.
o Don't reinitialize hardware whenever media configuration is
changed. Media/link state changes are reported from mii layer if
this happens and vr_link_task() will perform necessary changes.
o Don't reinitialize hardware if only PROMISC bit was changed. Just
toggle the PROMISC bit in hardware is sufficient to reflect the
request.
o Rearrganed the IFCAP_POLLING/IFCAP_HWCSUM handling in vr_ioctl().
o Generate Tx completion interrupts for every VR_TX_INTR_THRESH-th
frames. This reduces Tx completion interrupts under heavy network
loads.
o Since vr(4) doesn't request Tx interrupts for every queued frames,
reclaim any pending descriptors not handled in Tx completion
handler before actually firing up watchdog timeouts.
o Added vr_tx_stop()/vr_rx_stop() to wait for the end of active
TxDMA/RxDMA cycles(draining). These routines are used in vr_stop()
to ensure sane state of MAC before releasing allocated Tx/Rx
buffers. vr_link_task() also takes advantage of these functions to
get to idle state prior to restarting Tx/Rx.
o Added vr_tx_start()/vr_rx_start() to restart Rx/Tx. By separating
Rx operation from Tx operation vr(4) no longer need to full-reset
the hardware in case of Tx/Rx error recovery.
o Implemented WOL.
o Added VT6105M specific register definitions. VT6105M has the
following hardware capabilities.
- Tx/Rx IP/TCP/UDP checksum offload.
- VLAN hardware tag insertion/extraction. Due to lack of information
for getting extracted VLAN tag in Rx path, VLAN hardware support
was not implemented yet.
- CAM(Content Addressable Memory) based 32 entry perfect multicast/
VLAN filtering.
- 8 priority queues.
o Implemented CAM based 32 entry perfect multicast filtering for
VT6105M. If number of multicast entry is greater than 32, vr(4)
uses traditional hash based filtering.
o Reflect real Tx/Rx descriptor structure. Previously vr(4) used to
embed other driver (private) data into these structure. This type
of embedding make it hard to work on LP64 systems.
o Removed unused vr_mii_frame structure and MII bit-baning
definitions.
o Added new PCI configuration registers that controls mii operation
and mode selection.
o Reduced number of Tx/Rx descriptors to 128 from 256. From my
testing, increasing number of descriptors above than 64 didn't help
increasing performance at all. Experimentations show 128 Rx
descriptors seems to help a lot reducing Rx FIFO overruns under
high system loads. It seems the poor Tx performance of Rhine
hardwares comes from the limitation of hardware. You wouldn't
satuarte the link with vr(4) no matter how fast CPU/large number of
descriptors are used.
o Added vr_statistics structure to hold various counter values.
No regression was reported but one variant of Rhine III(VT6105M)
found on RouterBOARD 44 does not work yet(Reported by Milan Obuch).
I hope this would be resolved in near future.
I'd like to say big thanks to Mike Tancsa who kindly donated a Rhine
hardware to me. Without his enthusiastic testing and feedbacks
overhauling vr(4) never have been possible. Also thanks to Masayuki
Murayama who provided some good comments on the hardware's internals.
This driver is result of combined effort of many users who provided
many feedbacks so I'd like to say special thanks to them.
Hardware donated by: Mike Tancsa (mike AT sentex dot net)
Reviewed by: remko (initial version)
Tested by: Mike Tancsa(x86), JoaoBR ( joao AT matik DOT com DOT br )
Marcin Wisnicki ( mwisnicki+freebsd AT gmail DOT com )
Stefan Ehmann ( shoesoft AT gmx DOT net )
Florian Smeets ( flo AT kasimir DOT com )
Phil Oleson ( oz AT nixil DOT net )
Larry Baird ( lab AT gta DOT com )
Milan Obuch ( freebsd-current AT dino DOT sk )
remko (initial version)
The 6105M and 6102 does not have the DWORD alignment problem, so
don't m_defrag() every packet in the transmit path for those.
More stringent usage of tx-descriptor ring and its flags.
Tested on 6102 and 6105M, other chips may also be able to run
without the m_defrag() but I have neither hardware nor docs to
find out.
Sponsored by: Soekris Engineering
if_ioctl, if_watchdog, etc, or in functions that are used by
these methods only. In all other cases use device_printf().
This also fixes several panics, when if_printf() is called before
softc->ifp was initialized.
Submitted by: Alex Lyashkov <umka sevcity.net>
I had to initialize the ifnet a bit earlier in attach so that the
if_printf()'s in vr_reset() didn't explode with a page fault.
- Use M_ZERO with contigmalloc() rather than an explicit bzero.
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
the packets are immediately returned for sending (e.g. when bridging
or packet forwarding). There are more efficient ways to do this
but for now use the least intrusive approach.
Reviewed by: imp, rwatson
using the Rhine's internal shift registers which are designed
for the job. This reduces the amount of time we wait around shifting
bits, and seems to work better with some chips.
Also, provide a workaround for some newer cards which report fake PHYs
at multiple addresses. (As more cards are ID'd, I'm sure this part
of the code will have to be expanded to cover more cases.)
Submitted by: Thomas Nystrom <thn@saeab.se>
MFC after: 1 week
under load.
This patch has been tested by Thomas and other for more than a month now,
and all (known) hangs seem to be solved.
Thomas's explanation of the patch:
* Fix the problem with the printing of the RX-error.
* Code from if_fet do better deal with the RX-recovery including a
timeout of the RX-turnoff.
* The call to vr_rxeof before vr_rxeoc have been moved to a point
where the RX-part of the chip is turned off. Otherwise there is a
window where new data could have been written to the buffer chain
before the RX-part is turned off. If this happens the chip will see
a busy rx-buffer. I have no evidence that this have occured but
god knows what the chip will do in this case!
* I have added a timeout of the TX-turnoff. I have checked and in
my 900 MHz system the flags for turnoff (both RX & TX) is seen at
the first check in the loop.
* I could see that I got the VR_ISR_DROPPED interrupt sometimes and
started to thinking about this. I then realized that no recovery is
needed for this case and therefore I only count it as an rxerror
(which was not done before).
* Finally I have changed the FIFO RX threshhold to 128 bytes. When I
did this the VR_ISR_DROPPED interrupt went away. Theory: The chip
will receive a complete frame before it tries to write it out to
memory then the RX threshold is set to store'n'forward. IF the frame
is large AND the next rx frame also is large AND the bus is busy
transfering a TX frame to the TX fifo THEN the second received
frame wont fit in the FIFO and is then dropped. By having the RX
threshold set to 128 the RX fifo is emptied faster.
MFC after: 5 days
1. Detect the revision of the Rhine chip we're using.
2. Use the force reset command on revisions which support
it whenever the normal reset command fails.
This should solve a wide range of "my vr0 locks up with reset
failed messages" problems. (Although the root causes should
be eventually tracked down.)
Tested by: grenville armitage <garmitage@swin.edu.au>
Obtained from: Via's if_fet driver
MFC after: 3 days
Approved by: re
elimiates the driver lockup problem reported by many.
Concepts used were taken from Via's if_fet driver. Verification
and implementation were done by Thomas Nystrom.
Submitted by: Thomas Nystrom <thn@saeab.se>
MFC after: 3 days
sizes. Previously, the end result was at the mercy of the card's default
setting. This change will reduce the number of buffer underruns for
some users.
PR: kern/37929
Submitted by: Thomas Nystrom <thn@saeab.se>
MFC after: 7 days
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
takes care of all the 10/100 and gigE PCI drivers that I've done.
Next will be the wireless drivers, then the USB ones. I may pick up
some stragglers along the way. I'm sort of playing this by ear: if
anyone spots any places where I've screwed up horribly, please let me
know.
a module. Also modified the code to work on FreeBSD/alpha and added
device vr0 to the alpha GENERIC config.
While I was in the neighborhood, I noticed that I was still using
#define NFPX 1 in all of the Makefiles that I'd copied from the fxp
module. I don't really use #define Nfoo X so it didn't matter, but
I decided to customize this correctly anyway.
- Change to the same transmit scheme as the PNIC driver.
- Dynamically set the cache alignment, and set burst size the same as
the PNIC driver in mx_init().
- Enable 'store and forward' mode by default. This is the slowest option
and it does reduce 100Mbps performance somewhat, but it's the most
reliable setting I can find. I'm more interested in having the driver
work reliably than trying to squeeze the best performance out of it.
The reason I'm doing this is that on *some* systems you may see a lot
of transmit underruns (which I can't explain: these are *fast* test
systems) and these errors seem to cause unusual and decidedly
non-tulip-like behavior. In normal 10Mbps mode, performance is fine
(you can easily saturate a 10Mbps link).
Also tweak some of the other drivers:
- Increase the size of the TX ring for the Winbond, ASIX, VIA Rhine
and PNIC drivers.
- Set a larger value for ifq_maxlen in the ThunderLAN driver. The setting
of TL_TX_LIST_CNT - 1 is too low (the ThunderLAN driver only allocates
20 transmit descriptors, and I don't want to fiddle with that now
because the ThunderLAN's descriptor structure is an oddball size
compared to the others).
Addtron appear to have their own VIA Rhine II and RealTek 8139 boards
with custom PCI vendor and device IDs. This commit updates the PCI
vendor and device lists in the vr and rl drivers so that we can probe
the additional devices.
Found by: nosing around the PCI vendor and device code list at:
http://www.halcyon.com/scripts/jboemler/pci/pcicode
performance and reliability a little. There was a condition before
where transmission would stall during periods of heavy traffic
exchange between two hosts. Also set the 'want interrupt' bit in
receive descriptor control words.
PCI fast ethernet adapters, plus man pages.
if_pn.c: Netgear FA310TX model D1, LinkSys LNE100TX, Matrox FastNIC 10/100,
various other PNIC devices
if_mx.c: NDC Communications SOHOware SFA100 (Macronix 98713A), various
other boards based on the Macronix 98713, 98713A, 98715, 98715A
and 98725 chips
if_vr.c: D-Link DFE530-TX, other boards based on the VIA Rhine and
Rhine II chips (note: the D-Link and certain other cards
that actually use a Rhine II chip still return the PCI
device ID of the Rhine I. I don't know why, and it doesn't
really matter since the driver treats both chips the same
anyway.)
if_wb.c: Trendware TE100-PCIE and various other cards based on the
Winbond W89C840F chip (the Trendware card is identical to
the sample boards Winbond sent me, so who knows how many
clones there are running around)
All drivers include support for ifmedia, BPF and hardware multicast
filtering.
Also updated GENERIC, LINT, RELNOTES.TXT, userconfig and
sysinstall device list.
I also have a driver for the ASIX AX88140A in the works.