96 Commits

Author SHA1 Message Date
glebius
ba78c18044 Convert to if_foreach_llmaddr() KPI. 2019-10-21 18:08:12 +00:00
mmacy
7aeac9ef18 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
glebius
684180d1f9 Mechanically convert to if_inc_counter(). 2014-09-18 20:30:47 +00:00
glebius
9b93b159b3 Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
jhb
e8769d6be2 Fix various NIC drivers to properly cleanup static DMA resources.
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.

In many cases these bugs could result in leaks when a driver was detached.

Reviewed by:	yongari
MFC after:	2 weeks
2014-06-11 14:53:58 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
glebius
a69aaa7721 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
kevlo
0105e3c818 Remove unused variable mii 2012-02-11 08:12:52 +00:00
marius
17e14c6132 - There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
yongari
2a52428fd6 Announce flow control capability to underlying PHY driver.
Pause timer value is initialized to 0xFFFF. Controller allows just
4 different TX pause thresholds. The lowest possible threshold
value looks too aggressive so use next available threshold value.
2011-11-22 20:57:06 +00:00
yongari
6cf42ed3b9 Rework link establishment and link state detection logic.
- Remove MIIBUS statchg callback and program VGE_DIAGCTL before
   initiating link establishment.  Previously driver used to
   program VGE_DIAGCTL after getting a link in statchg callback.
   It seems the VGE_DIAGCTL register works like a kind of MII
   register such that it requires setting a 'to be' mode in advance
   rather than relying on resolved speed/duplex of established link.
   This means the statchg callback is not needed in driver.  In
   addition, if there was no link at the time of media change, this
   was not called at all.
 - Introduce vge_ifmedia_upd_locked() to change current media to
   configured one.  Actual media change is performed only after PHY
   reset and VGE_DIAGCTL setup.
 - In WOL configuration, make sure to clear forced mode such that
   controller can rely on auto-negotiation.
 - Unlike most other drivers that use miibus(4), vge(4) used
   controller's auto-polling feature for link state tracking via
   interrupt.  This came from controller's inefficient mechanism to
   access MII registers.  On link state change interrupt, vge(4)
   used to get current link state with series of MII register
   accesses.  Because vge(4) already enabled auto polling, read PHY
   status register to resolved speed/duplex/flow control parameters.

vge(4) still does not drive MII_TICK to reduce number of MII
register accesses which in turn means the driver does not know the
status of auto-negotiation.  This was a one of long standing
issue of vge(4).  Probably driver may be able to implement a timer
that keeps track of auto-negotiation state and restart
auto-negotiation when driver couldn't establish a link within a
specified period.  However the controller does not provide a
reliable way to detect auto-negotiation failure so I'm not sure
whether it's worth to implement it in driver.

Alternatively driver can completely disable MII auto-polling and
let miibus(4) poll link state by driving MII_TICK.  This may reduce
unnecessary overhead of stopping/restarting MII auto-polling of
controller.  Unfortunately it was known that some variants of
controller does not work correctly if MII auto-polling is disabled.
2011-11-22 20:45:09 +00:00
yongari
486dcdb089 Always start MII auto polling before accessing any MII registers. 2011-11-22 18:58:39 +00:00
yongari
4c371e596b Close a race where SIOCGIFMEDIA ioctl get inconsistent link status.
Because driver is accessing a common MII structure in
mii_pollstat(), updating user supplied structure should be done
before dropping a driver lock.

Reported by:	Karim (fodillemlinkarimi <> gmail dot com)
2011-10-17 19:49:00 +00:00
yongari
b3819a3c48 vge(4) hardwares poll media status and generates an interrupt
whenever the link state is changed.  Using software based polling
for media status tracking is known to cause MII access failure
under certain conditions once link is established so vge(4) used to
rely on link status change interrupt.
However DEVICE_POLLING completely disables generation of all kind
of interrupts on vge(4) such that this resulted in not detecting
link state change event.  This means vge(4) does not correctly
detect established/lost link with DEVICE_POLLING.  Losing the
interrupt made vge(4) not to send any packets to peer since vge(4)
does not try to send any packets when there is no established link.

Work around the issue by generating link state change interrupt
with DEVICE_POLLING.

PR:		kern/160442
Approved by:	re (kib)
2011-09-07 16:57:43 +00:00
yongari
8b344f95a5 Datasheet says vge(4) controllers support DAC but it seems that's
not true on old PCI based controllers.  DAC configuration is read
from EEPROM in device reset phase and driver can override DAC
configuration.  However I guess there is an undocumented reason why
EEPROM configuration does not enable DAC so do not blindly override
DAC configuration.  Recent PCIe based controllers are supposed to
support 64bit DMA so allow 64bit DMA only on PCIe based controllers.

PR:		kern/157184
MFC after:	1 week
2011-05-20 18:27:13 +00:00
jhb
00c3c01f4f Do a sweep of the tree replacing calls to pci_find_extcap() with calls to
pci_find_cap() instead.
2011-03-23 13:10:15 +00:00
marius
385153aa98 Convert the PHY drivers to honor the mii_flags passed down and convert
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
  addresses; we now let mii_attach() only probe the PHY at the desired
  address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
  off from, partly even based on grabbing and using the softc of the
  parent; we now pass these flags down from the NIC to the PHY drivers
  via mii_attach(). This got us rid of all such hacks except those of
  brgphy() in combination with bce(4) and bge(4), which is way beyond
  what can be expressed with simple flags.

While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).

Reviewed by:	jhb, yongari
2010-10-15 14:52:11 +00:00
yongari
2498334877 Remove wrong assertion. 2009-12-25 00:23:47 +00:00
yongari
a303d4ee71 Disable jumbo frame support for PCIe VT6130/VT6132 controllers.
Quite contrary to VT6130 datasheet which says it supports up to 8K
jumbo frame, VT6130 does not seem to send jumbo frame that is
larger than 4K in length. Trying to send a frame that is larger
than 4K cause TX MAC hang.
Even though it's possible to allow 4K jumbo frame for VT6130, I
think it's meaningless to allow 4K jumbo frame. I'm not sure VT6132
also has the same limitation but I guess it uses the same MAC of
VT6130.
2009-12-20 19:45:46 +00:00
yongari
d7fc8c3ed3 VT6130 datasheet was wrong. If VT6130 receive a jumbo frame the
controller will split the jumbo frame into multiple RX buffers.
However it seems the hardware always dma the frame to 8 bytes
boundary for the split frames. Only the first part of the fragment
can have 4 byte alignment and subsequent buffers should be 8 bytes
aligned. Change RX buffer the alignment requirement to 8 bytes from
4 bytes.
2009-12-20 19:11:32 +00:00
yongari
8af5f7e769 Correct fragment bit definition in comments. 2009-12-20 18:53:34 +00:00
yongari
8cc49ae53c Swap VGE_TXQTIMER and VGE_RXQTIMER register definition. Pending
timer for Tx queue is at 0x3E.
2009-12-19 20:45:23 +00:00
yongari
23a0015214 Add rudimentary WOL support. While I'm here remove enabling
busmastering/memory address in resume path. Bus driver will handle
that.
2009-12-18 22:14:28 +00:00
yongari
17682087e2 Remove unused member variable of softc. 2009-12-17 19:48:54 +00:00
yongari
bd0be472d9 Actually clear interrupts. Writing 0 has no effect. 2009-12-17 18:03:05 +00:00
yongari
3672ee2e9a Implement interrupt moderation scheme supported by VT61xx
controllers. TX/RX interrupt mitigation is controlled by
VGE_TXSUPPTHR and VGE_RXSUPPTHR register. These registers suppress
generation of interrupts until the programmed frames counter equals
to the registers. VT61xx also supports interrupt hold off timer
register. If this interrupt hold off timer is active all interrupts
would be disabled until the timer reaches to 0. The timer value is
reloaded whenever VGE_ISR register written. The timer resolution is
about 20us.

Previously vge(4) used single shot timer to reduce Tx completion
interrupts. This required VGE_CRS1 register access in Tx
start/completion handler to rearm new timeout value and it did not
show satisfactory result(more than 50k interrupts under load). Rx
interrupts was not moderated at all such that vge(4) used to
generate too many interrupts which in turn made polling(4) better
approach under high network load.

This change activates all interrupt moderation mechanism and
initial values were tuned to generate interrupt less than 8k per
second. That number of interrupts wouldn't add additional packet
latencies compared to polling(4). These interrupt parameters could
be changed with sysctl.
dev.vge.%d.int_holdoff
dev.vge.%d.rx_coal_pkt
dev.vge.%d.tx_coal_pkt
Interface has be brought down and up again before change take
effect.

With interrupt moderation there is no more need to loop in
interrupt handler. This loop always added one more register access.
While I'm here remove dead code which tried to implement subset of
interrupt moderation.
2009-12-17 18:00:25 +00:00
yongari
1961bff3a9 Remove unused VGE_ETHER_ALIGN definition. 2009-12-17 17:38:06 +00:00
yongari
760ca3284b Add "Velocity" to probe message which will make it clearer which
ethernet controller was recognized. VIA consistently calls
"Velocity" family for gigabit ethernet controllers. For fast
ethernet controllers they uses "Rhine" family(vr(4) controllers))
and vr(4) already shows "Rhine" in probe message.
2009-12-16 20:03:43 +00:00
yongari
e75eb0a487 Add new flag VGE_FLAG_SUSPENDED to mark suspended state and
remove suspended member in softc.
2009-12-16 19:49:23 +00:00
yongari
c57b5821c5 Add hardware MAC statistics support. This statistics could be
extracted from dev.vge.%d.stats sysctl node.
2009-12-16 19:41:40 +00:00
yongari
5eb45f07cb Rewrite RX filter setup and simplify code.
Now promiscuous mode and multicast handling is performed in single
function, vge_rxfilter().
2009-12-16 19:32:44 +00:00
yongari
2df754472e All vge(4) controllers support RX/TX checksum offloading for VLAN
tagged frames so add checksum offloading capabilities. Also add
missing VLAN hardware tagging control in ioctl handler and let
upper stack know current VLAN capabilities.
2009-12-16 18:03:25 +00:00
yongari
2fc41157e8 Tell upper layer vge(4) supports long frames. This should be done
after ether_ifattach(), as ether_ifattach() initializes it with
ETHER_HDR_LEN.
While I'm here remove setting if_mtu, it's already handled in
ether_ifattach().
2009-12-14 22:55:20 +00:00
yongari
a1a80aabeb Don't report current link status if interface is not UP.
If interface is not UP, the current link status wouldn't
reflect the negotiated status.
2009-12-14 22:30:07 +00:00
yongari
b326f2699c Report media change result to caller instead of returning success
without regard to the result.
2009-12-14 22:23:06 +00:00
yongari
34ca1e0cd8 Whenever link state change interrupt is raised, vge_tick() is
called and vge(4) used to drive auto-negotiation timer(mii_tick) in
vge_tick(). Therefore the mii_tick was not called for every hz such
that auto-negotiation complete was never handled in vge(4).
Use mii_pollstat to extract current negotiated speed/duplex instead
of mii_tick. The latter is valid only for auto-negotiation case.
While I'm here change the confusing function name vge_tick() to
vge_link_statchg().
2009-12-14 22:20:05 +00:00
yongari
c23c48f2e6 Sort function prototyes. 2009-12-14 22:00:11 +00:00
yongari
2f7dcd7497 We don't have to reload EEPROM in vge_reset(). Because vge_reset()
is called in vge_init_lock(), vge(4) always used to reload EEPROM.
Also add more comment why vge(4) clears VGE_CHIPCFG0_PACPI bit.
While I'm here add missing new line in vge_reset().
2009-12-14 21:16:02 +00:00
yongari
c7f8314b75 Increase output queue size from 64 to 255. 2009-12-14 20:59:18 +00:00
yongari
6903a53024 Add MSI support for VT613x controllers. 2009-12-14 20:49:50 +00:00
yongari
168c3be20e Save PHY address by reading VGE_MIICFG register. For PCIe
controllers(VT613x), we assume the PHY address is 1.
Use the saved PHY address in MII register access routines and
remove accessing VGE_MIICFG register.
While I'm here save PCI express capability register which will be
used in near future.
2009-12-14 20:39:42 +00:00
yongari
97ba1f5c72 Introduce vge_flags member in softc. The vge_flags member will
record device specific bits. Remove vge_link and use vge_flags.
While here, move clearing link state before mii_mediachg() as
mii_mediachg() may affect link state.
2009-12-14 20:17:53 +00:00
yongari
c84f3aa93d style(9). 2009-12-14 20:07:25 +00:00
yongari
4d65295e78 s/u_intXX_t/uintXX_t/g 2009-12-14 19:53:57 +00:00
yongari
ed72fffe71 Remove unnecessary return statement. 2009-12-14 19:49:20 +00:00
yongari
3fbe7c396f Use ANSI function definations. 2009-12-14 19:44:54 +00:00
yongari
fb3e5ec422 Clear VGE_TXDESC_Q bit for transmitted frames. The VGE_TXDESC_Q bit
seems to work like a tag that indicates 'not list end' of queued
frames. Without having a VGE_TXDESC_Q bit indicates 'list end'. So
the last frame of multiple queued frames has no VGE_TXDESC_Q bit.
The hardware has peculiar behavior for VGE_TXDESC_Q bit handling.
If the VGE_TXDESC_Q bit of descriptor was set the controller would
fetch next descriptor. However if next descriptor's OWN bit was
cleared but VGE_TXDESC_Q was set, it could confuse controller.
Clearing VGE_TXDESC_Q bit for transmitted frames ensure correct
behavior.
2009-12-14 19:08:11 +00:00
yongari
6e554c2edf Fix typo in register definition. 2009-12-14 18:50:26 +00:00
yongari
e58dd71e01 Use PCIR_BAR instead of hard-coded value. 2009-12-14 18:49:16 +00:00