Commit Graph

31 Commits

Author SHA1 Message Date
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Pedro F. Giffuni
8dfea46460 Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
Gleb Smirnoff
c8dfaf382f Mechanically convert to if_inc_counter(). 2014-09-19 03:51:26 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
John Baldwin
068d8643ad Fix various NIC drivers to properly cleanup static DMA resources.
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.

In many cases these bugs could result in leaks when a driver was detached.

Reviewed by:	yongari
MFC after:	2 weeks
2014-06-11 14:53:58 +00:00
Pyun YongHyeon
52ee8ac027 Increase the number of TX DMA segments from 32 to 35. It turned
out 32 is not enough to support a full sized TSO packet.
While I'm here fix a long standing bug introduced in r169632 in
bce(4) where it didn't include L2 header length of TSO packet in
the maximum DMA segment size calculation.

In collaboration with:	rmacklem
MFC after:		2 weeks
2014-03-31 01:54:59 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Gleb Smirnoff
c6499eccad Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
Marius Strobl
4b7ec27007 - There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
Pyun YongHyeon
57c81d92ae Close a race where SIOCGIFMEDIA ioctl get inconsistent link status.
Because driver is accessing a common MII structure in
mii_pollstat(), updating user supplied structure should be done
before dropping a driver lock.

Reported by:	Karim (fodillemlinkarimi <> gmail dot com)
2011-10-17 19:49:00 +00:00
Marius Strobl
3fcb7a5365 - Remove attempts to implement setting of BMCR_LOOP/MIIF_NOLOOP
(reporting IFM_LOOP based on BMCR_LOOP is left in place though as
  it might provide useful for debugging). For most mii(4) drivers it
  was unclear whether the PHYs driven by them actually support
  loopback or not. Moreover, typically loopback mode also needs to
  be activated on the MAC, which none of the Ethernet drivers using
  mii(4) implements. Given that loopback media has no real use (and
  obviously hardly had a chance to actually work) besides for driver
  development (which just loopback mode should be sufficient for
  though, i.e one doesn't necessary need support for loopback media)
  support for it is just dropped as both NetBSD and OpenBSD already
  did quite some time ago.
- Let mii_phy_add_media() also announce the support of IFM_NONE.
- Restructure the PHY entry points to use a structure of entry points
  instead of discrete function pointers, and extend this to include
  a "reset" entry point. Make sure any PHY-specific reset routine is
  always used, and provide one for lxtphy(4) which disables MII
  interrupts (as is done for a few other PHYs we have drivers for).
  This includes changing NIC drivers which previously just called the
  generic mii_phy_reset() to now actually call the PHY-specific reset
  routine, which might be crucial in some cases. While at it, the
  redundant checks in these NIC drivers for mii->mii_instance not being
  zero before calling the reset routines were removed because as soon
  as one PHY driver attaches mii->mii_instance is incremented and we
  hardly can end up in their media change callbacks etc if no PHY driver
  has attached as mii_attach() would have failed in that case and not
  attach a miibus(4) instance.
  Consequently, NIC drivers now no longer should call mii_phy_reset()
  directly, so it was removed from EXPORT_SYMS.
- Add a mii_phy_dev_attach() as a companion helper to mii_phy_dev_probe().
  The purpose of that function is to perform the common steps to attach
  a PHY driver instance and to hook it up to the miibus(4) instance and to
  optionally also handle the probing, addition and initialization of the
  supported media. So all a PHY driver without any special requirements
  has to do in its bus attach method is to call mii_phy_dev_attach()
  along with PHY-specific MIIF_* flags, a pointer to its PHY functions
  and the add_media set to one. All PHY drivers were updated to take
  advantage of mii_phy_dev_attach() as appropriate. Along with these
  changes the capability mask was added to the mii_softc structure so
  PHY drivers taking advantage of mii_phy_dev_attach() but still
  handling media on their own do not need to fiddle with the MII attach
  arguments anyway.
- Keep track of the PHY offset in the mii_softc structure. This is done
  for compatibility with NetBSD/OpenBSD.
- Keep track of the PHY's OUI, model and revision in the mii_softc
  structure. Several PHY drivers require this information also after
  attaching and previously had to wrap their own softc around mii_softc.
  NetBSD/OpenBSD also keep track of the model and revision on their
  mii_softc structure. All PHY drivers were updated to take advantage
  as appropriate.
- Convert the mebers of the MII data structure to unsigned where
  appropriate. This is partly inspired by NetBSD/OpenBSD.
- According to IEEE 802.3-2002 the bits actually have to be reversed
  when mapping an OUI to the MII ID registers. All PHY drivers and
  miidevs where changed as necessary. Actually this now again allows to
  largely share miidevs with NetBSD, which fixed this problem already
  9 years ago. Consequently miidevs was synced as far as possible.
- Add MIIF_NOMANPAUSE and mii_phy_flowstatus() calls to drivers that
  weren't explicitly converted to support flow control before. It's
  unclear whether flow control actually works with these but typically
  it should and their net behavior should be more correct with these
  changes in place than without if the MAC driver sets MIIF_DOPAUSE.

Obtained from:	NetBSD (partially)
Reviewed by:	yongari (earlier version), silence on arch@ and net@
2011-05-03 19:51:29 +00:00
Marius Strobl
d6c65d276e Converted the remainder of the NIC drivers to use the mii_attach()
introduced in r213878 instead of mii_phy_probe(). Unlike r213893 these
are only straight forward conversions though.

Reviewed by:	yongari
2010-10-15 15:00:30 +00:00
Pyun YongHyeon
96486faa6e Make sure to not use stale ip/tcp header pointers. The ip/tcp
header parser uses m_pullup(9) to get access to mbuf chain.
m_pullup(9) can allocate new mbuf chain and free old one if the
space left in the mbuf chain is not enough to hold requested
contiguous bytes. Previously drivers can use stale ip/tcp header
pointer if m_pullup(9) returned new mbuf chain.

Reported by:	Andrew Boyer (aboyer <> averesystems dot com)
MFC after:	10 days
2010-10-14 18:31:40 +00:00
Pyun YongHyeon
78b1140644 Remove enabling RX checksum offloading in RX filter setup. RX
checksum is enabled in sge_init_locked().
While I'm here do not set RX checksum bits in RX descriptor
initialization. It is controller's job to set these bits.

Tested by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-07-08 18:22:49 +00:00
Pyun YongHyeon
9def357406 Don't blindly set IFF_DRV_OACTIVE when sge_encap() fails. If there
is no queued frame, IFF_DRV_OACTIVE would never be cleared.

Submitted by:	Nikolay Denev < ndenev <> gmail at com >
MFC after:	4 days
2010-06-04 17:11:33 +00:00
Pyun YongHyeon
2eae4d808a sge_encap() can sometimes return an error with m_head set to NULL.
Make sure not to requeue freed mbuf in sge_start_locked(). This
should fix NULL pointer dereference panic.

Reported by:	Nikolay Denev <ndenev <> gmail dot com>
Submitted by:	jhb
2010-05-24 17:12:44 +00:00
Pyun YongHyeon
8775710a1d SiS190 supports RX 10 bytes padding, CRC stripping as well as VLAN
hardware tag insertion/stripping. Remove conditional code that
disables these hardware features on SiS190. Also nuke RX fixup code
which is no more required on strict-alignment architectures because
SiS190 supports RX 10 bytes padding.
Now all hardware features except jumbo frame and WOL are supported.
Thanks to Masa Murayama who confirmed SiS190 also has the same
hardware features of SiS191.
I guess the only difference between SiS191 and SiS190 would be
jumbo frame support. It will be implemented in near future.
2010-05-10 17:35:17 +00:00
Pyun YongHyeon
65329b3111 Implement TSO and TSO over VLAN. Increase number of allowed
fragmentation of mbuf chain to 32 from 16 because TSO can send 64KB
sized packet which in turn requires long list of mbuf chain. Due to
lack of documentation, I'm not sure whether driver have to pull up
ethernet/IP/TCP header with options to make controller work but
driver have to parse TCP header to update pseudo TCP checksum
anyway. The controller expects pseudo TCP checksum computed by
upper stack and the checksum should follow the MS NDIS
specification to make TSO work.

Tested by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-05-10 17:14:14 +00:00
Pyun YongHyeon
f648f6bb0c Free entire mbuf chain instead of the first mbuf. 2010-05-04 21:23:59 +00:00
Pyun YongHyeon
55c978ba9d Enable multi-descriptor transmisstion for fragmented mbufs. There
is no more need to defragment mbufs. After transmitting the
multi-fragmented frame, the controller updates only the first
descriptor of multi-descriptor transmission so it's driver's
responsibility to clear OWN bits of remaining descriptor of
multi-descriptor transmission. It seems the controller behaves much
like jme(4) controllers in descriptor handling.

Tested by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-05-04 19:04:51 +00:00
Pyun YongHyeon
7135e11ef9 Remove clearing RxHashTable2 register. The register is reprogrammed
in sge_rxfilter().
2010-05-04 17:34:00 +00:00
Pyun YongHyeon
464aa6d5fb Fix wrong dma tag usage. Previously it used TX descriptor ring dma
tag which should be TX mbuf dma tag.

Reported by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-05-03 00:56:26 +00:00
Pyun YongHyeon
c186cf13eb Enable VLAN hardware tag insertion/stripping. Due to lack of SiS190
controller, I'm not sure whether this is also applicable to SiS190
so this feature is only activated on SiS191 controller.
In theory, controller reinitialization is not needed when VLAN tag
configuration is changed, but xclin said controller was not stable
whenever toggling VLAN tag bit. To address that, sge(4)
reinitialize controller for VLAN configuration which seems to work
as expected. VLAN tag information for TX/RX descriptor and
configure bit of RxMacControl register was found by xclin.

Submitted by:	xclin <xclin <> cs dot nctu dot edu dot tw > (initial version)
Tested by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-04-29 18:14:14 +00:00
Pyun YongHyeon
d1c5ee8030 Enable FCS stripping and padding 10 bytes bit of RX MAC control
register. Due to lack of SiS190 controller, I'm not sure whether
this is also applicable to SiS190 so this feature is only activated
on SiS191 controller.
The controller can pad 10 bytes before DMAing a received frame to
RX buffer and received bytes include the padded bytes. This padding
is very useful on strict-alignment architectures because driver
does not have to copy received frame to align IP header on 4 bytes
boundary. It also gives better RX performance on non-strict
alignment architectures. Special thanks to xclin to give me
valuable register information. Without his enthusiastic trial and
errors this wouldn't be even possible.

While I'm here tighten validity check of received frame. Controller
clears RDS_CRCOK bit when it received bad CRC frames. xclin found
that using loop back testing.

Tested by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-04-29 18:00:42 +00:00
Pyun YongHyeon
7f20f021e1 Explicitly marks SiS190 to differentiate it from SiS191. 2010-04-29 17:34:01 +00:00
Pyun YongHyeon
60a1a1542e Remove wrong link state chage. 2010-04-29 17:30:21 +00:00
Pyun YongHyeon
9c2851d29a Preserve unknown bits of RX MAC control register when driver
programs RX filter configuration. It seems RX MAC control register
is one of key registers to get various offloading features as well
as performance. Blindly clearing unrelated bits can result in
unexpected results.

Tested by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-04-29 17:28:07 +00:00
Pyun YongHyeon
a1a667ecf1 Intialize interrupt moderation control register. The magic value
was chosen by lots of trial and errors. The chosen value shows
good interrupt moderation without additional latency.
Without this change, controller can generate more than 140k
interrupts per second under high network load.

Submitted by:	xclin <xclin <> cs dot nctu dot edu dot tw >
2010-04-22 20:25:07 +00:00
Pyun YongHyeon
c6491946d8 Fix include path. 2010-04-15 17:24:21 +00:00
Pyun YongHyeon
d193ed0bed Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet.
This driver was written by Alexander Pohoyda and greatly enhanced
by Nikolay Denev. I don't have these hardwares but this driver was
tested by Nikolay Denev and xclin.

Because SiS didn't release data sheet for this controller, programming
information came from Linux driver and OpenSolaris. Unlike other open
source driver for SiS190/191, sge(4) takes full advantage of TX/RX
checksum offloading and does not require additional copy operation in
RX handler.
The controller seems to have advanced offloading features like VLAN
hardware tag insertion/stripping, TCP segmentation offload(TSO) as
well as jumbo frame support but these features are not available
yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw>
who sent fix for receiving VLAN oversized frames.
2010-04-14 20:45:33 +00:00