Commit Graph

45 Commits

Author SHA1 Message Date
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Pyun YongHyeon
f9beb16729 Don't overwrite mapped bits.
Found by:	PVS-Studio
2017-04-14 08:11:50 +00:00
Pedro F. Giffuni
453130d9bf sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
Pedro F. Giffuni
057b4402bf sys/dev: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:03:15 +00:00
Pedro F. Giffuni
73a1170a8c sys/dev: use our nitems() macro when it is avaliable through param.h.
No functional change, only trivial cases are done in this sweep,
Drivers that can get further enhancements will be done independently.

Discussed in:	freebsd-current
2016-04-19 23:37:24 +00:00
Pyun YongHyeon
9dda5c8ffc Fix variable assignment.
Found by:	PVS-Studio
2016-02-18 01:24:10 +00:00
Gleb Smirnoff
a9af3b705c Mechanically convert to if_inc_counter(). 2014-09-24 11:33:43 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
John Baldwin
068d8643ad Fix various NIC drivers to properly cleanup static DMA resources.
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.

In many cases these bugs could result in leaks when a driver was detached.

Reviewed by:	yongari
MFC after:	2 weeks
2014-06-11 14:53:58 +00:00
Pyun YongHyeon
52ee8ac027 Increase the number of TX DMA segments from 32 to 35. It turned
out 32 is not enough to support a full sized TSO packet.
While I'm here fix a long standing bug introduced in r169632 in
bce(4) where it didn't include L2 header length of TSO packet in
the maximum DMA segment size calculation.

In collaboration with:	rmacklem
MFC after:		2 weeks
2014-03-31 01:54:59 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Andre Oppermann
edd26b66ce Change local variable tso_segsz to tsosegsz to avoid mbuf.h macro conflicts.
Sponsored by:	The FreeBSD Foundation
2013-08-24 19:38:36 +00:00
Gleb Smirnoff
c6499eccad Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
Gavin Atkinson
389c8bd51e Align the PCI Express #defines with the style used for the PCI-X
#defines.  This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.

This is a mostly mechanical rename:
  s/PCIR_EXPRESS_/PCIER_/g
  s/PCIM_EXP_/PCIEM_/g
  s/PCIM_LINK_/PCIEM_LINK_/g

When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.

Discussed with:	jhb
MFC after:	1 week
2012-09-18 22:04:59 +00:00
Kevin Lo
144a072499 Fix a logic error when use PCIY_PMG capability
Reviewed by:	yongari
2012-06-07 02:24:27 +00:00
Marius Strobl
3fcb7a5365 - Remove attempts to implement setting of BMCR_LOOP/MIIF_NOLOOP
(reporting IFM_LOOP based on BMCR_LOOP is left in place though as
  it might provide useful for debugging). For most mii(4) drivers it
  was unclear whether the PHYs driven by them actually support
  loopback or not. Moreover, typically loopback mode also needs to
  be activated on the MAC, which none of the Ethernet drivers using
  mii(4) implements. Given that loopback media has no real use (and
  obviously hardly had a chance to actually work) besides for driver
  development (which just loopback mode should be sufficient for
  though, i.e one doesn't necessary need support for loopback media)
  support for it is just dropped as both NetBSD and OpenBSD already
  did quite some time ago.
- Let mii_phy_add_media() also announce the support of IFM_NONE.
- Restructure the PHY entry points to use a structure of entry points
  instead of discrete function pointers, and extend this to include
  a "reset" entry point. Make sure any PHY-specific reset routine is
  always used, and provide one for lxtphy(4) which disables MII
  interrupts (as is done for a few other PHYs we have drivers for).
  This includes changing NIC drivers which previously just called the
  generic mii_phy_reset() to now actually call the PHY-specific reset
  routine, which might be crucial in some cases. While at it, the
  redundant checks in these NIC drivers for mii->mii_instance not being
  zero before calling the reset routines were removed because as soon
  as one PHY driver attaches mii->mii_instance is incremented and we
  hardly can end up in their media change callbacks etc if no PHY driver
  has attached as mii_attach() would have failed in that case and not
  attach a miibus(4) instance.
  Consequently, NIC drivers now no longer should call mii_phy_reset()
  directly, so it was removed from EXPORT_SYMS.
- Add a mii_phy_dev_attach() as a companion helper to mii_phy_dev_probe().
  The purpose of that function is to perform the common steps to attach
  a PHY driver instance and to hook it up to the miibus(4) instance and to
  optionally also handle the probing, addition and initialization of the
  supported media. So all a PHY driver without any special requirements
  has to do in its bus attach method is to call mii_phy_dev_attach()
  along with PHY-specific MIIF_* flags, a pointer to its PHY functions
  and the add_media set to one. All PHY drivers were updated to take
  advantage of mii_phy_dev_attach() as appropriate. Along with these
  changes the capability mask was added to the mii_softc structure so
  PHY drivers taking advantage of mii_phy_dev_attach() but still
  handling media on their own do not need to fiddle with the MII attach
  arguments anyway.
- Keep track of the PHY offset in the mii_softc structure. This is done
  for compatibility with NetBSD/OpenBSD.
- Keep track of the PHY's OUI, model and revision in the mii_softc
  structure. Several PHY drivers require this information also after
  attaching and previously had to wrap their own softc around mii_softc.
  NetBSD/OpenBSD also keep track of the model and revision on their
  mii_softc structure. All PHY drivers were updated to take advantage
  as appropriate.
- Convert the mebers of the MII data structure to unsigned where
  appropriate. This is partly inspired by NetBSD/OpenBSD.
- According to IEEE 802.3-2002 the bits actually have to be reversed
  when mapping an OUI to the MII ID registers. All PHY drivers and
  miidevs where changed as necessary. Actually this now again allows to
  largely share miidevs with NetBSD, which fixed this problem already
  9 years ago. Consequently miidevs was synced as far as possible.
- Add MIIF_NOMANPAUSE and mii_phy_flowstatus() calls to drivers that
  weren't explicitly converted to support flow control before. It's
  unclear whether flow control actually works with these but typically
  it should and their net behavior should be more correct with these
  changes in place than without if the MAC driver sets MIIF_DOPAUSE.

Obtained from:	NetBSD (partially)
Reviewed by:	yongari (earlier version), silence on arch@ and net@
2011-05-03 19:51:29 +00:00
John Baldwin
3b0a4aef96 Do a sweep of the tree replacing calls to pci_find_extcap() with calls to
pci_find_cap() instead.
2011-03-23 13:10:15 +00:00
John Baldwin
932b56d242 - Add a locked variant of jme_start() and invoke it directly while holding
the lock instead of queueing it to a task.
- Do not invoke jme_rxintr() to reclaim any unprocessed but received
  packets when shutting down the interface.  Instead, just drop these
  packets to match the behavior of other drivers.
- Hold the driver lock in the interrupt handler to avoid races with
  ioctl requests to down the interface.

Reviewed by:	yongari
2011-01-13 14:42:43 +00:00
Pyun YongHyeon
4f1ff93a3b Add support for JMicron JMC251/JMC261 Gigabit/Fast ethernet
controller with Card Read Host Controller. These controllers are
multi-function devices and have the same ethernet core of
JMC250/JMC260. Starting from REVFM 5(chip full mask revision)
controllers have the following features.
 o eFuse support
 o PCD(Packet Completion Deferring)
 o More advanced PHY power saving

Because these controllers started to use eFuse, station address
modified by driver is permanent as if it was written to EEPROM. If
you have to change station address please save your controller
default address to safe place before reprogramming it. There is no
way to restore factory default station address.

Many thanks to JMicron for continuing to support FreeBSD.

HW donated by:	JMicron
2010-12-18 23:52:50 +00:00
Pyun YongHyeon
4d9ab34335 Use system defined PCIR_EXPRESS_DEVICE_CTL instead of using magic
number.
2010-12-18 23:26:38 +00:00
Pyun YongHyeon
cd33cef723 Make sure whether driver allocated resource before releasing it. 2010-12-18 23:24:59 +00:00
Pyun YongHyeon
f25c5972da Fix a regression introduced in r213893. FPGA version requires PHY
probing so allow PHY probing on all possible addresses.
2010-12-18 23:21:16 +00:00
Pyun YongHyeon
1a0b1eafd2 Consistently put a tab character between #define and the macro name. 2010-12-18 23:03:38 +00:00
Pyun YongHyeon
7e86a37e0d Remove unecessary and clearly wrong usage of atomic(9).
Reported by:	avg, jhb, attilio
2010-12-10 21:43:20 +00:00
Pyun YongHyeon
fc519a189b Enable ethernet flow-control on all jme(4) controllers. 2010-11-26 02:01:16 +00:00
Pyun YongHyeon
7bcbe6cb58 Allocate 1 MSI/MSI-X vector. Originally jme(4) was designed to
support multi-queue but the hardware limitation made it hard to
implement supporting multi-queue. Allocating more than necessary
vectors is resource waste and it can be added back when we
implement multi-queue support.
2010-11-26 01:58:25 +00:00
Pyun YongHyeon
cd96c3c984 Disable retrying RX descriptor loading. The counter is used to set
number of retry to be performed whenever controller found RX
descriptor was empty. RX empty interrupt is generated only when the
retry counter is over. Experimentation shows retrying RX descriptor
loading increased number of dropped frames under flow-control
enabled environments so disable it and have controller generate RX
empty interrupt as fast as it can.
While I'm here fix RXCSR_DESC_RT_CNT macro.
2010-11-26 01:48:29 +00:00
Marius Strobl
8e5d93dbb4 Convert the PHY drivers to honor the mii_flags passed down and convert
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
  addresses; we now let mii_attach() only probe the PHY at the desired
  address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
  off from, partly even based on grabbing and using the softc of the
  parent; we now pass these flags down from the NIC to the PHY drivers
  via mii_attach(). This got us rid of all such hacks except those of
  brgphy() in combination with bce(4) and bge(4), which is way beyond
  what can be expressed with simple flags.

While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).

Reviewed by:	jhb, yongari
2010-10-15 14:52:11 +00:00
Pyun YongHyeon
96486faa6e Make sure to not use stale ip/tcp header pointers. The ip/tcp
header parser uses m_pullup(9) to get access to mbuf chain.
m_pullup(9) can allocate new mbuf chain and free old one if the
space left in the mbuf chain is not enough to hold requested
contiguous bytes. Previously drivers can use stale ip/tcp header
pointer if m_pullup(9) returned new mbuf chain.

Reported by:	Andrew Boyer (aboyer <> averesystems dot com)
MFC after:	10 days
2010-10-14 18:31:40 +00:00
Pyun YongHyeon
7bd35300da Add TSO support on VLANs. jme(4) controllers do not require VLAN
hardware tagging to make TSO work over VLANs.
2010-02-22 22:05:49 +00:00
Gavin Atkinson
51d930e747 If we fail to read the Ethernet address from the card, just print an
warning message and attach without setting the Ethernet address to a
random address.  It is not believed that this code can actually be
executed, and if it does, we're better off printing an error message than
faking up an Ethernet address.

PR:		kern/133239
Reviewed by:	yongari (earlier version of patch)
Approved by:	ed (mentor)
2010-01-08 10:32:27 +00:00
Gavin Atkinson
cd3135e9ec Set the locally-assigned bit in the randomly generated Ethernet address
if we end up having to generate one.

PR:		kern/133239
Discussed with:	yongari
Approved by:	ed (mentor)
MFC after:	2 weeks
2009-12-25 19:57:28 +00:00
Pyun YongHyeon
32f8942a21 Remove unnecessary device reinitialization. 2009-09-28 19:33:52 +00:00
Robert Watson
eb956cd041 Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs.  This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by:	re (kib)
MFC after:	6 weeks
2009-06-26 11:45:06 +00:00
Pyun YongHyeon
450ab47230 Add HW MAC counter support for newer JMC250/JMC260 revisions. 2008-12-04 02:16:53 +00:00
Pyun YongHyeon
f37739d7ab Add support for newer JMC250/JMC260 revisions.
o Chip full mask revision 2 or later controllers have to
   set correct Tx MAC and Tx offload clock depending on negotiated
   link speed.
 o JMC260 chip full mask revision 2 has a silicon bug that can't
   handle 64bit DMA addressing. Add workaround to the bug by
   limiting DMA address space to be within 32bit.
 o Valid FIFO space of receive control and status register was
   changed on chip full mask revision 2 or later controllers. For
   these controllers, use default 16QW as it's supposed to be the
   safest value for maximum PCIe compatibility. JMicron confirmed
   performance will not be reduced even if the FIFO space is set
   to 16QW.
 o When interface is put into suspend/shutdown state, remove Tx MAC
   and Tx offload clock to save more power. We don't need Tx clock
   at all in this state.
 o Added new register definition for chip full mask revision 2 or
   later controllers.

Thanks to JMicron for their continuous support of FreeBSD.
2008-12-04 01:58:40 +00:00
Pyun YongHyeon
08c23fcaae Make sure to read the last byte of EEPROM descriptor. Previously
the last byte of the ethernet address was not read which in turn
resulted in getting 5 out of the 6 bytes of ethernet address and
always returned ENOENT. I did not notice the bug on FPGA version
because of additional configuration data in EEPROM.

Pointed out by:	bouyer at NetBSD
2008-10-14 00:54:15 +00:00
Pyun YongHyeon
a8061cb7e2 Read PCI device id instead of PCI revision id. Also checks the read
device id is JMC260 family. Previously it just verified the deivce
is JMC260 Rev A0. This will make it easy for newer JMC2xx support.

Pointed out by:	bouyer at NetBSD
2008-10-13 01:11:28 +00:00
Pyun YongHyeon
cf8f254faa Add workaround for occasional packet loss issue of JMC250 A2
when it runs on half-duplex media.
While I'm here add register definition for GPREG1. ATM the GPREG1
register is only valid for JMC250 A1/A2.

Submitted by:	Ethan at JMicron
2008-09-22 06:17:21 +00:00
Pyun YongHyeon
8de8f265b6 Add workaround for CRC errors seen at 100Mbps on JMC250 A2.
While here update chip revision number of JMC250/JMC260 from the
latest datasheet.
2008-09-09 10:19:48 +00:00
Pyun YongHyeon
9b45b70125 Fix typo. 2008-09-09 10:10:03 +00:00
Pyun YongHyeon
43742818e3 Fix buffer discard index.
While I'm here dicard all buffers if errored frame is part of
multi-segmented frames.

Pointed out by:	sephe
Reviewd by:	sephe
MFC after:	3 days
2008-07-28 02:37:15 +00:00
Pyun YongHyeon
7a4e8171ba Correct 1000Mbps link handling logic for JMC250. This should make
jme(4) run on 1000Mbps link.
2008-07-18 04:20:48 +00:00
Pyun YongHyeon
a5ebadc632 Add driver support for PCIe adapters based on JMicron JMC250
gigabit ethernet and JMC260 fast ethernet controllers. ATM jme(4)
supports all hardware features except RSS and multiple Tx/Rx queue.

In these days most ethernet controller vendors take a ply of
concealing hardware detailes from open source developers. As
contrasted with these vendors JMicron provided all necessary
information needed to write a stable driver during driver writing
and answered many questions I had. They even helped fixing driver
bugs with protocol analyzer. Many thanks to JMicron for their
support of FreeBSD.

H/W donated by:	JMicron
2008-05-27 01:42:01 +00:00