Commit Graph

53 Commits

Author SHA1 Message Date
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Pedro F. Giffuni
7282444b10 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:36:21 +00:00
Pedro F. Giffuni
453130d9bf sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
Gleb Smirnoff
c13dc68743 - Provide igb_get_counter() to return counters that are not collected,
but taken from hardware.
- Mechanically convert to if_inc_counter() the rest of counters.
2014-09-24 11:23:55 +00:00
John Baldwin
c34f1a08c6 Fix teardown of static DMA allocations in various NIC drivers:
- Add missing calls to bus_dmamap_unload() in et(4).
- Check the bus address against 0 to decide when to call
  bus_dmamap_unload() instead of comparing the bus_dma map against NULL.
- Check the virtual address against NULL to decide when to call
  bus_dmamem_free() instead of comparing the bus_dma map against NULL.
- Don't clear bus_dma map pointers to NULL for static allocations.
  Instead, treat the value as completely opaque.
- Pass the correct virtual address to bus_dmamem_free() in wpi(4) instead
  of trying to free a pointer to the virtual address.

Reviewed by:	yongari
2014-06-17 14:47:49 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Gleb Smirnoff
c6499eccad Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
Gavin Atkinson
389c8bd51e Align the PCI Express #defines with the style used for the PCI-X
#defines.  This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.

This is a mostly mechanical rename:
  s/PCIR_EXPRESS_/PCIER_/g
  s/PCIM_EXP_/PCIEM_/g
  s/PCIM_LINK_/PCIEM_LINK_/g

When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.

Discussed with:	jhb
MFC after:	1 week
2012-09-18 22:04:59 +00:00
Pyun YongHyeon
0b69904450 style. No functional changes. 2012-01-10 20:52:02 +00:00
Pyun YongHyeon
5effa1598a FreeBSD driver does not require arpcom structure in softc. 2011-12-09 23:37:55 +00:00
Pyun YongHyeon
5d384a0de9 Announce flow control ability to PHY driver and enable RX flow
control.  Controller does not automatically generate pause frames
based on number of available RX buffers so it's very hard to
know when driver should generate XON frame in time.  The only
mechanism driver can detect low number of RX buffer condition is
ET_INTR_RXRING0_LOW or ET_INTR_RXRING1_LOW interrupt.  This
interrupt is generated whenever controller notices the number of
available RX buffers are lower than pre-programmed value(
ET_RX_RING0_MINCNT and ET_RX_RING1_MINCNT register).  This scheme
does not provide a way to detect when controller sees enough number
of RX buffers again such that efficient generation of XON/XOFF
frame is not easy.

While here, add more flow control related register definition.
2011-12-09 19:10:38 +00:00
Pyun YongHyeon
39bea5ddf3 Remove unnecessary definition of ET_PCIR_BAR. Controller support
I/O memory only.
While here, use pci_set_max_read_req(9) rather than directly
manipulating PCIe device control register.
2011-12-09 18:34:45 +00:00
Pyun YongHyeon
fa1483dd2f Do not disable interrupt without knowing whether the raised
interrupt is ours.  Note, interrupts are automatically ACKed when
the status register is read.
Add RX/TX DMA error to interrupt handler and do full controller
reset if driver happen to encounter these errors.  There is no way
to recover from these DMA errors without controller reset.
Rename local variable name intrs with status to enhance
readability.

While I'm here, rename ET_INTR_TXEOF and ET_INTR_RXEOF to
ET_INTR_TXDMA and ET_INTR_RXDMA respectively.  These interrupts
indicate that a frame is successfully DMAed to controller's
internal FIFO and they have nothing to do with EOF(end of frame).
Driver does not need to wait actual end of TX/RX of a frame(e.g.
no need to wait the end signal of TX which is generated when a
frame in TX FIFO is emptied by MAC).  Previous names were somewhat
confusing.
2011-12-09 18:17:02 +00:00
Pyun YongHyeon
38953bb0a5 Disable all clocks and put PHY into COMA before entering into
suspend state.  This will save more power.
On resume, make sure to enable all clocks.  While I'm here, if
controller is not fast ethernet, enable gigabit PHY.
2011-12-07 23:20:14 +00:00
Pyun YongHyeon
6f61c82844 Consistently use a tab character instead of using either a space or
tab after #define.
While I'm here consistently use capital letters when it uses
hexadecimal notation.

No functional changes.
2011-12-07 22:04:57 +00:00
Pyun YongHyeon
8e5ad9907b Protect SIOCSIFMTU ioctl handler with driver lock.
Don't blindly re-initialize controller whenever MTU is changed.
Now, reinitializing is done only when driver is running.

While here, remove unnecessary assignment of error value since it
was already initialized to 0.
2011-12-07 21:54:44 +00:00
Pyun YongHyeon
e0b5ac0220 Implement hardware MAC statistics counter. Counters could be
queried with dev.et.%d.stats sysctl node where %d is an instance of
device.
2011-12-07 21:46:09 +00:00
Pyun YongHyeon
1f009e2f39 Rework link state tracking and TX/RX MAC configuration.
o Do not report link status if driver is not running.
 o TX/RX MAC configuration should be done with resolved speed,
   duplex and flow control after establishing a link so it can't
   be done in driver initialization routine.
   Move the configuration to miibus_statchg callback which will be
   called whenever any link state change is detected.
   At this moment, flow-control is not enabled yet mainly because
   I was not able to set correct flow control parameters to
   generate TX pause frames.
 o Now TX/RX MAC is enabled only when a valid link is detected.
   Rearragnge hardware initialization routine a bit to leave
   enabling MAC to miibus_statchg callback.  In order to that,
   TX/RX DMA engine is enabled in et_init_locked().
 o Introduce ET_FLAG_LINK flag to track current link state.
 o Introduce ET_FLAG_FASTETHER flag to mark whether controller is
   fast ethernet.  This flag is checked in miibus_statchg callback
   to know whether PHY established a valid link.
 o In et_stop(), TX/RX MAC is explicitly disabled instead of
   relying on et_reset().  And move et_reset() from et_stop() to
   controller initialization.  Controler reset is not required here
   and it would also clear critial registers(i.e station address,
   RX filter configuration, WOL etc) that are required to make WOL
   work.
 o Switching to current media is done in et_init_locked() after
   setting IFF_DRV_RUNNING flag.  This should ensure reliable
   auto-negotiation/manual link establishment.
 o In et_start_locked(), check whether driver got a valid link
   before trying to send frames.
 o Remove checking a link in et_tick() as this is done by
   miibus_statchg callback.
2011-12-07 21:29:51 +00:00
Pyun YongHyeon
6537ffa6a9 Remove et_enable_intrs(), et_disable_intrs() functions and
manipulation of interrupt register access is done through
CSR_WRITE_4 macro.  Also add disabling interrupt into et_reset()
because we want interrupt disabled state after controller reset.
While I'm here slightly change interrupt handler to be more
readable one.
2011-12-07 19:43:04 +00:00
Pyun YongHyeon
244fd28bde Controller does not require TX start command for every frame. So
send a single TX command after setting up all TX frames.  This
removes unnecessary register accesses and bus_dmamap_sync(9) calls.
et(4) uses TX interrupt moderation so it's possible to have TX
buffers that were already transmitted but waiting for TX completion
interrupt.  If the number of available TX descriptor is less then
1/3 of total TX descriptor, try reclaiming first to get enough free
TX descriptors before setting up TX descriptors.
After r228325, et_txeof() no longer tries to send frames after
reclaiming TX buffers.  That change was made to give more chance
to transmit frames in main interrupt handler since we can still
send frames in interrupt handler with RX interrupt.  So right
before exiting interrupt hander, after enabling interrupt, try to
send more frames.  This gives slightly better performance numbers.

While I'm here reduce number of spare TX descriptors from 8 to 4.
Controller does not require reserved TX descriptors, it was just to
reduce TX overhead.  After r228325, driver has much lower TX
overhead so it does not make sense to reserve 8 TX descriptors.
2011-12-07 19:08:54 +00:00
Pyun YongHyeon
05884511b0 Overhaul bus_dma(9) usage in et(4) and clean up TX/RX path. This
change should make et(4) work on any architectures.
 o Remove m_getl inline function and replace it with stanard mbuf
   interfaces.  Previous code tried to minimize code duplication
   but this came from incorrect use of common DMA tag.
   Driver may be still use a common RX allocation handler with
   additional structure changes but I don't see much point to do
   that it would make it hard to understand the code.
 o Remove DragonflyBSD specific constant EVL_ENCAPLEN, use
   ETHER_VLAN_ENCAP_LEN instead.
 o Add bunch of new RX status definition.  It seems controller
   supports RX checksum offloading but I was not able to make the
   feature work yet.  Currently driver checks whether recevied
   frame is good one or not.
 o Avoid a typedef ending in '_t' as style(9) says.
 o Controller has no restriction on DMA address space, so there
   is no reason to limit the DMA address to 32bit.  Descriptor
   rings,  status blocks and TX/RX buffers now use full 64bit DMA
   addressing.
 o Allocate DMA memory shared between host and controller as
   coherent.
 o Create 3 separate DMA tags to be used as TX, mini RX ring and
   stanard RX ring.  Previously it created a single DMA tag and it
   was used to all three rings.
 o et(4) does not support jumbo frame at this moment and I still
   don't quite understand how jumbo frame works on this controller
   so use two RX rings to handle small sized frame and normal sized
   frame respectively.  The mini RX ring will be used to receive
   frames that are less than or equal to 127 bytes.  The second RX
   ring is used to receive frames that are not handled by the first
   RX ring.
   If jumbo frame support is implemented, driver may have to choose
   better RX scheme by letting the second RX ring handle jumbo
   frames.  This scheme will mimic Broadcom's efficient jumbo frame
   handling feature.  However RAM buffer size(16KB) of the
   controller is too small to hold 2 jumbo frames, if 9KB
   jumbo frame is used, I'm not sure how good performance would it
   have.
 o In et_rxeof(), make sure to check whether controller received
   good frame or not.  Passing corrupted frame to upper layer is
   bad idea.
 o If driver receives a bad frame or driver fails to allocate RX
   buffer due to resource shortage condition, reuse previously
   loaded DMA map for RX buffer instead of unloading/loading RX
   buffer again.
 o et_init_tx_ring() never fails so change return type to void.
 o In watchdog handler, show TX DMA write back status of errored
   frame which could be used as a clue to debug watchdog timeout.
 o Add missing bus_dmamap_sync() in various places such that et(4)
   should work with bounce buffers(e.g. PAE).
 o TX side bus_dmamap_load_mbuf_sg(9) support.
 o RX side bus_dmamap_load_mbuf_sg(9) support.
 o Controller has no DMA alignment limit in RX buffer so use
   m_adj(9) in RX buffer allocation to make IP header align on 2
   bytes boundary.  Otherwise it would trigger unaligned access
   error in upper layer on strict alignment architectures.
   One of down side of controller is it provides limited set of RX
   buffer length like most Intel controllers.  This is not problem
   at this moment because driver does not support jumbo frame yet
   but it may require alignment fixup code to support jumbo frame
   on strict alignment architectures.
 o In et_txeof(), don't zero TX descriptors for transmitted frames.
   TX descriptors don't need write access after transmission.
   Driver sets IFF_DRV_OACTIVE when the number of available TX
   descriptors are less than or equal to ET_NSEG_SPARE.  Make sure
   to clear IFF_DRV_OACTIVE only when the number of available TX
   descriptor is greater than ET_NSEG_SPARE.
2011-12-07 18:17:09 +00:00
Pyun YongHyeon
a64788d1bc Make et_probe() return BUS_PROBE_DEFAULT such that allow other
driver that has high precedence for the controller override et(4).
Add missing callout_drain(9) in device detach and rework detach
routine.  While I'm here use rman_get_rid(9) instead of using
cached resource id because bus methods are free to change the
id.
2011-12-06 00:58:42 +00:00
Pyun YongHyeon
d2f7028c11 et(4) supports VLAN oversized frame so correctly set header length.
While I'm here remove initializing if_mtu, it is set by
ether_ifattach(9).  Also move callout_init_mtx(9) to the right below
driver lock initialization.
2011-12-06 00:18:37 +00:00
Pyun YongHyeon
c8b727ce77 Fix alt(4) support. Also add check for number of available TX
descriptors before trying to send frames.  If we're not able to
send a frame, make sure to prepend it to if_snd queue such that
alt(4) should work.

While I'm here prefer ETHER_BPF_MTAP to BPF_MTAP.  ETHER_BPF_MTAP
should be used for controllers that support VLAN hardware tag
insertion.  The controller supports VLAN tag insertion but lacks
VLAN tag stripping in RX path though.
2011-12-05 22:55:52 +00:00
Pyun YongHyeon
0442028aaf Implement suspend/resume methods. Driver has no issue with
suspend/resume.
2011-12-05 22:22:39 +00:00
Pyun YongHyeon
7ac1823956 Remove NetBSD license. r199548 removed all bit macros that were
derived from NetBSD.
2011-12-05 22:09:07 +00:00
Marius Strobl
4b7ec27007 - There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
Pyun YongHyeon
965706387b Make sure to report media change status to caller. Previously it
always reported success.
2011-10-17 20:03:38 +00:00
Pyun YongHyeon
0ae9f6a9f3 Add missing driver lock in media status handler. 2011-10-17 19:58:34 +00:00
Kevin Lo
ee6eac62f7 Remove duplicate header includes 2011-06-28 08:36:48 +00:00
Marius Strobl
3fcb7a5365 - Remove attempts to implement setting of BMCR_LOOP/MIIF_NOLOOP
(reporting IFM_LOOP based on BMCR_LOOP is left in place though as
  it might provide useful for debugging). For most mii(4) drivers it
  was unclear whether the PHYs driven by them actually support
  loopback or not. Moreover, typically loopback mode also needs to
  be activated on the MAC, which none of the Ethernet drivers using
  mii(4) implements. Given that loopback media has no real use (and
  obviously hardly had a chance to actually work) besides for driver
  development (which just loopback mode should be sufficient for
  though, i.e one doesn't necessary need support for loopback media)
  support for it is just dropped as both NetBSD and OpenBSD already
  did quite some time ago.
- Let mii_phy_add_media() also announce the support of IFM_NONE.
- Restructure the PHY entry points to use a structure of entry points
  instead of discrete function pointers, and extend this to include
  a "reset" entry point. Make sure any PHY-specific reset routine is
  always used, and provide one for lxtphy(4) which disables MII
  interrupts (as is done for a few other PHYs we have drivers for).
  This includes changing NIC drivers which previously just called the
  generic mii_phy_reset() to now actually call the PHY-specific reset
  routine, which might be crucial in some cases. While at it, the
  redundant checks in these NIC drivers for mii->mii_instance not being
  zero before calling the reset routines were removed because as soon
  as one PHY driver attaches mii->mii_instance is incremented and we
  hardly can end up in their media change callbacks etc if no PHY driver
  has attached as mii_attach() would have failed in that case and not
  attach a miibus(4) instance.
  Consequently, NIC drivers now no longer should call mii_phy_reset()
  directly, so it was removed from EXPORT_SYMS.
- Add a mii_phy_dev_attach() as a companion helper to mii_phy_dev_probe().
  The purpose of that function is to perform the common steps to attach
  a PHY driver instance and to hook it up to the miibus(4) instance and to
  optionally also handle the probing, addition and initialization of the
  supported media. So all a PHY driver without any special requirements
  has to do in its bus attach method is to call mii_phy_dev_attach()
  along with PHY-specific MIIF_* flags, a pointer to its PHY functions
  and the add_media set to one. All PHY drivers were updated to take
  advantage of mii_phy_dev_attach() as appropriate. Along with these
  changes the capability mask was added to the mii_softc structure so
  PHY drivers taking advantage of mii_phy_dev_attach() but still
  handling media on their own do not need to fiddle with the MII attach
  arguments anyway.
- Keep track of the PHY offset in the mii_softc structure. This is done
  for compatibility with NetBSD/OpenBSD.
- Keep track of the PHY's OUI, model and revision in the mii_softc
  structure. Several PHY drivers require this information also after
  attaching and previously had to wrap their own softc around mii_softc.
  NetBSD/OpenBSD also keep track of the model and revision on their
  mii_softc structure. All PHY drivers were updated to take advantage
  as appropriate.
- Convert the mebers of the MII data structure to unsigned where
  appropriate. This is partly inspired by NetBSD/OpenBSD.
- According to IEEE 802.3-2002 the bits actually have to be reversed
  when mapping an OUI to the MII ID registers. All PHY drivers and
  miidevs where changed as necessary. Actually this now again allows to
  largely share miidevs with NetBSD, which fixed this problem already
  9 years ago. Consequently miidevs was synced as far as possible.
- Add MIIF_NOMANPAUSE and mii_phy_flowstatus() calls to drivers that
  weren't explicitly converted to support flow control before. It's
  unclear whether flow control actually works with these but typically
  it should and their net behavior should be more correct with these
  changes in place than without if the MAC driver sets MIIF_DOPAUSE.

Obtained from:	NetBSD (partially)
Reviewed by:	yongari (earlier version), silence on arch@ and net@
2011-05-03 19:51:29 +00:00
John Baldwin
3b0a4aef96 Do a sweep of the tree replacing calls to pci_find_extcap() with calls to
pci_find_cap() instead.
2011-03-23 13:10:15 +00:00
Marius Strobl
d6c65d276e Converted the remainder of the NIC drivers to use the mii_attach()
introduced in r213878 instead of mii_phy_probe(). Unlike r213893 these
are only straight forward conversions though.

Reviewed by:	yongari
2010-10-15 15:00:30 +00:00
Pyun YongHyeon
744ec7f282 Make sure to clear IFF_DRV_RUNNING to reinitialize controller.
While I'm here update if_oerrors counter when driver encounters
watchdog timeout.
2010-09-21 17:31:14 +00:00
Xin LI
e5fdd9de2c Change copyright holder to author. We prefer using a real legal
entity for copyright holders.

Approved by:	sephe
MFC after:	3 days
2010-07-30 17:51:22 +00:00
Pyun YongHyeon
ed848e3a02 Only Tx checksum offloading is supported now. Remove experimental
code sneaked in r199611.
2009-11-20 20:43:16 +00:00
Pyun YongHyeon
fe42b04d70 Add __FBSDID. 2009-11-20 20:40:34 +00:00
Pyun YongHyeon
9955274c64 Add IPv4/TCP/UDP Tx checksum offloading support. It seems the
controller also has support for IP/TCP checksum offloading for Rx
path. But I failed to find to way to enable Rx MAC to compute the
checksum of received frames.
2009-11-20 20:33:59 +00:00
Pyun YongHyeon
abd12fc6d6 Because we know received bytes including CRC there is no reason to
call m_adj(9). The controller also seems to have a capability to
strip CRC bytes but I failed to activate this feature except for
loopback traffic.
2009-11-20 20:25:21 +00:00
Pyun YongHyeon
26e07b50c6 Add initial endianness support. It seems the controller supports
both big-endian and little-endian format in descriptors for Rx path
but I couldn't find equivalent feature in Tx path. So just stick to
little-endian for now.
2009-11-20 20:18:53 +00:00
Pyun YongHyeon
bbd5260f13 Remove unnecessary structure packing. 2009-11-20 20:12:37 +00:00
Pyun YongHyeon
accb4fcd92 Fix copy & paste error and remove extra space before colon.
Pointed out by: danfe
2009-11-19 22:59:52 +00:00
Pyun YongHyeon
8b3c64969b Use capability pointer to access PCIe registers rather than
directly access them at fixed address. Frequently the register
offset could be changed if additional PCI capabilities are added to
controller.
One odd thing is ET_PCIR_L0S_L1_LATENCY register. I think it's PCIe
link capabilities register but the location of the register does
not match with PCIe capability pointer + offset. I'm not sure it's
shadow register of PCIe link capabilities register.
2009-11-19 22:53:41 +00:00
Pyun YongHyeon
d95121b427 Use bus_{read,write}_4 rather than bus_space_{read,write}_4. 2009-11-19 22:06:19 +00:00
Pyun YongHyeon
398f1b6501 style(9) 2009-11-19 21:53:21 +00:00
Pyun YongHyeon
73e338b028 Remove extra spce at the EOL. 2009-11-19 21:46:58 +00:00
Pyun YongHyeon
cc3c3b4ec4 Add MSI support. 2009-11-19 21:45:06 +00:00
Pyun YongHyeon
5b8f4900f3 Destroy driver mutex in device detach. 2009-11-19 21:39:43 +00:00
Pyun YongHyeon
696e249300 Remove support code for FreeBSD 6.x versions. 2009-11-19 21:08:33 +00:00
Pyun YongHyeon
23263665d5 Remove complex macros that were used to compute bits values.
Although these macros may have its own strength, its complex
definition make hard to read the code.

Approved by:	delphij
2009-11-19 20:57:35 +00:00