Commit Graph

48 Commits

Author SHA1 Message Date
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Josh Paetzel
dced0d1821 Add ability to perform a firmware reset during driver initialization.
Required by Lancer Gen 5 hardware.

Submitted by:	Ram Kishore Vegesna <ram.vegesna@broadcom.com>
Obtained from:	Broadcom
2018-05-01 17:39:20 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Pedro F. Giffuni
7282444b10 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:36:21 +00:00
Josh Paetzel
c2625e6e38 Update oce to version 11.0.50.0
Submitted by:	Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
2016-09-22 22:51:11 +00:00
Conrad Meyer
14410265b6 Revert r306148 to fix build
Requested by:	jpaetzel
Reported by:	Larry Rosenman <ler at lerctr.org>, Jenkins
2016-09-22 00:25:23 +00:00
Josh Paetzel
764c812d84 Update oce driver to 11.0.50.0
Submitted by:	Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
2016-09-21 22:53:16 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Sepherosa Ziehau
6dd38b8716 tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c

Reviewed by:	gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5725
2016-04-01 06:28:33 +00:00
John Baldwin
cbc4d2db75 Remove taskqueue_enqueue_fast().
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167.  It has been a compat shim ever
since.  It's time for the compat shim to go.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	sephe
Differential Revision:	https://reviews.freebsd.org/D5131
2016-03-01 17:47:32 +00:00
Justin Hibbits
c47476d7e6 Migrate many bus_alloc_resource() calls to bus_alloc_resource_anywhere().
Most calls to bus_alloc_resource() use "anywhere" as the range, with a given
count.  Migrate these to use the new bus_alloc_resource_anywhere() API.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D5370
2016-02-27 03:38:01 +00:00
Gleb Smirnoff
8ec07310fa These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h
2016-02-01 17:41:21 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
John-Mark Gurney
0d6c1105bc srandom has no influence on read_random, at least not this late... 2015-02-13 19:44:04 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Hans Petter Selasky
9fd573c39d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
Gleb Smirnoff
c8dfaf382f Mechanically convert to if_inc_counter(). 2014-09-19 03:51:26 +00:00
Hans Petter Selasky
72f3100047 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00
Hans Petter Selasky
eb93b77ae4 Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-09-13 08:26:09 +00:00
Warner Losh
c236d64745 Cast queue length because q_len isn't really an enum in the same sense
that clang wants it to be (a value that can only have values inside
the enum range), but rather an unsigned count of bytes.
2014-08-07 21:56:46 +00:00
Luigi Rizzo
d398c8634a Various bugfixes from Stefano Garzarella:
1. oce_multiq_start(): make sure the buffer is consumed even on ENXIO
2. oce_multiq_transmit(): there is an extra call to drbr_enqueue()
  causing the mbuf to be enqueued twice when the NIC's queue is full,
  and potential panics
3. oce_multiq_transmit(): same problem fixed recently in ixgbe (r267187)
   and other drivers: if the mbuf is enqueued, the proper return value is 0

Submitted by:	Stefano Garzarella
MFC after:	3 days
2014-07-02 12:13:11 +00:00
Xin LI
a4f734b4fc Apply vendor fixes for big endian support and 20GBps/25GBps link speeds.
Many thanks to Emulex for their continued support of FreeBSD!

Submitted by:	Venkata Duvvuru <VenkatKumar.Duvvuru Emulex.Com>
MFC after:	3 days
2014-06-24 20:11:22 +00:00
John Baldwin
c34f1a08c6 Fix teardown of static DMA allocations in various NIC drivers:
- Add missing calls to bus_dmamap_unload() in et(4).
- Check the bus address against 0 to decide when to call
  bus_dmamap_unload() instead of comparing the bus_dma map against NULL.
- Check the virtual address against NULL to decide when to call
  bus_dmamem_free() instead of comparing the bus_dma map against NULL.
- Don't clear bus_dma map pointers to NULL for static allocations.
  Instead, treat the value as completely opaque.
- Pass the correct virtual address to bus_dmamem_free() in wpi(4) instead
  of trying to free a pointer to the virtual address.

Reviewed by:	yongari
2014-06-17 14:47:49 +00:00
Gleb Smirnoff
b245f96c44 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
  notion of data (packet counters, etc) are by no means MD. And it is a
  bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
Xin LI
198f573526 Eliminate unused drbr_stats_update implementation in oce(4) driver.
Noticed by:	dim
MFC after:	2 weeks
2013-12-30 21:30:49 +00:00
Xin LI
b41206d84a Apply vendor improvements to oce(4) driver:
- Add support to 40Gbps devices;
 - Add support to control adaptive interrupt coalescing (AIC)
   via sysctl;
 - Improve support of BE3 devices;

Many thanks to Emulex for their continued support of FreeBSD.

Submitted by:	Venkata Duvvuru <VenkatKumar.Duvvuru Emulex Com>
MFC after:	3 days
2013-12-04 20:24:18 +00:00
Xin LI
d8f7bfb8bd The previous code makes a memory allocation in size of
struct mbx_common_read_write_flashrom plus 32KB and caps the actual
transfer size at 32KB.  This is harmless as it is but may confuse
static code analyzer, so allocate a full 32KB instead.

Reported by:	Coverity via mjacob
Submitted by:	Venkata Duvvuru <VenkatKumar.Duvvuru Emulex Com>
Coverity CID:	1125820
2013-11-14 18:53:17 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Xin LI
5fbb683079 Update driver to version 10.0.664.0.
Many thanks to Emulex for their continued support of FreeBSD.

Submitted by:	Venkata Duvvuru <VenkatKumar.Duvvuru Emulex Com>
MFC after:	3 day
2013-10-23 18:58:38 +00:00
Xin LI
291a1934fa Update driver with recent vendor improvements, most notably support
of Skyhawk adapters.

Many thanks to Emulex for their continued support of FreeBSD.

Submitted by:	"Duvvuru,Venkat Kumar" <VenkatKumar.Duvvuru Emulex.Com>
MFC after:	1 day
2013-07-06 08:30:45 +00:00
Xin LI
1f14e0cb94 Eliminate excessive $FreeBSD$ headers.
Noticed by:	jmallett
2013-03-08 18:08:12 +00:00
Xin LI
cdaba8920e Update driver to version 4.6.95.0.
Submitted by:	"Duvvuru,Venkat Kumar" <VenkatKumar.Duvvuru Emulex.Com>
MFC after:	3 days
2013-03-06 09:53:38 +00:00
Josh Paetzel
beb0f7e7d4 Resolve issue that caused WITNESS to report LORs.
PR:	kern/171838
Submitted by:	Venkat Duvvuru <venkatduvvuru.ml@gmail.com>
MFC after:	2 weeks
2013-02-14 17:34:17 +00:00
Randall Stewart
ded5ea6a25 This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will  need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by:	jhb@freebsd.org, jlv@freebsd.org
2013-02-07 15:20:54 +00:00
Sofian Brabez
61bfd86762 Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays
Reviewed by:	cognet
Approved by:	cognet
2013-01-30 18:01:20 +00:00
Gleb Smirnoff
c6499eccad Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.
2012-12-04 09:32:43 +00:00
Eitan Adler
db702c59cf remove duplicate semicolons where possible.
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:00:37 +00:00
John Baldwin
6d9190b49c Use if_initbaudrate(). 2012-10-18 15:14:13 +00:00
Gleb Smirnoff
063efed28c The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
  ALTQ queueing or buf_ring(9) queueing.
  - It doesn't touch the buf_ring stats any more.
  - It doesn't touch ifnet stats anymore.
  - drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
  - It handles br_drops itself.
  - br_prod_bytes stats are dropped. Rationale: no one ever
    reads them but update of a common counter on every packet
    negatively affects performance due to excessive cache
    invalidation.
  - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
    we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and  ixv(4) no longer
  use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
  drbr_stats_update(), thus it was the only driver accurate under
  moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
  - drop software stats updating.
  - take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
  from hardware.

Reviewed by:	jfv, gnn
2012-09-28 18:28:27 +00:00
John Baldwin
cda6c6abf9 Use pci_find_cap() instead of pci_find_extcap() to locate PCI
find capabilities as the latter API is deprecated for this purpose.

MFC after:	2 weeks
2012-03-03 18:03:50 +00:00
Luigi Rizzo
9bd3250a02 Patches from Naresh Raju Gottumukkala
- Feature: UMC - Universal Multi Channel support
- Bugfix: BE3 Firmware Flashing bug.
- Code improvements:
  - Removed duplicate switch cases in the oce_ioctl routine.
  - Made changes to mcc_async notifications routine(oce_mq_handler)

MFC after:	1 week
2012-02-17 13:55:17 +00:00
John Baldwin
0187124bee Use if_maddr_*lock() routines to lock the per-interface multicast
address list rather than manipulating the lock directly.
2012-02-13 19:35:35 +00:00
Bjoern A. Zeeb
ad5129589c Start to try to hide LRO (and some TSO) bits behind #ifdefs as especially
the symbols are not there when compiling a kernel without IP support and
we do have users doing so.
2012-02-11 08:33:52 +00:00
Bjoern A. Zeeb
84c2fd2f94 Use the more common macro to set the if_baudrate to 10Gbit/s. Just use
UL not ULL, which should make 32bit archs more happy.
2012-02-11 07:47:06 +00:00
Bjoern A. Zeeb
c3ed30eb19 Make use of the read-only variant of the IF_ADDR_*LOCK() macros introduced
in r229614 rather than the compat one.
2012-02-11 07:43:33 +00:00
Luigi Rizzo
2f345d8ed5 Add a driver for Emulex OneConnect ethernet cards (10 Gbit PCIe)
A manpage will come in a future commit.

Submitted by:   Naresh Raju Gottumukkala (emulex)
2012-02-10 21:03:04 +00:00