35294 Commits

Author SHA1 Message Date
shurd
99c641b97c Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependant
patches being developed out of tree.  The reset of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1)  More checks when INVARIANTS are enabled for eariler problem
    detection
2)  Group Task Queue cleanups
    - Fix use of duplicate shortdesc for gtaskqueue malloc type.
      Some interfaces such as memguard(9) use the short description to
      identify malloc types, so duplicates should be avoided.
3)  Allow gtaskqueues to use ithreads in addition to taskqueues
    - In some cases, this can improve performance
4)  Better logging when taskqgroup_attach*() fails to set interrupt
    affinity.
5)  Do not start gtaskqueues until they're needed
6)  Have mp_ring enqueue function enter the ABDICATED rather than BUSY
    state.  This moves the TX to the gtaskq and allows processing to
    continue faster as well as make TX batching more likely.
7)  Add an ift_txd_errata function to struct if_txrx.  This allows
    drivers to inspect/modify mbufs before transmission.
8)  Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
    checksums zeroed for checksum offload to work.  This avoids modifying
    packet data in the TX path when possible.
9)  Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cachlines from each mbuf instead of one up to 128B.  We
    often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
    bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
    This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
    This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
    rather call if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
    before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
    IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
    especially when RX and TX share an interrupt.  RX will attempt to
    take the first threads on a core, and TX will attempt to take
    successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
    cpus an interrupt has affinity with.  This allows TX queues to
    ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
    - timer_int - interval at which to run per-queue timers in ticks
    - force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
    - rx_budget allows tuning the batch size on the RX path
    - watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
    IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
    when waiting for firmware
    - After interrupts are enabled, convert all waits to sleeps
    - Eliminates e1000 software/firmware synchronization busy waits after
      startup
25) e1000: Remove special case for budget=1 in em_txrx.c
    - Premature optimization which may actually be incorrect with
      multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
    RX and TX.
    - Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
    Much easier to understand separate functions and "if (is_igb)" than
    previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12235
2017-09-13 01:18:42 +00:00
sbruno
5543e587c7 The diff is the initial submission of Cavium Liquidio 2350/2360 10/25G
Intelligent NIC driver.

The submission conconsists of firmware binary file and driver sources.

Submitted by:	pkanneganti@cavium.com (Prasad V Kanneganti)
Relnotes:	Yes
Sponsored by:	Cavium Networks
Differential Revision:	https://reviews.freebsd.org/D11927
2017-09-12 23:36:58 +00:00
ian
5623dee899 Add a default implementation that returns ENODEV for start, repeat_start,
stop, read, and write methods.  Some controllers don't implement these
individual operations and have only a transfer method.  In that case, we
should return an indication that the device is present but doesn't support
the method, as opposed to the kobj default error ENXIO which makes it
look like the whole device is missing.  Userland tools such as i2c(8) can
use the differing return values to switch between the two different i2c
IO mechanisms.
2017-09-11 23:47:49 +00:00
mw
355f88634d Improve HW type checking in mv_ehci driver
This patch adds hwtype parameter which keeps information about hardware
revision of Marvell EHCI controller. It allows to replace multiple
calls to ofw_bus_is_compatible with comparing hwtype value during driver
initialization.

Submitted by: Patryk Duda <pdk@semihalf.com>
Suggested by: ian
Obtained from: Semihalf
Sponsored by: Semihalf
2017-09-11 10:41:42 +00:00
scottl
e90eee487b Add infrastructure for allocating multiple MSI-X interrupts. Also
add more fine-tuned controls for allocating requests and replies.

Sponsored by:	Netflix
2017-09-11 01:51:27 +00:00
ian
39e8e58c20 Add gpio methods to read/write/configure up to 32 pins simultaneously.
Sometimes it is necessary to combine several gpio pins into an ad-hoc bus
and manipulate the pins as a group. In such cases manipulating the pins
individualy is not an option, because the value on the "bus" assumes
potentially-invalid intermediate values as each pin is changed in turn. Note
that the "bus" may be something as simple as a bi-color LED where changing
colors requires changing both gpio pins at once, or something as complex as
a bitbanged multiplexed address/data bus connected to a microcontroller.

In addition to the absolute requirement of simultaneously changing the
output values of driven pins, a desirable feature of these new methods is to
provide a higher-performance mechanism for reading and writing multiple
pins, especially from userland where pin-at-a-time access incurs a noticible
syscall time penalty.

These new interfaces are NOT intended to abstract away all the ugly details
of how gpio is implemented on any given platform. In fact, to use these
properly you absolutely must know something about how the gpio hardware is
organized. Typically there are "banks" of gpio pins controlled by registers
which group several pins together. A bank may be as small as 2 pins or as
big as "all the pins on the device, hundreds of them." In the latter case, a
driver might support this interface by allowing access to any 32 adjacent
pins within the overall collection. Or, more likely, any 32 adjacent pins
starting at any multiple of 32. Whatever the hardware restrictions may be,
you would need to understand them to use this interface.

In additional to defining the interfaces, two example implementations are
included here, for imx5/6, and allwinner. These represent the two primary
types of gpio hardware drivers. imx6 has multiple gpio devices, each
implementing a single bank of 32 pins. Allwinner implements a single large
gpio number space from 1-n pins, and the driver internally translates that
linear number space to a bank+pin scheme based on how the pins are grouped
into control registers. The allwinner implementation imposes the restriction
that the first_pin argument to the new functions must always be pin 0 of a
bank.

Differential Revision:	https://reviews.freebsd.org/D11810
2017-09-10 18:08:25 +00:00
kib
452c04519c Fix typo, TC0->TCO.
Submitted by:	jhb
MFC after:	1 week
2017-09-10 13:21:54 +00:00
kib
861f0ba4d4 Add definitions of (new) bits for TCO registers from the
Lewisburg/Sunrise Point documentation.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-10 12:10:27 +00:00
kib
f1910b8d27 Style: tab after #define.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-10 11:57:02 +00:00
scottl
ce44045fac Fix intrhook release in MPR and MPS for EARLY_AP_STARTUP.
Reported by:	Limelight
Sponsored by:	Netflix
2017-09-10 07:10:40 +00:00
scottl
a2aed52cc9 More code refactoring in preparation for enabling multiqueue.
Sponsored by:	Netflix
2017-09-10 04:09:18 +00:00
scottl
8fef16b01d Convert some in-line printing of diagnostic into tables.
Sponsored by:	Netflix
2017-09-09 22:02:36 +00:00
scottl
3145ad5e18 Remove the unnecessary use of a temporary string buffer.
Sponsored by:	Netflix
2017-09-09 18:39:55 +00:00
scottl
e0befa0341 Start separating the LSI drivers into per-queue structures. No
functional change.

Sponsored by:	Netflix
2017-09-09 18:03:40 +00:00
mw
c8ddef874b Add support for Armada 3700 in the NETA driver
This patch enables using NETA driver on Marvell Armada 3700 SoC
by introducing new compatible string, modifying clock source
obtaining and also excluding unnecessary parts.
The driver is added as a build option for arm64 platforms as well.

Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12258
2017-09-09 11:54:04 +00:00
mw
da6ff7cde5 Store virtual address of buffer in mvneta_rx_ring
Now the virtual address of received buffer is taken from a software ring.
Thanks to this, we can use the NETA driver on 64 bits architecture and
avoid 32-bit buf_cookie descriptor field limitation.

Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12257
2017-09-09 11:49:36 +00:00
mw
870c89c45c Introduce UART driver module for Armada 3700
This patch adds support for UART in Armada 3700 family.
It exposes both low-level UART interface, as well as
standard driver methods.

Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12250
2017-09-09 11:42:32 +00:00
mw
a6d3b03225 Add support for Armada 3700 EHCI
This patch reuses ehci_mv driver by adding a support for the new
compatible string and adding ehci_mv.c to list of available options
for arm64 platforms.

Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12255
2017-09-09 11:06:58 +00:00
mw
c58ace156e Add support for AHCI in Armada 3700
This patch simply AHCI generic driver by extending compatible list.

Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12254
2017-09-09 11:01:44 +00:00
mw
e6919c1102 Add support for xhci in Armada 3700 and 7k/8k
This driver will be used by Marvell Armada 3700 and 7k/8k SoC families.
The same, generic xhci device also appears in Armada 380, so we are reusing
driver.

This patch also adds xhci_mv.c entry to the arm64 files list.

Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12252
2017-09-09 10:54:13 +00:00
np
5f42964041 cxgbe(4): Fix a couple of problems in the sge_wrq data path.
- start_wrq_wr must not drain the wr_list if there are incomplete_wrs
  pending.  This can happen when a t4_wrq_tx runs between two
  start_wrq_wr.

- commit_wrq_wr must examine the cookie's pidx and ndesc with the
  queue's lock held.  Otherwise there is a bad race when incomplete WRs
  are being completed and commit_wrq_wr for the WR that is ahead in the
  queue updates the next incomplete WR's cookie's pidx/ndesc but the
  commit_wrq_wr for the second one is using stale values that it read
  without the lock.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-09-09 05:12:14 +00:00
scottl
cc20e2dd99 Refactor interrupt allocation and deallocation. Add some extra
diagnostics.  No other functional changes.

Sponsored by:	Netflix
2017-09-08 20:20:35 +00:00
shurd
2b41576ff7 Added support for displaying HW port stats using sysctl.
This provides port stats (updated once per second) in
dev.bnxt.X.port_stats for PFs.  VFs do not have access to the port stats.

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D11914
2017-09-08 18:03:34 +00:00
scottl
a1d4bb9b44 Fix intrhook release in MFI as well 2017-09-08 17:51:19 +00:00
scottl
a47298b955 As with r323317, hold off on releasing the intrhook during boot until
we're ready to accept probing from GEOM.  Untested, but the pattern is
the same as with aac.
2017-09-08 17:40:29 +00:00
scottl
e12d535518 Move the intrhook release to later in the function so that GEOM knows to wait longer
for possible root devices to come online.  This fixes a race that seems to be
triggered by EARLY_AP_STARTUP.

Submitted by:	cgull@glup.org
2017-09-08 16:52:59 +00:00
kib
4b0f002ff7 Fix malloc() uses in em_get_regs().
Do not use malloc(M_NOWAIT), wait is possible there, and the malloc
failures where not checked.  Do not forget to free malloced memory.

Reported and tested by:	pho
Approved by:	sbruno
Sponsored by:	The FreeBSD Foundation
2017-09-08 14:54:07 +00:00
landonf
893330e185 bhnd: Remove unsupported USB core IDs from the bhnd_usb device table.
This resolves a SoC reset triggered by attempting to attach to the
BCM5365's USB 1.1 controller.

Approved by:	adrian (mentor, implicit)
2017-09-06 23:43:20 +00:00
mjg
03fd2a83b9 vxge: plug void casts from memcpy/memzero calls
Most of places using them did not have the cast in the first place.

No functional changes.

MFC after:	1 week
2017-09-06 21:38:07 +00:00
shurd
52b656c3e0 bnxt: Update firmware header file with the latest one
hsi_struct_def.h file contains all firmware (HWRM) data struct's, updated
that with the latest one which was released on 30'th Aug.

After this upgrade, HWRM version will be 1.8.1.5 (earlier it was 1.4.0).

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12203
2017-09-06 20:19:30 +00:00
shurd
8395e6fe21 bnxt: Use correct firmware call for number of queues supported
1) Based on the suggestion from firmware team, derive
   scctx->isc_ntxqsets_max & scctx->isc_nrxqsets_max based on FUNC_QCFG
   (instead of FUNC_QCAPS).
2) Bump-up driver version to "1.0.0.2".

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12128
2017-09-06 20:14:34 +00:00
kib
3e6224e523 Skylake server core PMC support for hwpmc(4).
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Hardware provided by:	Intel
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12221
2017-09-06 17:19:48 +00:00
hselasky
810f32006b Add new USB quirk.
PR:			221775
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-06 13:59:57 +00:00
scottl
b215fc814b Checkpoint the next phase in debug message cleanup, this time focusing on
error recovery messages.

Sponsored by:	Netflix
2017-09-06 09:19:54 +00:00
cem
4e49567270 amdsmn(4): Do not probe not matching hostbridges
Similar to r323195, but for amdsmn(4) driver (which borrowed some design).

Ignore hostbs that do not match our PCI device id criteria.

Sponsored by:	Dell EMC Isilon
2017-09-05 21:00:33 +00:00
cem
4e680a2201 amdtemp(4): Do not probe not matching hostbridges
Some systems have hostbs that do not match our PCI device id criteria.
Detect and ignore these devices in probe.

PR:		218264
Sponsored by:	Dell EMC Isilon
2017-09-05 20:35:25 +00:00
cem
312a42b051 amdtemp(4): Add support for Family 17h temperature sensor
The sensor value is formatted similarly to previous models (same
bitfield sizes, same units), but must be read off of the internal
System Management Network (SMN) from the System Management Unit (SMU)
co-processor.

PR:		218264
Reported and tested by:	Nils Beyer <nbe AT renzel.net>
Reviewed by:	avg (no +1), mjoras, truckman
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12217
2017-09-05 15:19:14 +00:00
cem
3bf35bb3c4 Add smn(4) driver for AMD System Management Network
AMD Family 17h CPUs have an internal network used to communicate between
the host CPU and the PSP and SMU coprocessors.  It exposes a simple
32-bit register space.

Reviewed by:	avg (no +1), mjoras, truckman
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12217
2017-09-05 15:13:41 +00:00
sephe
ab9d242c8d hyperv/hn: Log RSS capabilities mask.
This helps to detect when UDP hash types can be supported.

MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12177
2017-09-05 06:20:02 +00:00
sephe
464b6d2488 hyperv/hn: Implement SIOCGIFRSS{KEY,HASH}.
The conditional compiling in the review request is removed, since
these IOCTLs will be available in stable/10 and stable/11.

Reviewed by:	gallatin
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12175
2017-09-05 06:05:48 +00:00
mav
e31c0fd7de Increase negotiation polling period from 10ms to 100ms.
There is no big need to burn CPU if other side may be not there yet.  For
example, the PLX hardware by default enables the NTB link up on reset, not
dependig on driver to do it.  In case of Intel hardware this also reduces
race between MSI-X workaround negotiation and upper layers, using the same
scratchpad registers in different time.

MFC after:	12 days
2017-09-02 13:28:45 +00:00
mav
27adbb1a94 Make NTB drivers report more info via NewBus methods.
MFC after:	12 days
2017-09-02 11:56:16 +00:00
mav
4807d74429 Link Interface has no Link Error registers.
MFC after:	13 days
2017-09-01 09:48:19 +00:00
np
d04c74979d cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that
t4_tom picks it up right away.  This is less work than waiting for
the connection to be established before applying the setting.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-09-01 01:34:12 +00:00
np
bbae171575 cxgbe/t4_tom: There may not be a tid to update if the connection isn't
established.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-08-31 23:34:08 +00:00
jkim
019413f519 Merge ACPICA 20170831. 2017-08-31 22:47:04 +00:00
mav
541c775bb8 Clear doorbell bits after masking them before processing.
In theory this allows to avoid one more expensive doorbell register read
later in some scenarios.  But in practice it also significantly increases
packet rate on PLX hardware, that I can't explain yet, possibly work-
arounding some interrupt delays.

MFC after:	13 days
Sponsored by:	iXsystems, Inc.
2017-08-31 21:37:22 +00:00
np
feaa9f48e7 cxgbe/t4_tom: Add a knob to select the congestion control algorigthm
used by the TOE hardware for fully offloaded connections.  The knob
affects new connections only.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-08-31 20:33:22 +00:00
mav
213e12d71e Remove unneeded pmap_change_attr() calls.
Reported by:	kib
MFC after:	13 days
2017-08-31 17:02:06 +00:00
mav
376d970620 Add/polish some defines.
MFC after:	13 days
2017-08-31 16:32:11 +00:00