1100 Commits

Author SHA1 Message Date
Andrew Gallatin
df8437a93d cxgbe: fix enabling lro & rxtimestamps
A recent change caused iq flags, like LRO, to be set before
init_iq(). However, init_iq() clears those flags, so they
became effectively impossible to set.   This change moves
the initializion of these flags to after the call to init_iq().
This fixes LRO.

Differential Revision: https://reviews.freebsd.org/D30460
Reviewed by: np, rrs
Sponsored by: Netflix
Fixes: 43bbae19483fbde0a91e61acad8a6e71e334c8b8 <https://reviews.freebsd.org/R10:43bbae19483fbde0a91e61acad8a6e71e334c8b8>"
2021-05-26 10:00:07 -04:00
John Baldwin
883a0196b6 crypto: Add a new type of crypto buffer for a single mbuf.
This is intended for use in KTLS transmit where each TLS record is
described by a single mbuf that is itself queued in the socket buffer.
Using the existing CRYPTO_BUF_MBUF would result in
bus_dmamap_load_crp() walking additional mbufs in the socket buffer
that are not relevant, but generating a S/G list that potentially
exceeds the limit of the tag (while also wasting CPU cycles).

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30136
2021-05-25 16:59:18 -07:00
Navdeep Parhar
24b98f288d cxgbe(4): Overhaul CLIP (Compressed Local IPv6) table management.
- Process the list of local IPs once instead of once per adapter.  Add
  addresses from all VNETs to the driver's list but leave hardware
  updates for later when the global VNET/IFADDR list locks have been
  released.

- Add address to the hardware table synchronously when a CLIP entry is
  requested for an address that's not already in there.

- Provide ioctls that allow userspace tools to manage addresses in the
  CLIP table.

- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
  automatically added to the CLIP table or not.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-05-23 16:07:29 -07:00
Navdeep Parhar
ffbb373c5a cxgbe(4): Fix build warnings with NOINET kernels.
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D26334
2021-05-21 20:42:04 -07:00
John Baldwin
0cc7d64a2a iscsi: Move the maximum data segment limits into 'struct icl_conn'.
This fixes a few bugs in iSCSI backends where the backends were using
the limits they advertised initially during the login phase as the
final values instead of the values negotiated with the other end.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	mav
Differential Revision:	https://reviews.freebsd.org/D30271
2021-05-20 09:59:11 -07:00
John Baldwin
3bede2908a cxgbei: Add tunable sysctls for the FirstBurstLength and MaxBurstLength.
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30269
2021-05-19 15:56:54 -07:00
John Baldwin
671fd0ec8d cxgbei: Remove unused sysctls.
These were seemingly copied over from icl_soft.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30268
2021-05-19 15:56:45 -07:00
John Baldwin
a9f0cf4838 cxgbe: Fix some merge-o's for the per-rxq iSCSI counters.
I botched a few of the changes when rebasing the changes in
4b6ed0758dc6fad17081d7bd791cb0edbddbddb8 across the changes in
43bbae19483fbde0a91e61acad8a6e71e334c8b8.

- Move the counter allocations into alloc_ofld_rxq().

- Free the counters freeing an ofld rxq.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30267
2021-05-19 15:56:31 -07:00
Navdeep Parhar
3965469eaa cxgbe(4): Remove some dead code.
MFC after:	3 days
2021-05-18 23:16:03 -07:00
John Baldwin
8d2b4b2e7c cxgbe: Cast pointer arguments to trunc_page() to vm_offset_t.
Reported by:	mjg, jenkins, rmacklem
Fixes:		46bee8043ee2bd352d420cd573e0364ca45f813e
Sponsored by:	Chelsio Communications
2021-05-17 17:04:22 -07:00
John Baldwin
e73e2ee0ac cxgbei: Handle target transfers with excess unsolicited data.
The CTL frontend might have provided a buffer that is smaller than the
FirstBurstLength and thus smaller than the amount of unsolicited data
included in the request PDU.  Treat these transfers as an empty
transfer.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications

Differential Revision:	https://reviews.freebsd.org/D29940
2021-05-14 12:21:34 -07:00
John Baldwin
e894e3adb2 cxgbei: Explicitly clear the page pode reservation pointer after freeing it.
A single union ctl_io can be reused across multiple transfers (in
particular by the ramdisk backend).  On a reuse, the reservation
pointer would retain its value from the previous transfer tripping an
assertion.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications

Differential Revision:	https://reviews.freebsd.org/D29939
2021-05-14 12:21:34 -07:00
John Baldwin
1ad32ad0be cxgbei: Don't clamp iSCSI PDUs to 8K.
The firmware no longer requires this workaround.

Discussed with:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29912
2021-05-14 12:21:24 -07:00
John Baldwin
4add8e4c89 cxgbei: Don't leak resources for an aborted target transfer.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29911
2021-05-14 12:17:26 -07:00
John Baldwin
a1c687347a cxgbei: Add support for zero-copy iSCSI target transmission/read.
- Switch to allocating the cxgbei version of icl_pdu explicitly
  as a separate refcounted object allocated via malloc/free
  instead of storing it in the bhs mbuf prior to the bhs.

- Support the icl_conn_pdu_queue_cb() method to set a callback
  on a PDU to be invoked when the PDU is freed.

- For ICL_NOCOPY buffers, use an external mbuf to manage the
  storage for the buffer via m_extaddref().  Each external mbuf
  holds a reference on the associated PDU, so the callback is
  invoked once all of the external mbufs have been freed.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29910
2021-05-14 12:17:20 -07:00
John Baldwin
31df8ff73e cxgbei: Rework the pdu_append_data hook to support M_WAITOK.
- Only allocate 16K jumbo mbufs if the region of data to be
  appended is sufficiently large, and use a loop.

- Use m_getm2() to allocate a chain for data less than 16K, or
  if m_getjcl() fails.

- Use ENOMEM as the return value instead of '1' if the hook fails due
  to a memory allocation error.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29909
2021-05-14 12:17:14 -07:00
John Baldwin
46bee8043e cxgbei: Support DDP for target I/O S/G lists with more than one entry.
A CAM target layer I/O CCB can use a S/G list of virtual address ranges
to describe its data buffer.  This change adds zero-copy receive support
for such requests.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29908
2021-05-14 12:17:06 -07:00
John Baldwin
23b209ee88 cxgbe tom: Account for pre-iSCSI mode data on suspended connections.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29907
2021-05-14 12:17:02 -07:00
John Baldwin
91ca7b0954 cxgbei: Whitespace fixes, comment typo, and rewrap a comment.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29906
2021-05-14 12:16:57 -07:00
John Baldwin
87bb5ed606 cxgbei: Use hardware RX flow control for offloaded iSCSI connections.
Forthcoming T6 iSCSI DDP support requires hardware RX flow control.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29905
2021-05-14 12:16:51 -07:00
John Baldwin
4427ac3675 cxgbe tom: Set the tid in the work requests to program page pods for iSCSI.
As a result, CPL_FW4_ACK now returns credits for these work requests.
To support this, page pod work requests are now constructed in special
mbufs similar to "raw" mbufs used for NIC TLS in plain TX queues.
These special mbufs are stored in the ulp_pduq and dispatched in order
with PDU work requests.

Sponsored by:	Chelsio Communications
Discussed with:	np
Differential Revision:	https://reviews.freebsd.org/D29904
2021-05-14 12:16:40 -07:00
John Baldwin
4b6ed0758d cxgbe: Make the TOE ISCSI RX stats per-queue instead of per adapter.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29903
2021-05-14 12:16:33 -07:00
Navdeep Parhar
f4ba035bca cxgbe(4): Use ifaddr_event_ext instead of ifaddr_event for CLIP management.
The _ext event notification includes the address being added/removed and
that gives the driver an easy way to ignore non-IPv6 addresses.  Remove
'tom' from the handler's name while here, it was moved out of t4_tom a
long time ago.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-05-04 20:16:25 -07:00
Navdeep Parhar
b9820bca18 cxgbe(4): Do not panic when tx is called with invalid checksum requests.
There is no need to panic in if_transmit if the checksums requested are
inconsistent with the frame being transmitted.  This typically indicates
that the kernel and driver were built with different INET/INET6 options,
or there is some other kernel bug.  The driver should just throw away
the requests that it doesn't understand and move on.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-04-28 14:04:53 -07:00
Navdeep Parhar
83b5cda106 cxgbe(4): Add support for NIC suspend/resume and live reset.
Add suspend/resume callbacks to the driver and a live reset built around
them.  This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip.  Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports.  It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC.  A reset will
look like a link bounce to the networking stack.

Here are some ways to exercise this functionality:

 /* Manual suspend and resume. */
 # devctl suspend t6nex0
 # devctl resume t6nex0

 /* Manual reset. */
 # devctl reset t6nex0

 /* Manual reset with driver sysctl. */
 # sysctl dev.t6nex.0.reset=1

 /* Automatic adapter reset on any fatal error. */
 # hw.cxgbe.reset_on_fatal_err=1

Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver.  All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO.  All ifnets report
link-down while the adapter is suspended.

Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets.  The ifnets are able to send
and receive traffic as soon as the link comes back up.

Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
2021-04-27 22:48:51 -07:00
Navdeep Parhar
43bbae1948 cxgbe(4): Separate the sw- and hw-specific parts of resource allocations
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC.  This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.

This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):

/* Idempotent */
alloc_foo
{
	if (!SW_ALLOCATED)
		init_iq/init_eq/init_fl		no-fail sw init
		alloc_iq_fl/alloc_eq/alloc_wrq	may-fail sw alloc
		add_foo_sysctls, etc.		no-fail post-alloc items
	if (!HW_ALLOCATED)
		alloc_iq_fl_hwq/alloc_eq_hwq	hw resource allocation
}

/* Idempotent */
free_foo
{
	if (!HW_ALLOCATED)
		free_iq_fl_hwq/free_eq_hwq	release hw resources
	if (!SW_ALLOCATED)
		free_iq_fl/free_eq/free_wrq	release sw resources
}

The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent.  The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2021-04-26 14:09:59 -07:00
Navdeep Parhar
50f5d13eeb cxgbe(4): hw.cxgbe.panic_on_fatal_err can be changed any time.
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-04-23 12:17:54 -07:00
Navdeep Parhar
5f00292fe3 cxgbe(4): Move the hw-specific parts of VXLAN setup to a separate function.
It can be called to (re)apply the settings in the driver softc to the
hardware.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-04-23 00:26:47 -07:00
Navdeep Parhar
b47b28e5b2 cxgbe(4): Add flag to reliably stop the driver from accessing hw stats.
There are two kinds of routines in the driver that read statistics from
the hardware: the cxgbe_* variants read the per-port MPS/MAC registers
and the vi_* variants read the per-VI registers.  They can be called
from the 1Hz callout or if_get_counter.  All stats collection now takes
place under the callout lock and there is a new flag to indicate that
these routines should not access any hardware register.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-04-22 17:45:52 -07:00
Navdeep Parhar
dc77e79296 cxgbe(4): Fix minor nit in the display of MPS TCAM entries.
MFC after:	3 days
2021-04-22 15:36:51 -07:00
Navdeep Parhar
8f1bc78ef7 cxgbe(4): make the logging helpers a little more robust.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2021-04-22 15:28:43 -07:00
Navdeep Parhar
557c4521bb cxgbe/t4_tom: Implement tod_pmtu_update.
tod_pmtu_update was added to the kernel in 01d74fe1ffc.

Sponsored by:	Chelsio Communications
2021-04-22 14:48:57 -07:00
Navdeep Parhar
d107ee06f3 cxgbe(4): RSS hash for VXLAN traffic is computed from the inner frame.
Sponsored by:	Chelsio Communications
2021-04-13 16:50:12 -07:00
John Baldwin
774c4c82ff TOE: Use a read lock on the PCB for syncache_add().
Reviewed by:	np, glebius
Fixes:		08d9c9202755a30f97617758595214a530afcaea
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29739
2021-04-13 16:31:04 -07:00
John Baldwin
45d5c28439 cxgbe: Ignore doomed virtual interfaces when updating the clip table.
A doomed VI does not have a valid ifnet.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29662
2021-04-12 14:36:40 -07:00
John Baldwin
568e69e4eb cxgbe: Add counters for iSCSI PDUs transmitted via TOE.
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29297
2021-04-12 13:57:45 -07:00
Navdeep Parhar
bf5057691b cxgbe/tom: Fix potential leak in t4_aiotx_process_job.
The mbuf allocated could be a chain and must be freed with m_freem.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29579
2021-04-11 19:14:18 -07:00
Navdeep Parhar
516fe911a6 cxgbe(4): Always use the per-VI callout to read interface stats.
There is no change in the source of the stats (t4_get_port_stats or
t4_get_vi_stats) but the per-port callout is gone.

Sponsored by:	Chelsio Communications
Reviewed by:	jhb@
Differential Revision:	https://reviews.freebsd.org/D29527
2021-04-01 14:24:29 -07:00
Navdeep Parhar
5394893269 cxgbe/t4_tom: restore socket's protosw before entering TIME_WAIT.
This fixes a panic due to stale so->so_proto if t4_tom is unloaded and
one or more connections that were previously offloaded are still around
in TIME_WAIT state.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29503
2021-03-31 10:54:32 -07:00
John Baldwin
fe496dc02a cxgbe: Make the TOE TLS stats per-queue instead of per-port.
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29383
2021-03-26 15:19:58 -07:00
John Baldwin
077ba6a845 cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues.  Currently it only holds a work queue but will
include additional state in future changes.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29382
2021-03-26 15:19:58 -07:00
Bjoern A. Zeeb
0a7b99553f cxgbe: remove unused linux headers
Remove unused #includes of LinuxKPI headers noticed while trying to
solve LinuxKPI struct net_device and related functions.
Neither netdevice.h nor inetdevice.h nor notifier.h seem to be needed.
This takes cxgbe(4) out of the picture of D29366.

Sponsored-by:	The FreeBSD Foundation
MFC-after:	2 weeks
Reviewed-by:	np
X-D-R:		D29366 (extracted as further cleanup)
Differential Revision:	https://reviews.freebsd.org/D29432
2021-03-26 17:44:38 +00:00
Navdeep Parhar
15f3355567 cxgbe(4): Allow a T6 adapter to switch between TOE and NIC TLS mode.
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload.  With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps.  hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.

* Enable nic_ktls_ofld in the default configuration file and use the
  firmware instead of direct register manipulation to apply/rollback
  NIC TLS configuration.  This allows the driver to switch the hardware
  between TOE and NIC TLS mode in a safe manner.  Note that the
  configuration is adapter-wide and not per-port.

* Remove the kern_tls config file as it works with 100G T6 cards only
  and leads to firmware crashes with 25G cards.  The configurations
  included with the driver (with the exception of the FPGA configs) are
  supposed to work with all adapters.

Reported by:	Veeresh U.K. at Chelsio
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Reviewed by:	jhb@
Differential Revision: https://reviews.freebsd.org/D29291
2021-03-25 12:39:41 -07:00
John Baldwin
90c74b2b60 cxgbei: Enter network epoch and set vnet around t4_push_pdus().
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29302
2021-03-22 10:05:02 -07:00
John Baldwin
017902fc5f cxgbe ddp: Use CPL_COOKIE_DDP* instead of DDP_BUF*_INVALIDATED.
This avoids mixing the use of two different enums which modern C
compilers warn about.

Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29301
2021-03-22 10:05:02 -07:00
John Baldwin
8855ed61b5 cxgbei: Pass ULP submode directly to set_ulp_mode_iscsi().
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29300
2021-03-22 10:05:02 -07:00
John Baldwin
45eed2331e cxgbei: Move some function prototypes to cxgbei.h.
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29299
2021-03-22 10:05:02 -07:00
John Baldwin
52c11c3f74 cxgbei: Set vnet around tcp_drop() in do_rx_iscsi_ddp().
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29298
2021-03-22 10:05:02 -07:00
Navdeep Parhar
3cc6f777be cxgbe(4): create a separate helper routine to write the global RSS key.
While here, make sure only the PF driver attempts to program the global
RSS key (with options RSS).  The VF driver doesn't have access to those
device registers.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-03-19 13:35:30 -07:00
Navdeep Parhar
a1d803c162 cxgbe(4): make it safe to call setup_memwin repeatedly.
A repeat call will recreate the memory windows in the hardware and move
them to their last-known positions without repeating any of the software
initialization.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-03-19 12:37:44 -07:00