Commit Graph

79 Commits

Author SHA1 Message Date
John Baldwin
2beaefe884 cxgbei: Support unmapped I/O requests.
- Add icl_pdu_append_bio and icl_pdu_get_bio methods.

- Add new page pod routines for allocating and writing page pods for
  unmapped bio requests.  Use these new routines for setting up DDP
  for iSCSI tasks with a SCSI I/O CCB which uses CAM_DATA_BIO.

- When ICL_NOCOPY is used to append data from an unmapped I/O request
  to a PDU, construct unmapped mbufs from the relevant pages backing
  the struct bio.  This also requires changes in the t4_push_pdus path
  to support unmapped mbufs.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34383
2022-03-10 15:50:52 -08:00
John Baldwin
511b83b167 cxgbei: Replace worker thread pools with per-connection kthreads.
Having a single pool of worker threads adds extra complexity and
overhead.  The software backend also uses per-connection kthreads.

Sponsored by:	Chelsio Communications
2022-02-07 16:20:40 -08:00
John Baldwin
fd8f61d6e9 cxgbei: Dispatch sent PDUs to the NIC asynchronously.
Previously the driver was called to send PDUs to the NIC synchronously
from the icl_conn_pdu_queue_cb callback.  However, this performed a
fair bit of work while holding the icl connection lock.  Instead,
change the callback to add sent PDUs to a STAILQ and defer dispatching
of PDUs to the NIC to a helper thread similar to the scheme used in
the TCP iSCSI backend.

- Replace rx_flags int and the sole RXF_ACTIVE flag with a simple
  rx_active bool.

- Add a pool of transmit worker threads for cxgbei.

- Fix worker thread exit to depend on the wakeup in kthread_exit()
  to fix a race with module unload.

Reported by:	mav
Sponsored by:	Chelsio Communications
2022-02-07 16:20:06 -08:00
John Baldwin
74fea8eb4f cxgbei: Rework parsing of pre-offload PDUs.
sbcut() returns mbufs in reverse order so is not suitable for reading
data from the socket buffer.  Instead, check for already-received data
in the receive worker thread before passing offload PDUs up to the
iSCSI layer.  This uses soreceive() to read data from the socket and
is also to use M_WAITOK since it now runs from a worker thread instead
of an interrupt thread.

Also, fix decoding of the data segment length for pre-offload PDUs.

Reported by:	Jithesh Arakkan @ Chelsio
Fixes:		a8c4147edc cxgbei: Parse all PDUs received prior to enabling offload mode.
Sponsored by:	Chelsio Communications
2022-02-04 15:38:49 -08:00
John Baldwin
a8c4147edc cxgbei: Parse all PDUs received prior to enabling offload mode.
Previously this would only handle a single PDU that did not contain
any data.  This should now handle an arbitrary number of PDUs.

While here check for these PDUs in the T6-specific CPL_RX_ISCSI_CMP
handler in addition to CPL_RX_ISCSI_DDP.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2022-01-24 14:20:02 -08:00
Navdeep Parhar
39d5cbdc1b cxgbe(4): Fix "set but not used [-Wunused-but-set-variable]" warnings.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2022-01-10 12:15:12 -08:00
John Baldwin
8903d8e37f iscsi: Pass the request PDU to icl_conn_transfer_setup().
This matches icl_conn_task_setup() which passes the PDU and avoids the
need for a layering violation in cxgbei to fetch the request PDU from
the ctl_io.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D33746
2022-01-04 14:37:17 -08:00
John Baldwin
752e211e64 cxgbei: Don't fail task setup if the socket is disconnected.
When the initiator is reconnecting to the target, the connection may
temporarily be marked disconnected or not have an associated socket.
New I/O requests received by the initiator in this state should not
fail with ECONNRESET as that results in an I/O error back to userland.
Instead, they need to still succeed so that CAM can queue the requests
and send them once the connection is re-established.

Setting up DDP for zero-copy receive requires a socket, so just punt
on using DDP for these transfers.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-12-23 14:14:07 -08:00
John Baldwin
e900338c09 Move the ICL_CONN_*LOCK* macros to <dev/iscsi/icl.h>.
These macros are not backend-specific but reference a
backend-independent field in struct icl_conn.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32858
2021-11-05 16:38:25 -07:00
John Baldwin
f63ddf465f cxgbei: Only convert "plain" TCP connections to ISCSI.
Reject attempts to convert a connection using a different ULP
mode: (e.g. DDP or TLS) to ISCSI.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-09-13 09:57:54 -07:00
John Baldwin
b7caa81576 cxgbei: Return early for EBUSY error in icl_cxgbei_conn_handoff.
This permits unindenting almost half of the function.

Sponsored by:	Chelsio Communications
2021-09-13 09:57:54 -07:00
John Baldwin
9b1bb0aee6 cxgbei: Disable ISO for -SO cards without external memory.
Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-09-13 09:57:54 -07:00
John Baldwin
4d4cf62e29 cxgbei: Handle errors in PDUs.
When a PDU with an error (bad padding, header digest, or data digest)
is received, log the error via ICL_WARN() and then reset the
connection via the ic_error callback.

While here, add per-rxq counters for errors.

Sponsored by:	Chelsio Communications
2021-09-10 15:10:00 -07:00
John Baldwin
d39e65b5bd cxgbei: Add sysctls to report the maximum data segment lengths.
These sysctls report the maximum data segment lengths supported by an
adapter.  These are the values advertised to the remote end during the
login phase.

Sponsored by:	Chelsio Communications
2021-08-30 15:55:40 -07:00
John Baldwin
64f09f2346 cxgbei: Limit T5 transmit data segments to 15k.
This avoids exceeding a limit in the firmware when using ISO with
jumbo frames.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-08-30 15:27:08 -07:00
John Baldwin
c261b6ea4e iscsi: Teach the iSCSI stack about "large" received PDUs.
When using iSCSI PDU offload (cxgbei) on T6 adapters, a burst of
received PDUs can be reported via a single message to the driver.

Previously the driver passed these multi-PDU bursts up to the iSCSI
stack up as a single "large" PDU by rewriting the buffer offset, data
segment length, and DataSN fields in the iSCSI header.  The DataSN
field in particular was rewritten so that each of the "large" PDUs
used consecutively increasing values.  While this worked, the forged
DataSN values did not match the ExpDataSN value in the subsequent SCSI
Response PDU.  The initiator does not currently verify this value, but
the forged DataSN values prevent adding a check.

To avoid this, allow a logical iSCSI PDU (struct icl_pdu) to describe
a burst of PDUs via a new 'ip_additional_pdus' field.  Normally this
field is set to zero when 'struct icl_pdu' represents a single PDU.
If logical PDU represents a burst of on-the-wire PDUs, then 'ip_npdus'
contains the count of additional on-the-wire PDUs.  The header of this
"large" PDU is still modified, but the DataSN field now contains the
DataSN value of the first on-the-wire PDU in the burst.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31577
2021-08-18 10:56:28 -07:00
John Baldwin
d75b0870e5 cxgbei: Restrict received PDUs to 4 DDP pages in length.
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31576
2021-08-17 11:14:37 -07:00
John Baldwin
f28715fdc1 cxgbei: Only round PDU data segment lengths down by 512 on T5.
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31575
2021-08-17 11:14:29 -07:00
John Baldwin
cbc186360c cxgbei: Restructure how PDU limits are managed.
- Compute data segment limits in read_pdu_limits() rather than PDU
  length limits.

- Add back connection-specific PDU overhead lengths to compute PDU
  length limits in icl_cxgbei_conn_handoff().

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31574
2021-08-17 11:14:11 -07:00
John Baldwin
2eb0e53a6b cxgbei: Wait for the final CPL to be received in icl_cxgbei_conn_close.
A socket in the FIN_WAIT_1 state is marked disconnected by
do_close_con_rpl() even though there might still receive data pending.
This is because the socket at that point has set SBS_CANTRCVMORE which
causes the protocol layer to discard any data received before the FIN.
However, icl_cxgbei_conn_close needs to wait until all the data has
been discarded.  Replace the wait for SS_ISDISCONNECTED with instead
waiting for final_cpl_received() to be called.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-08-12 08:48:35 -07:00
John Baldwin
5b27e4b27c cxgbei: Support for ISO (iSCSI segmentation offload).
ISO can be disabled before establishing a connection by setting
dev.tNnex.N.toe.iso to 0.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31223
2021-08-06 14:21:37 -07:00
John Baldwin
87322a9075 iscsi: Remove icl_soft-only fields from struct icl_conn.
Create a struct icl_soft_conn which extends struct icl_conn and
move fields only used by icl_soft from struct icl_conn to
struct icl_soft_conn.

Reviewed by:	mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31414
2021-08-05 12:05:30 -07:00
John Baldwin
d0d631d5f4 cxgbei: Round up the maximum PDU data length by the MSS for TXDATAPLEN_MAX.
Recent firmware versions round down the value passed here by the MSS
and subsequently mishandle transmitted PDUs larger than the rounded
down value.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-07-30 13:27:24 -07:00
John Baldwin
67495c13d0 cxgbei: Wait for socket to close in icl_cxgbei_conn_close.
This ensures the TOE has finished processing any in-flight received
data before returning to the caller.  The caller assumes it is safe to
free any open tasks or transfers (and associated buffers) after this
function returns.

Previously, data placed directly via DDP could be written to buffers
after the caller had freed the buffers.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-07-29 16:34:46 -07:00
John Baldwin
b5e73dd952 cxgbei: Don't assert F for data completion PDUs.
If a data PDU encounters an error such as a digest error, the firmware
will report that data PDU when completion moderation is active even if
it is not the final data PDU in a burst.

Sponsored by:	Chelsio Communications
2021-07-19 15:36:31 -07:00
John Baldwin
4a7d15ebb6 cxgbei: Remove invalid assertion.
A non-placed PDU can be delivered by CPL_RX_ISCSI_CMP in the middle of
a burst of placed PDUs (received via DDP) in which case the rcv_nxt
will not match the start of the non-placed PDU.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications
2021-07-19 15:36:31 -07:00
John Baldwin
abc273a290 cxgbei: Better handle new tasks and transfers when disconnecting.
If the connection is in the process of disconnecting, ic_socket can be
NULL.  For icl_cxgbei_conn_transfer_setup(), lock the connection and
check ic_socket before using it.  For icl_cxgbei_conn_task_setup(),
the caller already holds the connection lock, so assert it and bail
early with ECONNRESET if the connection is disconnecting.

Reported by:	Jithesh Arakkan @ Chelsio
Fixes:	 	f949967c8e cxgbei: Fix a race between transfer setup and a peer reset.
2021-06-22 16:09:54 -07:00
John Baldwin
f949967c8e cxgbei: Fix a race between transfer setup and a peer reset.
In 4427ac3675, the TOM driver stopped sending work requests to
program iSCSI page pods directly and instead queued them to be written
asynchronously with iSCSI PDUs.  The queue of mbufs to send is
protected by the inp lock.  However, the inp cannot be safely obtained
from the toep since a RST from the remote peer might have cleared
toep->inp asynchronously in an ithread.  To fix, obtain the inp from
the socket as is already done in icl_cxgbei_conn_pdu_queue_cb() and
fail the new transfer setup with ECONNRESET if the connection has been
reset.

To avoid passing sockets or inps into the page pod routines, pull the
mbufq out of the two relevant page pod routines such that the routines
queue new work request mbufs to a caller-supplied mbufq.

Reported by:	Jithesh Arakkan @ Chelsio
Fixes:		4427ac3675
2021-05-28 16:47:04 -07:00
John Baldwin
67360f7bb0 cxgbei: Support iSCSI offload on T6.
T6 makes several changes relative to T5 for receive of iSCSI PDUs.

First, earlier adapters issue either 2 or 3 messages to the host for
each PDU received: CPL_ISCSI_HDR contains the BHS of the PDU,
CPL_ISCSI_DATA (when DDP is not used for zero-copy receive) contains
the PDU data as buffers on the freelist, and CPL_RX_ISCSI_DDP with
status of the PDU such as result of CRC checks.  In T6, a new
CPL_RX_ISCSI_CMP combines CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP.  Data
PDUs which are directly placed via DDP only report a single
CPL_RX_ISCSI_CMP message.  Data PDUs received on the free lists are
reported as CPL_ISCSI_DATA followed by CPL_RX_ISCSI_CMP.  Control PDUs
such as R2T are still reported via CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP.

Supporting this requires changing the CPL_ISCSI_DATA handler to
allocate a PDU structure if it is not preceded by a CPL_ISCSI_HDR as
well as support for the new CPL_RX_ISCSI_CMP.

Second, when using DDP for zero-copy receive, T6 will only issue a
CPL_RX_ISCSI_CMP after a burst of PDUs have been received (indicated
by the F flag in the BHS).  In this case, the CPL_RX_ISCSI_CMP can
reflect the completion of multiple PDUs and the BHS and TCP sequence
number included in the message are from the last PDU received in the
burst.  Notably, the message does not include any information about
earlier PDUs received as part of the burst.  Instead, the driver must
track the amount of data already received for a given transfer and use
this to compute the amount of data received in a burst.  In addition,
the iSCSI layer currently has no way to permit receiving a logical PDU
which spans multiple PDUs.  Instead, the driver presents each burst as
a single, "large" PDU to the iSCSI target and initiators.  This is
done by rewriting the buffer offset and data length fields in the BHS
of the final PDU as well as rewriting the DataSN so that the received
PDUs appear to be in order.

To track all this, cxgbei maintains a hash table of 'cxgbei_cmp'
structures indexed by transfer tags for each offloaded iSCSI
connection.  When a SCSI_DATA_IN message is received, the ITT from the
received BHS is used to find the necessary state in the hash table,
whereas SCSI_DATA_OUT replies use the TTT as the key.  The structure
tracks the expected starting offset and DataSN of the next burst as
well as the rewritten DataSN value used for the previously received
PDU.

Discussed with:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30458
2021-05-28 16:45:29 -07:00
John Baldwin
0cc7d64a2a iscsi: Move the maximum data segment limits into 'struct icl_conn'.
This fixes a few bugs in iSCSI backends where the backends were using
the limits they advertised initially during the login phase as the
final values instead of the values negotiated with the other end.

Reported by:	Jithesh Arakkan @ Chelsio
Reviewed by:	mav
Differential Revision:	https://reviews.freebsd.org/D30271
2021-05-20 09:59:11 -07:00
John Baldwin
3bede2908a cxgbei: Add tunable sysctls for the FirstBurstLength and MaxBurstLength.
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30269
2021-05-19 15:56:54 -07:00
John Baldwin
671fd0ec8d cxgbei: Remove unused sysctls.
These were seemingly copied over from icl_soft.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30268
2021-05-19 15:56:45 -07:00
John Baldwin
e73e2ee0ac cxgbei: Handle target transfers with excess unsolicited data.
The CTL frontend might have provided a buffer that is smaller than the
FirstBurstLength and thus smaller than the amount of unsolicited data
included in the request PDU.  Treat these transfers as an empty
transfer.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications

Differential Revision:	https://reviews.freebsd.org/D29940
2021-05-14 12:21:34 -07:00
John Baldwin
e894e3adb2 cxgbei: Explicitly clear the page pode reservation pointer after freeing it.
A single union ctl_io can be reused across multiple transfers (in
particular by the ramdisk backend).  On a reuse, the reservation
pointer would retain its value from the previous transfer tripping an
assertion.

Reported by:	Jithesh Arakkan @ Chelsio
Sponsored by:	Chelsio Communications

Differential Revision:	https://reviews.freebsd.org/D29939
2021-05-14 12:21:34 -07:00
John Baldwin
1ad32ad0be cxgbei: Don't clamp iSCSI PDUs to 8K.
The firmware no longer requires this workaround.

Discussed with:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29912
2021-05-14 12:21:24 -07:00
John Baldwin
4add8e4c89 cxgbei: Don't leak resources for an aborted target transfer.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29911
2021-05-14 12:17:26 -07:00
John Baldwin
a1c687347a cxgbei: Add support for zero-copy iSCSI target transmission/read.
- Switch to allocating the cxgbei version of icl_pdu explicitly
  as a separate refcounted object allocated via malloc/free
  instead of storing it in the bhs mbuf prior to the bhs.

- Support the icl_conn_pdu_queue_cb() method to set a callback
  on a PDU to be invoked when the PDU is freed.

- For ICL_NOCOPY buffers, use an external mbuf to manage the
  storage for the buffer via m_extaddref().  Each external mbuf
  holds a reference on the associated PDU, so the callback is
  invoked once all of the external mbufs have been freed.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29910
2021-05-14 12:17:20 -07:00
John Baldwin
31df8ff73e cxgbei: Rework the pdu_append_data hook to support M_WAITOK.
- Only allocate 16K jumbo mbufs if the region of data to be
  appended is sufficiently large, and use a loop.

- Use m_getm2() to allocate a chain for data less than 16K, or
  if m_getjcl() fails.

- Use ENOMEM as the return value instead of '1' if the hook fails due
  to a memory allocation error.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29909
2021-05-14 12:17:14 -07:00
John Baldwin
46bee8043e cxgbei: Support DDP for target I/O S/G lists with more than one entry.
A CAM target layer I/O CCB can use a S/G list of virtual address ranges
to describe its data buffer.  This change adds zero-copy receive support
for such requests.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29908
2021-05-14 12:17:06 -07:00
John Baldwin
91ca7b0954 cxgbei: Whitespace fixes, comment typo, and rewrap a comment.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29906
2021-05-14 12:16:57 -07:00
John Baldwin
87bb5ed606 cxgbei: Use hardware RX flow control for offloaded iSCSI connections.
Forthcoming T6 iSCSI DDP support requires hardware RX flow control.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29905
2021-05-14 12:16:51 -07:00
John Baldwin
4427ac3675 cxgbe tom: Set the tid in the work requests to program page pods for iSCSI.
As a result, CPL_FW4_ACK now returns credits for these work requests.
To support this, page pod work requests are now constructed in special
mbufs similar to "raw" mbufs used for NIC TLS in plain TX queues.
These special mbufs are stored in the ulp_pduq and dispatched in order
with PDU work requests.

Sponsored by:	Chelsio Communications
Discussed with:	np
Differential Revision:	https://reviews.freebsd.org/D29904
2021-05-14 12:16:40 -07:00
John Baldwin
4b6ed0758d cxgbe: Make the TOE ISCSI RX stats per-queue instead of per adapter.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29903
2021-05-14 12:16:33 -07:00
John Baldwin
077ba6a845 cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues.  Currently it only holds a work queue but will
include additional state in future changes.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29382
2021-03-26 15:19:58 -07:00
John Baldwin
90c74b2b60 cxgbei: Enter network epoch and set vnet around t4_push_pdus().
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29302
2021-03-22 10:05:02 -07:00
John Baldwin
8855ed61b5 cxgbei: Pass ULP submode directly to set_ulp_mode_iscsi().
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29300
2021-03-22 10:05:02 -07:00
John Baldwin
45eed2331e cxgbei: Move some function prototypes to cxgbei.h.
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29299
2021-03-22 10:05:02 -07:00
John Baldwin
52c11c3f74 cxgbei: Set vnet around tcp_drop() in do_rx_iscsi_ddp().
Reviewed by:	np
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29298
2021-03-22 10:05:02 -07:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Navdeep Parhar
11a82cd688 cxgbei: destroy the worker threads' CV and mutex in stop_worker_threads.
Reported by:	bz@
MFC after:	3 days
2020-08-21 00:34:33 +00:00