Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.
Extend the offload KPI with optional PDU queue method, allowing to specify
completion callback, called when all the data referenced by above has been
transferred and won't be accessed any more (the buffers can be freed).
Implement the above functionality in software iSCSI driver using mbufs
with external storage and reference counter. Note that some NICs (ixl(4))
may keep the mbuf in TX queue for a long time, so CTL has to be ready.
Add optional method to struct ctl_scsiio for buffer reference counting.
Implement it for CTL block backend, allowing to delay free of the struct
ctl_be_block_io and memory it references as needed. In first reincarnation
of the patch I tried to delay whole I/O as it is done for FibreChannel,
that was cleaner, but due to the above callback delays I had to rewrite
it this way to not leave LUN referenced potentially for hours or more.
All together on sequential read from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
On tests with 2x Xeon Silver 4114 this allows to reach full line rate of
100GigE NIC. Tests with Gold CPUs and two 100GigE NICs are stil TBD,
but expectations to saturate them are pretty high. ;)
Discussed with: Chelsio
Sponsored by: iXsystems, Inc.
Setting so_snd.sb_lowat to at least 1/8 of the socket buffer size allows
send thread more actively use PDUs coalescing, that dramatically reduces
TCP lock congestion and number of context switches, when the socket is
full and PDUs are small.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
rename the source to gsb_crc32.c.
This is a prerequisite of unifying kernel zlib instances.
PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20193
can be used prior to the ISCSIDHANDOFF IOCTL which set the negotiated values.
Else the login PDU will fail when passing the "-r" option to "iscsictl" which
means iSCSI over RDMA instead of TCP/IP.
Discussed with: np@ and trasz@
Sponsored by: Mellanox Technologies
MFC after: 1 week
I found that at least with Chelsio NICs TOE sockets quite often report
negative sbspace() values. Using unsigned variable to store it resulted
in attempts to aggregate too much data in one sosend() call, that caused
errors and following connection termination.
MFC after: 2 weeks
All this code is based on assumption that data will be stored in one piece,
and since buffer size if known and fixed, it is easier to hardcode it.
MFC after: 2 weeks
In general case m_pullup() does not really guarantee any data alignment.
Instead of depenting on side effects caused by data being always copied
out of mbuf cluster (which is probably a bug by itself), always allocate
aligned BHS buffer and read data there directly from socket.
While there, reuse new icl_conn_receive_buf() function to read digests.
The code could probably be even more optimized to aggregate those reads,
but until that done, this is still easier then the way it was before.
MFC after: 2 weeks
ip_data_mbuf is always appended to ip_bhs_mbuf, so it does not need own
packet header. This change first avoids allocation/initialization of the
header, and then avoids dropping one when it later gets to socket buffer.
MFC after: 2 weeks
Decouple the send and receive limits on the amount of data in a single
iSCSI PDU. MaxRecvDataSegmentLength is declarative, not negotiated, and
is direction-specific so there is no reason for both ends to limit
themselves to the same min(initiator, target) value in both directions.
Allow iSCSI drivers to report their send, receive, first burst, and max
burst limits explicitly instead of using hardcoded values or trying to
derive all of them from the receive limit (which was the only limit
reported by the drivers prior to this change).
Display the send and receive limits separately in the userspace iSCSI
utilities.
Reviewed by: jpaetzel@ (earlier version), trasz@
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7279
method. This is required for upcoming iSER support.
Obtained from: Mellanox Technologies (earlier version)
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
initiator iSCSI offload. Pass maximum data segment size supported by
chosen offload module to iscsid(8), and make iscsid(8) not try to negotiate
anything larger than that.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation