
170 Commits

Author SHA1 Message Date
Alexander Motin
9a4510ac32 Implement zero-copy iSCSI target transmission/read.
Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.

Extend the offload KPI with an optional PDU queue method that lets the caller
specify a completion callback, called once all the data referenced as above
has been transferred and will not be accessed any more (the buffers can be
freed).

Implement the above functionality in the software iSCSI driver using mbufs
with external storage and a reference counter.  Note that some NICs (ixl(4))
may keep the mbuf in the TX queue for a long time, so CTL has to be ready.

Add an optional method to struct ctl_scsiio for buffer reference counting.
Implement it for the CTL block backend, allowing the free of struct
ctl_be_block_io and the memory it references to be delayed as needed.  In
the first incarnation of the patch I tried to delay the whole I/O, as is
done for FibreChannel; that was cleaner, but due to the callback delays
described above I had to rewrite it this way, so as not to leave the LUN
referenced for potentially hours or more.

All together, on sequential reads from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
In tests with 2x Xeon Silver 4114 this makes it possible to reach the full
line rate of a 100GigE NIC.  Tests with Gold CPUs and two 100GigE NICs are
still TBD, but expectations of saturating them are pretty high. ;)

Discussed with:	Chelsio
Sponsored by:	iXsystems, Inc.
2020-06-08 20:53:57 +00:00
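The reference-counting scheme described in this commit (data referenced instead of copied, plus a completion callback once the last reference is gone) can be sketched in userspace C. This is a simplified model under stated assumptions, not the ICL or mbuf code: `struct ext_buf`, `ext_buf_init()`, `ext_buf_ref()`, `ext_buf_unref()` and `mark_done()` are hypothetical names, and C11 atomics stand in for the kernel's mbuf external-storage reference counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical completion callback: runs once nothing references the data. */
typedef void (*ext_done_fn)(void *arg);

/* Hypothetical model of a data buffer shared (not copied) with the TX path. */
struct ext_buf {
	const void	*data;
	size_t		 len;
	atomic_uint	 refs;
	ext_done_fn	 done;
	void		*done_arg;
};

static void
ext_buf_init(struct ext_buf *b, const void *data, size_t len,
    ext_done_fn done, void *arg)
{
	b->data = data;
	b->len = len;
	atomic_init(&b->refs, 1);	/* the owner's own reference */
	b->done = done;
	b->done_arg = arg;
}

/* Taken whenever another consumer (e.g. a queued packet) points at the data. */
static void
ext_buf_ref(struct ext_buf *b)
{
	atomic_fetch_add_explicit(&b->refs, 1, memory_order_relaxed);
}

/* Dropping the last reference runs the completion callback exactly once. */
static void
ext_buf_unref(struct ext_buf *b)
{
	if (atomic_fetch_sub_explicit(&b->refs, 1, memory_order_acq_rel) == 1)
		b->done(b->done_arg);
}

/* Example callback: records that the buffer may now be freed. */
static void
mark_done(void *arg)
{
	*(int *)arg = 1;
}
```

The owner must not free or reuse the buffer until the callback runs. Since a NIC may hold the last reference for a long time, this is why the commit delays freeing only the backend I/O structure rather than the whole I/O.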
Alexander Motin
1f29b46c42 Do not try to fill socket send buffer to the last byte.
Setting so_snd.sb_lowat to at least 1/8 of the socket buffer size allows
the send thread to make more active use of PDU coalescing, which dramatically
reduces TCP lock congestion and the number of context switches when the
socket is full and PDUs are small.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-05-22 18:10:46 +00:00
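As an illustration of the arithmetic, the sketch below raises a send low-water mark to at least 1/8 of the buffer and shows the resulting wakeup condition. The helper names `snd_lowat_for()` and `send_thread_should_run()` are hypothetical, and the rule that a socket becomes writable only once free space reaches the low-water mark is modeled here in simplified form, not copied from the kernel.

```c
#include <assert.h>

/* Hypothetical helper: raise the send low-water mark to at least 1/8 of the buffer. */
static long
snd_lowat_for(long hiwat, long cur_lowat)
{
	long min_lowat = hiwat / 8;

	return (cur_lowat > min_lowat ? cur_lowat : min_lowat);
}

/*
 * Hypothetical wakeup rule: the send thread runs only when at least lowat
 * bytes are free, so each wakeup can coalesce several small PDUs into one
 * send instead of trickling them out one at a time.
 */
static int
send_thread_should_run(long space, long lowat)
{
	return (space >= lowat);
}
```

With a 2 MiB send buffer, for example, the send thread is not woken until 256 KiB are free, which gives it room to batch many small PDUs per wakeup.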
Pawel Biernacki
e0d69c5a88 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (1 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	kib, trasz
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23640
2020-02-15 18:48:38 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests
2020-01-12 06:07:54 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Edward Tomasz Napierala
6960c4e135 Fix typo in a warning message.
MFC after:	2 weeks
2018-03-14 18:27:06 +00:00
Edward Tomasz Napierala
8ff2372a2c Check for duplicates when modifying an iSCSI session.  Previously we did
this check on open, but "iscsictl -M" or an iSCSI redirect received by
iscsid(8) could result in two sessions with the same target name and
portal.

MFC after:	2 weeks
2018-03-10 14:21:37 +00:00
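A minimal sketch of the duplicate check, assuming a session is identified by its target name and portal as the commit message describes. `struct session` and `session_conflicts()` are hypothetical and much simpler than the real iscsi(4) structures; the key detail is that the session being modified is skipped, so it does not conflict with itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced session record. */
struct session {
	unsigned int	 id;
	const char	*target;	/* target name */
	const char	*portal;	/* target address */
};

/*
 * Return true if a session other than self_id already uses the given
 * target name and portal.  Running this on modify, not only on open,
 * catches a session being changed into a duplicate of another one.
 */
static bool
session_conflicts(const struct session *list, size_t n, unsigned int self_id,
    const char *target, const char *portal)
{
	for (size_t i = 0; i < n; i++) {
		if (list[i].id == self_id)
			continue;	/* the session being modified */
		if (strcmp(list[i].target, target) == 0 &&
		    strcmp(list[i].portal, portal) == 0)
			return (true);
	}
	return (false);
}
```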
Edward Tomasz Napierala
43ee6e9d7b Add SPDX tags to iscsi(4).
MFC after:	2 weeks
2018-01-24 16:58:26 +00:00
Edward Tomasz Napierala
22d3bb2625 Move the DIAGNOSTIC check for lost iSCSI PDUs from icl_conn_close()
to icl_conn_free().  It's perfectly valid for the counter to be non-zero
in the former.

MFC after:	2 weeks
Sponsored by:	playkey.net
2017-12-09 15:34:40 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use the BSD 2-Clause license; however, the tool
I was using misidentified many licenses, so this was mostly a manual (and
error-prone) task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 14:52:40 +00:00
Hans Petter Selasky
9ac7c5a64c Make sure the iSCSI I/O limits are set properly so that the ISCSIDSEND IOCTL
can be used prior to the ISCSIDHANDOFF IOCTL, which sets the negotiated values.
Otherwise the login PDU will fail when the "-r" option, which selects iSCSI
over RDMA instead of TCP/IP, is passed to "iscsictl".

Discussed with:	np@ and trasz@
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-23 13:57:44 +00:00
Andriy Gapon
e54fb4ff8c iscsi_shutdown_post: do nothing if panicking
There is nothing that the routine should or could really do in that
context.

Reported by:	Ben RUBSON <ben.rubson@gmail.com>
MFC after:	1 week
2017-10-24 14:59:31 +00:00
Andriy Gapon
ad10496cf4 never retry outstanding requests when terminating iscsi session
CAM_REQ_ABORTED sounds natural for aborting outstanding requests when
tearing down a session, but that status actually causes eligible
requests to be tried again.  That's completely useless, so let's use
CAM_DEV_NOT_THERE instead.  Perhaps there is a better status, but this
should be good enough.  The change should affect only the session
termination.

Tested by:	Ben RUBSON <ben.rubson@gmail.com>
Reviewed by:	mav, trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D12653
2017-10-17 16:03:59 +00:00
Andriy Gapon
20e9cab5fa iscsi: do not hold the global lock while tearing down a session
It should be sufficient to hold the lock just for removing the session
from the session list.  Everything else should be covered by the session
specific lock.

On top of that, at present we can get a deadlock caused by waiting on
the CAM SIM reference count while holding the global lock.  A specific
scenario involving ZFS is this:
- concurrent termination of two sessions, S1 and S2;
- session S1 completed all I/Os and sleeps in CAM waiting for device
  close by ZFS;
- session S2 is also dead now, but cannot forcefully complete
  outstanding requests by calling iscsi_session_cleanup() from
  iscsi_maintenance_thread_terminate(), since it can't get the same
  global sc_lock;
- as long as there are unfinished requests, ZFS cannot do
  spa_config_enter() as writer, and so cannot close the device for
  session S1;
- deadlock.

Reported by:	Ben RUBSON <ben.rubson@gmail.com>
Tested by:	Ben RUBSON <ben.rubson@gmail.com>
Reviewed by:	mav, trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D12652
2017-10-17 15:39:38 +00:00
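The locking pattern from this commit (global lock only for unlinking the session, per-session lock for the rest of the teardown) can be sketched with POSIX threads. All names here (`struct sess`, `sc_lock`, `sc_sessions`, `sess_terminate()`) are hypothetical stand-ins, and the long wait on the CAM SIM reference is reduced to a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical session; only what the locking pattern needs. */
struct sess {
	struct sess	*next;
	pthread_mutex_t	 lock;		/* protects per-session state */
	int		 outstanding;	/* requests still in flight */
};

/* Hypothetical global state: sc_lock protects only the session list. */
static pthread_mutex_t sc_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sess *sc_sessions;

static void
sess_terminate(struct sess *s)
{
	struct sess **pp;

	/* 1. Hold the global lock only while unlinking from the list. */
	pthread_mutex_lock(&sc_lock);
	for (pp = &sc_sessions; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == s) {
			*pp = s->next;
			break;
		}
	}
	pthread_mutex_unlock(&sc_lock);

	/*
	 * 2. Tear down under the session lock only.  Any long wait (in the
	 * commit, waiting for CAM to drop its SIM reference) happens without
	 * sc_lock, so other sessions can still complete their requests.
	 */
	pthread_mutex_lock(&s->lock);
	s->outstanding = 0;	/* stand-in for completing outstanding requests */
	pthread_mutex_unlock(&s->lock);
}
```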
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support the braindamaged scenario where a socket
    is added to kevent(2) and then listen(2) is called on it. The second
    reason is that there is a future plan to make socket buffers
    pluggable, so that the socket buffer of a dataflow socket can be
    changed; in that case we also want to keep the same selinfos through
    the lifetime of the socket.
  - Remove struct so_accf. Since listening stuff no longer affects the
    size of struct socket, just move its fields into the listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow acquiring two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() use atomic(9).  This allows soref() to be done
    in some situations without owning the socket lock.  There is room for
    improvement here: sorele() could also be made to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parallelism in the UNIX sockets.
  - Now a call into uipc_attach() may happen with unp_link_lock held, if we
    are accepting, or without unp_link_lock, if we are just creating a
    socket.
  - Another problem in UNIX sockets is that uipc_close() basically did
    nothing for a listening socket.  The vnode remained open for
    connections.  This is fixed by removing the vnode in uipc_close().
    Maybe the right way would be to do it for all sockets (not only
    listening ones), i.e. simply move the vnode teardown from uipc_detach()
    to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
Alexander Motin
82f7fa7ae6 Inline some trivial wrapper functions.
MFC after:	2 weeks
2017-03-02 16:14:15 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Alexander Motin
4c9ea0ced9 Freeze CAM SIM when request is postponed due to MaxCmdSN.
This avoids resource allocation (especially offload resources) for requests
that cannot be executed at this time anyway.

MFC after:	2 weeks
2017-02-17 04:34:17 +00:00
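iSCSI compares CmdSN and MaxCmdSN using 32-bit serial number arithmetic, so the "may this request be sent now" test has to handle wraparound. A minimal sketch under that assumption; `sn_gt()` and `cmdsn_window_open()` are hypothetical helper names, and the SIM freeze itself is only described in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Serial number comparison on a circular 32-bit space: true if a > b. */
static bool
sn_gt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) > 0);
}

/*
 * Hypothetical admission check: a command may be sent only while its CmdSN
 * has not passed the target's MaxCmdSN.  When this returns false the
 * request is postponed; per the commit, the CAM SIM queue is also frozen,
 * so no further requests (and no offload resources) are set up until the
 * window opens again.
 */
static bool
cmdsn_window_open(uint32_t cmdsn, uint32_t maxcmdsn)
{
	return (!sn_gt(cmdsn, maxcmdsn));
}
```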
Alexander Motin
5b338bc073 Fix tight loop spinning on postponed requests.
MFC after:	2 weeks
2017-02-17 04:29:23 +00:00
Alexander Motin
605703b5df Fix handling of negative sbspace() return values.
I found that, at least with Chelsio NICs, TOE sockets quite often report
negative sbspace() values.  Storing the value in an unsigned variable
resulted in attempts to aggregate too much data in one sosend() call, which
caused errors and subsequent connection termination.

MFC after:	2 weeks
2017-02-15 19:46:00 +00:00
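The bug class fixed here is easy to reproduce in plain C: a negative signed value stored in an unsigned variable wraps to a very large number. The functions `room_buggy()` and `room_fixed()` are hypothetical illustrations, not the iSCSI driver code; they only assume that `sbspace()` returns a signed value that can be negative, as the commit reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy pattern: a negative free-space value stored in an unsigned
 * variable wraps around, so "room left" looks enormous and far too much
 * data is aggregated into one send.
 */
static size_t
room_buggy(long space)
{
	size_t available = (size_t)space;	/* -1 becomes SIZE_MAX */

	return (available);
}

/* Fixed pattern: keep the value signed and treat "<= 0" as "no room". */
static size_t
room_fixed(long space)
{
	return (space > 0 ? (size_t)space : 0);
}
```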
Alexander Motin
33d9db92e2 Directly call m_gethdr() instead of m_getm2() for BHS.
All this code is based on the assumption that data will be stored in one
piece, and since the buffer size is known and fixed, it is easier to hardcode it.

MFC after:	2 weeks
2017-02-14 18:34:25 +00:00
Alexander Motin
875ac6cfac Temporarily attach AHS to BHS to calculate header digest.
MFC after:	2 weeks
2017-02-14 18:29:07 +00:00
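The iSCSI header digest is a CRC32C computed over the BHS and any AHS as one continuous byte stream, which is why the AHS has to be attached to the BHS before the digest is calculated. The sketch below shows the equivalent incremental computation; it uses a slow bitwise CRC32C for clarity, and `crc32c_update()` and `header_digest()` are hypothetical names, not the kernel's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Small but slow; production code uses tables or CPU instructions.
 */
static uint32_t
crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len-- > 0) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return (crc);
}

/* Digest over the 48-byte BHS followed by the AHS, as if they were one buffer. */
static uint32_t
header_digest(const uint8_t bhs[48], const uint8_t *ahs, size_t ahs_len)
{
	uint32_t crc = 0xFFFFFFFFu;

	crc = crc32c_update(crc, bhs, 48);
	if (ahs_len > 0)
		crc = crc32c_update(crc, ahs, ahs_len);
	return (crc ^ 0xFFFFFFFFu);
}
```

Feeding the two segments in sequence gives the same result as one pass over a contiguous copy; the commit achieves the same effect by chaining the AHS to the BHS mbuf.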
Alexander Motin
d0d587c787 Do not rely on data alignment after m_pullup().
In the general case m_pullup() does not really guarantee any data alignment.
Instead of depending on side effects caused by data being always copied
out of the mbuf cluster (which is probably a bug by itself), always allocate
an aligned BHS buffer and read data there directly from the socket.

While there, reuse the new icl_conn_receive_buf() function to read digests.
The code could probably be optimized further to aggregate those reads,
but until that is done, this is still simpler than the way it was before.

MFC after:	2 weeks
2017-02-14 16:33:42 +00:00
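A minimal sketch of the alignment-safe approach: copy the 48-byte BHS from a possibly misaligned buffer into a properly aligned structure before touching multi-byte fields. `struct bhs`, `bhs_read()` and `bhs_data_segment_len()` are hypothetical; the layout follows the generic BHS (opcode in byte 0, DataSegmentLength in bytes 5..7, Initiator Task Tag in bytes 16..19), and `memcpy()` stands in for reading directly from the socket.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic BHS view; the uint32_t member forces 4-byte alignment. */
struct bhs {
	uint8_t		opcode;
	uint8_t		flags;
	uint8_t		spec1[2];
	uint8_t		total_ahs_len;
	uint8_t		data_segment_len[3];	/* 24-bit, big-endian */
	uint8_t		lun[8];
	uint32_t	initiator_task_tag;	/* opaque tag, bytes 16..19 */
	uint8_t		spec3[28];
};
_Static_assert(sizeof(struct bhs) == 48, "BHS must be 48 bytes");

/*
 * Copy into an aligned struct instead of casting a pointer into the receive
 * buffer, which may be misaligned and make the uint32_t access unsafe.
 */
static void
bhs_read(struct bhs *dst, const uint8_t *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Decode the 24-bit big-endian DataSegmentLength. */
static uint32_t
bhs_data_segment_len(const struct bhs *b)
{
	return ((uint32_t)b->data_segment_len[0] << 16 |
	    (uint32_t)b->data_segment_len[1] << 8 |
	    (uint32_t)b->data_segment_len[2]);
}
```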
Alexander Motin
898fd11f5e Remove M_PKTHDR from m_getm2() in icl_pdu_append_data().
ip_data_mbuf is always appended to ip_bhs_mbuf, so it does not need its own
packet header.  This change avoids first allocating/initializing the
header, and then dropping it when the mbuf later gets to the socket buffer.

MFC after:	2 weeks
2017-02-13 20:36:28 +00:00
Navdeep Parhar
48214203c4 Fix send/recv limit mixup.
2016-09-05 23:12:24 +00:00
Navdeep Parhar
97b84d344d Make the iSCSI parameter negotiation more flexible.
Decouple the send and receive limits on the amount of data in a single
iSCSI PDU.  MaxRecvDataSegmentLength is declarative, not negotiated, and
is direction-specific so there is no reason for both ends to limit
themselves to the same min(initiator, target) value in both directions.

Allow iSCSI drivers to report their send, receive, first burst, and max
burst limits explicitly instead of using hardcoded values or trying to
derive all of them from the receive limit (which was the only limit
reported by the drivers prior to this change).

Display the send and receive limits separately in the userspace iSCSI
utilities.

Reviewed by:	jpaetzel@ (earlier version), trasz@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7279
2016-08-25 05:22:53 +00:00
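The decoupled limits can be sketched as follows, using a hypothetical `struct pdu_limits` and `pdu_limits_compute()`. Because MaxRecvDataSegmentLength is declared by each receiver, the send limit comes from the peer's declaration (capped by what the local driver can send), while the receive limit is simply our own declaration; the two directions need not match.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection data segment limits, in bytes. */
struct pdu_limits {
	uint32_t max_send_dsl;	/* largest data segment we will send */
	uint32_t max_recv_dsl;	/* largest data segment we will accept */
};

static struct pdu_limits
pdu_limits_compute(uint32_t our_recv_decl, uint32_t peer_recv_decl,
    uint32_t driver_max_send)
{
	struct pdu_limits l;

	/* Receive side: whatever we declared to the peer. */
	l.max_recv_dsl = our_recv_decl;
	/* Send side: the peer's declaration, capped by the driver's limit. */
	l.max_send_dsl = peer_recv_decl < driver_max_send ?
	    peer_recv_decl : driver_max_send;
	return (l);
}
```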
Edward Tomasz Napierala
66ab9d1591 Consistently use 'unsigned int' for session IDs.
MFC after:	1 month
2016-06-09 13:04:57 +00:00
Edward Tomasz Napierala
aede1e09cf Add some spares to structs used by iscsi(4), to avoid ABI problems
during 11-STABLE.

MFC after:	1 month
2016-06-09 11:39:50 +00:00
Edward Tomasz Napierala
4e5408f10c Report negotiated MaxBurstLength and FirstBurstLength in "iscsictl -v"
and "ctladm islist -v" outputs.

MFC after:	1 month
2016-06-05 08:48:37 +00:00
Edward Tomasz Napierala
ba165a31b3 Add "iscsictl -e". Among other things, it makes it possible to perform
discovery without attaching to the targets ("iscsictl -Ad ... -e off"),
and then attach to selected ones ("iscsictl -Mi ... -e on").

PR:		204129
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6633
2016-05-31 11:32:07 +00:00
Edward Tomasz Napierala
bcec64bc61 Add a special case for iSER data transfers.
Obtained from:	Mellanox Technologies
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-26 12:43:15 +00:00
Edward Tomasz Napierala
09c84055a0 Add kern.icl.iser_offloads sysctl.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-24 14:34:36 +00:00
Edward Tomasz Napierala
93fb610fe8 Rename kern.icl.drivers to kern.icl.offloads, for consistency.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-24 08:54:41 +00:00
Edward Tomasz Napierala
b891159418 Add mechanism for choosing iSER-capable ICL modules.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-24 08:44:45 +00:00
Edward Tomasz Napierala
a3fd63f223 Properly reset session state when using proxy and fail_on_disconnection=1.
Without it the reconnection would fail due to mismatched sequence numbers.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-21 11:26:03 +00:00
Edward Tomasz Napierala
7deb68ab2c Provide a way for ICL modules to declare they support PIM_UNMAPPED.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-21 11:10:48 +00:00
Edward Tomasz Napierala
b218ca6fdd Pass maxtags value to the ICL module. iSER needs it.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-21 10:59:36 +00:00
Edward Tomasz Napierala
906a424b26 Call the ICL module's handoff method even when using ICL proxy.
The upcoming iSER code uses this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-20 17:38:51 +00:00
Edward Tomasz Napierala
d66a906bc2 Make ICL proxy use kernel code for handling iSCSI sequence numbers
for PDUs to/from iscsid(8).  This fixes StatSN for Logout PDUs sent
by iscsi_session_logout().

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-19 14:57:37 +00:00
Edward Tomasz Napierala
2f0586b2ce Make it possible to interrupt proxy-mode iscsid receive.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-19 14:37:24 +00:00
Edward Tomasz Napierala
257cbe3410 Rename icl_proxy.c to icl_soft_proxy.c, to make it clear it's a part
of software ICL backend.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-17 15:21:17 +00:00
Edward Tomasz Napierala
0fbbc37da3 Make iscsi_ioctl_daemon_send() actually work by adding missing locking.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-17 11:59:38 +00:00
Edward Tomasz Napierala
f41492b00f Add icl_conn_connect() ICL method, required for iSER.
Obtained from:	Mellanox Technologies (earlier version)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-17 11:10:44 +00:00
Edward Tomasz Napierala
604c023f94 Extend the ICL interface to include the PDU pointer in the task_setup
method.  This is required for upcoming iSER support.

Obtained from:	Mellanox Technologies (earlier version)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-17 08:55:21 +00:00
Edward Tomasz Napierala
47d8fd8502 Make ICL_KERNEL_PROXY compilable.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-17 07:56:45 +00:00
Pedro F. Giffuni
266078c6db dev/iscsi: minor spelling fixes.
No functional change.

Reviewed by:	trasz
2016-05-03 14:49:49 +00:00
Edward Tomasz Napierala
938bcb04fc Fix iSCSI initiator crash that could happen with out-of-memory
conditions with in-flight IO and subsequent reconnection.

PR:		199117
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5673
2016-03-25 16:01:40 +00:00
Edward Tomasz Napierala
e204e2cd93 Add lock assertion.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-03-18 13:26:16 +00:00
Edward Tomasz Napierala
099ad7abd0 Add a kern.icl.drivers sysctl, to retrieve the list of registered
ICL drivers.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-02-10 19:01:26 +00:00