Commit Graph

180 Commits

Author SHA1 Message Date
Alexander Motin
ff751ee05c Remove FirstBurstLength limit for software iSCSI.
For hardware offload solicited data may potentially be handled more
efficiently than unsolicited due to direct data placement.  Or there
can be some unsolicited write buffering limitations.  It may create
situations where FirstBurstLength limit is really useful.

Software driver though has no those factors, having to do memcopy()
any way and having no so hard limit on the temporary storage.  Same
time more active use of unsolicited transfers allows to avoid some
of Ready To Transfer (R2T) PDU round-trip times and processing.

This change effectively doubles from 64KB to 128KB the maximum size
of write command that can be transferred within one link RTT.  Tests
of (64KB, 128KB] QD1 writes mixed with simultaneous QD8 reads over
the same connection, increasing RTT, shows almost double write speed
and half latency, while we should be able to afford few megabytes of
RAM for additional buffering on a target these days.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2021-01-20 23:17:12 -05:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Edward Tomasz Napierala
bce7ee9d41 Drop "All rights reserved" from all my stuff. This includes
Foundation copyrights, approved by emaste@.  It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.

Reviewed by:	emaste, imp, gbe (manpages)
Differential Revision:	https://reviews.freebsd.org/D26980
2020-10-28 13:46:11 +00:00
Alexander Motin
8836496815 Introduce support of SCSI Command Priority.
SAM-3 specification introduced concept of Task Priority, that was renamed
to Command Priority in SAM-4, and supported by all modern SCSI transports.
It provides 15 levels of relative priorities: 1 - highest, 15 - lowest and
0 - default.  SAT specification for SATA devices translates priorities 1-3
into NCQ high priority.

This change adds new "priority" field into empty spots of struct ccb_scsiio
and struct ccb_accept_tio of CAM and struct ctl_scsiio of CTL.  Respective
support is added into iscsi(4), isp(4), mpr(4), mps(4) and ocs_fc(4) drivers
for both initiator and where applicable target roles.  Minimal support was
added to CTL to receive the priority value from different frontends, pass it
between HA controllers and report in few places.

This patch does not add consumers of this functionality, so nothing should
really change yet, since the field is still set to 0 (default) on initiator
and not actively used on target.  Those are to be implemented separately.

I've confirmed priority working on WD Red SATA disks connected via mpr(4)
and properly transferred to CTL target via iscsi(4), isp(4) and ocs_fc(4).

While there, added missing tag_action support to ocs_fc(4) initiator role.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2020-10-25 19:34:02 +00:00
Richard Scheffenegger
4dfbcffbb9 Add network QoS support for PCP to iscsi initiator.
Make the Ethernet PCP codepoint configurable
for L2 local traffic, to allow lower latency for
iSCSI block IO. This addresses the initiator
side only.

Reviewed by:	mav, trasz, bcr
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26739
2020-10-24 21:07:13 +00:00
Alexander Motin
7dbbd1aeae Negotiate iSCSIProtocolLevel of 2 (RFC 7144) in initiator.
It does not change anything immediately, but allows further support of
Command Priority, Status Qualifier and new task management functions.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2020-10-22 20:26:27 +00:00
Edward Tomasz Napierala
186bcdaac7 If the SIM freezes the queue at exactly the wrong moment, after
another thread has started to send in a CCB and already checked
the queue wasn't frozen, we would end up with iscsi_action()
being called despite the queue is now frozen.

Add a check to make sure this doesn't happen . Perhaps this should
be fixed at the CAM level instead, but given how the send queue and
SIM are governed by two separate mutexes, it is somewhat hard to do.

Reviewed by:	imp, mav
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26750
2020-10-18 16:30:49 +00:00
Richard Scheffenegger
bfabdade5c Add DSCP support for network QoS to iscsi initiator.
Allow the DSCP codepoint also to be configurable
for the traffic in the direction from the initiator
to the target, such that writes and any requests
are also treated in the appropriate QoS class.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26714
2020-10-09 14:33:09 +00:00
Mateusz Guzik
2140d5b64f iscsi: clean up empty lines in .c and .h files 2020-09-01 21:30:22 +00:00
Alexander Motin
9a4510ac32 Implement zero-copy iSCSI target transmission/read.
Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.

Extend the offload KPI with optional PDU queue method, allowing to specify
completion callback, called when all the data referenced by above has been
transferred and won't be accessed any more (the buffers can be freed).

Implement the above functionality in software iSCSI driver using mbufs
with external storage and reference counter.  Note that some NICs (ixl(4))
may keep the mbuf in TX queue for a long time, so CTL has to be ready.

Add optional method to struct ctl_scsiio for buffer reference counting.
Implement it for CTL block backend, allowing to delay free of the struct
ctl_be_block_io and memory it references as needed.  In first reincarnation
of the patch I tried to delay whole I/O as it is done for FibreChannel,
that was cleaner, but due to the above callback delays I had to rewrite
it this way to not leave LUN referenced potentially for hours or more.

All together on sequential read from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
On tests with 2x Xeon Silver 4114 this allows to reach full line rate of
100GigE NIC.  Tests with Gold CPUs and two 100GigE NICs are stil TBD,
but expectations to saturate them are pretty high. ;)

Discussed with:	Chelsio
Sponsored by:	iXsystems, Inc.
2020-06-08 20:53:57 +00:00
Alexander Motin
1f29b46c42 Do not try to fill socket send buffer to the last byte.
Setting so_snd.sb_lowat to at least 1/8 of the socket buffer size allows
send thread more actively use PDUs coalescing, that dramatically reduces
TCP lock congestion and number of context switches, when the socket is
full and PDUs are small.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-05-22 18:10:46 +00:00
Pawel Biernacki
e0d69c5a88 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (1 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	kib, trasz
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23640
2020-02-15 18:48:38 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Edward Tomasz Napierala
6960c4e135 Fix typo in a warning message.
MFC after:	2 weeks
2018-03-14 18:27:06 +00:00
Edward Tomasz Napierala
8ff2372a2c Check for duplicates when modifying an iSCSI session. Previously we did
this check on open, but "iscsictl -M", or an iSCSI redirect received by
iscsid(8) could end up with two sessions with the same target name and
portal.

MFC after:	2 weeks
2018-03-10 14:21:37 +00:00
Edward Tomasz Napierala
43ee6e9d7b Add SPDX tags to iscsi(4).
MFC after:	2 weeks
2018-01-24 16:58:26 +00:00
Edward Tomasz Napierala
22d3bb2625 Move the DIAGNOSTIC check for lost iSCSI PDUs from icl_conn_close()
to icl_conn_free().  It's perfectly valid for the counter to be non-zero
in the former.

MFC after:	2 weeks
Sponsored by:	playkey.net
2017-12-09 15:34:40 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Hans Petter Selasky
9ac7c5a64c Make sure the iSCSI I/O limits are set properly so that the ISCSIDSEND IOCTL
can be used prior to the ISCSIDHANDOFF IOCTL which set the negotiated values.
Else the login PDU will fail when passing the "-r" option to "iscsictl" which
means iSCSI over RDMA instead of TCP/IP.

Discussed with:	np@ and trasz@
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-23 13:57:44 +00:00
Andriy Gapon
e54fb4ff8c iscsi_shutdown_post: do nothing if panic-ing
There is nothing that that routine should or could really do in that
context.

Reported by:	Ben RUBSON <ben.rubson@gmail.com>
MFC after:	1 week
2017-10-24 14:59:31 +00:00
Andriy Gapon
ad10496cf4 never retry oustanding requests when terminating iscsi session
CAM_REQ_ABORTED sounds natural for aborting outstanding requests when
tearing down a session, but that status actually causes eligible
requests to be tried again.  That's completely useless, so let's use
CAM_DEV_NOT_THERE instead.  Perhaps there is a better status, but this
should be good enough.  The change should affect only the session
termination.

Tested by:	Ben RUBSON <ben.rubson@gmail.com>
Reviewed by:	mav, trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D12653
2017-10-17 16:03:59 +00:00
Andriy Gapon
20e9cab5fa iscsi: do not hold the global lock while tearing down a session
It should be sufficient to hold the lock just for removing the session
from the session list.  Everything else should be covered by the session
specific lock.

On top of that, at present we can get a deadlock caused by waiting on
the CAM SIM reference count while holding the global lock.  A specific
scenario involving ZFS is this:
- concurrent termination of two sessions, S1 and S2
- session S1 completed all I/Os and sleeps in CAM waiting for device
  close by ZFS;
- session S2 is also dead now, but can not forcefully complete
  outstanding requests by calling iscsi_session_cleanup() from
  iscsi_maintenance_thread_terminate(), since it can't get the same
  global sc_lock;
- as soon as there are unfinished requests, ZFS can not do
  spa_config_enter() as writer, and so can not close the device for
  session S1;
- deadlock.

Reported by:	Ben RUBSON <ben.rubson@gmail.com>
Tested by:	Ben RUBSON <ben.rubson@gmail.com>
Reviewed by:	mav, trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D12652
2017-10-17 15:39:38 +00:00
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
Alexander Motin
82f7fa7ae6 Inline some trivial wrapper functions.
MFC after:	2 weeks
2017-03-02 16:14:15 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Alexander Motin
4c9ea0ced9 Freeze CAM SIM when request is postponed due to MaxCmdSN.
This allows to avoid resource allocation (especially offload) for requests
that can not be executed at this time any way.

MFC after:	2 weeks
2017-02-17 04:34:17 +00:00
Alexander Motin
5b338bc073 Fix tight loop spinning on postponed requests.
MFC after:	2 weeks
2017-02-17 04:29:23 +00:00
Alexander Motin
605703b5df Fix handling of negative sbspace() return values.
I found that at least with Chelsio NICs TOE sockets quite often report
negative sbspace() values.  Using unsigned variable to store it resulted
in attempts to aggregate too much data in one sosend() call, that caused
errors and following connection termination.

MFC after:	2 weeks
2017-02-15 19:46:00 +00:00
Alexander Motin
33d9db92e2 Directly call m_gethdr() instead of m_getm2() for BHS.
All this code is based on assumption that data will be stored in one piece,
and since buffer size if known and fixed, it is easier to hardcode it.

MFC after:	2 weeks
2017-02-14 18:34:25 +00:00
Alexander Motin
875ac6cfac Temporary attach AHS to BHS to calculate header digest.
MFC after:	2 weeks
2017-02-14 18:29:07 +00:00
Alexander Motin
d0d587c787 Do not rely on data alignment after m_pullup().
In general case m_pullup() does not really guarantee any data alignment.
Instead of depenting on side effects caused by data being always copied
out of mbuf cluster (which is probably a bug by itself), always allocate
aligned BHS buffer and read data there directly from socket.

While there, reuse new icl_conn_receive_buf() function to read digests.
The code could probably be even more optimized to aggregate those reads,
but until that done, this is still easier then the way it was before.

MFC after:	2 weeks
2017-02-14 16:33:42 +00:00
Alexander Motin
898fd11f5e Remove M_PKTHDR from m_getm2() in icl_pdu_append_data().
ip_data_mbuf is always appended to ip_bhs_mbuf, so it does not need own
packet header.  This change first avoids allocation/initialization of the
header, and then avoids dropping one when it later gets to socket buffer.

MFC after:	2 weeks
2017-02-13 20:36:28 +00:00
Navdeep Parhar
48214203c4 Fix send/recv limit mixup. 2016-09-05 23:12:24 +00:00
Navdeep Parhar
97b84d344d Make the iSCSI parameter negotiation more flexible.
Decouple the send and receive limits on the amount of data in a single
iSCSI PDU.  MaxRecvDataSegmentLength is declarative, not negotiated, and
is direction-specific so there is no reason for both ends to limit
themselves to the same min(initiator, target) value in both directions.

Allow iSCSI drivers to report their send, receive, first burst, and max
burst limits explicitly instead of using hardcoded values or trying to
derive all of them from the receive limit (which was the only limit
reported by the drivers prior to this change).

Display the send and receive limits separately in the userspace iSCSI
utilities.

Reviewed by:	jpaetzel@ (earlier version), trasz@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7279
2016-08-25 05:22:53 +00:00
Edward Tomasz Napierala
66ab9d1591 Consistently use 'unsigned int' for session IDs.
MFC after:	1 month
2016-06-09 13:04:57 +00:00
Edward Tomasz Napierala
aede1e09cf Add some spares to structs used by iscsi(4), to avoid ABI problems
during 11-STABLE.

MFC after:	1 month
2016-06-09 11:39:50 +00:00
Edward Tomasz Napierala
4e5408f10c Report negotiated MaxBurstLength and FirstBurstLength in "iscsictl -v"
and "ctladm islist -v" outputs.

MFC after:	1 month
2016-06-05 08:48:37 +00:00
Edward Tomasz Napierala
ba165a31b3 Add "iscsictl -e". Among other things, it makes it possible to perform
discovery without attaching to the targets ("iscsictl -Ad ... -e off"),
and then attach to selected ones ("iscsictl -Mi ... -e on").

PR:		204129
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6633
2016-05-31 11:32:07 +00:00
Edward Tomasz Napierala
bcec64bc61 Add a special case for iSER data tranfers.
Obtained from:	Mellanox Technologies
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-26 12:43:15 +00:00
Edward Tomasz Napierala
09c84055a0 Add kern.icl.iser_offloads sysctl.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-24 14:34:36 +00:00
Edward Tomasz Napierala
93fb610fe8 Rename kern.icl.drivers to kern.icl.offloads, for consistency.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-24 08:54:41 +00:00
Edward Tomasz Napierala
b891159418 Add mechanism for choosing iSER-capable ICL modules.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-24 08:44:45 +00:00
Edward Tomasz Napierala
a3fd63f223 Properly reset session state when using proxy and fail_on_disconnection=1.
Without it the reconnection would fail due to mismatched sequence numbers.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-21 11:26:03 +00:00
Edward Tomasz Napierala
7deb68ab2c Provide a way for ICL modules to declare they support PIM_UNMAPPED.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-21 11:10:48 +00:00
Edward Tomasz Napierala
b218ca6fdd Pass maxtags value to the ICL module. iSER needs it.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-21 10:59:36 +00:00
Edward Tomasz Napierala
906a424b26 Call the ICL module's handoff method even when using ICL proxy.
The upcoming iSER code uses this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-20 17:38:51 +00:00
Edward Tomasz Napierala
d66a906bc2 Make ICL proxy use kernel code for handling iSCSI sequence numbers
for PDUs to/from iscsid(8).  This fixes StatSN for Logout PDUs sent
by iscsi_session_logout().

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-19 14:57:37 +00:00