RFC 7143 (11.7.4):
The Target Transfer Tag values are not specified by this protocol,
except that the value 0xffffffff is reserved and means that the
Target Transfer Tag is not supplied.
MFC after: 1 month
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
Before this CTL always allocated MAXPHYS-sized buffers, even for 4KB I/O,
that is even more overkill for MAXPHYS of 1MB. This change limits maximum
allocation to 512KB if MAXPHYS is bigger, plus if one is above 128KB, adds
new 128KB UMA zone for smaller I/Os. The patch factors out alloc/free,
so later we could make it use more zones or malloc() if we'd like.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Foundation copyrights, approved by emaste@. It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.
Reviewed by: emaste, imp, gbe (manpages)
Differential Revision: https://reviews.freebsd.org/D26980
SAM-3 specification introduced concept of Task Priority, that was renamed
to Command Priority in SAM-4, and supported by all modern SCSI transports.
It provides 15 levels of relative priorities: 1 - highest, 15 - lowest and
0 - default. SAT specification for SATA devices translates priorities 1-3
into NCQ high priority.
This change adds new "priority" field into empty spots of struct ccb_scsiio
and struct ccb_accept_tio of CAM and struct ctl_scsiio of CTL. Respective
support is added into iscsi(4), isp(4), mpr(4), mps(4) and ocs_fc(4) drivers
for both initiator and where applicable target roles. Minimal support was
added to CTL to receive the priority value from different frontends, pass it
between HA controllers and report in few places.
This patch does not add consumers of this functionality, so nothing should
really change yet, since the field is still set to 0 (default) on initiator
and not actively used on target. Those are to be implemented separately.
I've confirmed priority working on WD Red SATA disks connected via mpr(4)
and properly transferred to CTL target via iscsi(4), isp(4) and ocs_fc(4).
While there, added missing tag_action support to ocs_fc(4) initiator role.
MFC after: 1 month
Relnotes: yes
Sponsored by: iXsystems, Inc.
It allows to report to initiator LU identifying information, preset via
"ident_info" and "text_ident_info" options.
Unfortunately it is impossible to implement SET IDENTIFYING INFORMATION,
since we have no persistent storage it requires, so the information is
read-only for initiator and has to be set out-of-band.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.
Extend the offload KPI with optional PDU queue method, allowing to specify
completion callback, called when all the data referenced by above has been
transferred and won't be accessed any more (the buffers can be freed).
Implement the above functionality in software iSCSI driver using mbufs
with external storage and reference counter. Note that some NICs (ixl(4))
may keep the mbuf in TX queue for a long time, so CTL has to be ready.
Add optional method to struct ctl_scsiio for buffer reference counting.
Implement it for CTL block backend, allowing to delay free of the struct
ctl_be_block_io and memory it references as needed. In first reincarnation
of the patch I tried to delay whole I/O as it is done for FibreChannel,
that was cleaner, but due to the above callback delays I had to rewrite
it this way to not leave LUN referenced potentially for hours or more.
All together on sequential read from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
On tests with 2x Xeon Silver 4114 this allows to reach full line rate of
100GigE NIC. Tests with Gold CPUs and two 100GigE NICs are stil TBD,
but expectations to saturate them are pretty high. ;)
Discussed with: Chelsio
Sponsored by: iXsystems, Inc.
cs_cmdsn can be incremented with single atomic. expcmdsn/maxcmdsn set in
cfiscsi_pdu_prepare() based on cs_cmdsn are not required to be updated
synchronously, only monotonically, that is achieved with lock there.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
We any way have per-I/O space in CTL_PRIV_FRONTEND, while for PDU private
fields I have better use ideas. Plus to me such use of PDU fields looked
a layering violation.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
I don't see a point to copy io->scsiio.kern_total_len into the request
PDU private field. The io is going to stay with us till the end, and
kern_total_len field is not changed after being first initialized.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
ha_lso is a listening socket (unless bind() has failed), so should use
solisten_upcall_set(NULL, NULL), not soupcall_clear().
MFC after: 1 week
Sponsored by: iXsystems, Inc.
While there, remove ifdef around cs_target check in cfiscsi_ioctl_list().
I am not sure why this ifdef was added, but without this check code will
crash below on NULL dereference.
Submitted by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24587
- Make ctl_add_lun() synchronous. Asynchronous addition was used by
Copan's proprietary code long ago and never for upstream FreeBSD.
- Move LUN enable/disable calls from backends to CTL core.
- Serialize LUN modification and partially removal to avoid double frees.
- Slightly unify backends code.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
These flags have been unused for some time. Some of them were in the
CAM2 specification, but CAM has moved on a bit from that. Some were
used in the old Pluto VideoSpace (and AirSpace) systems which had the
video playback I/O scheduler in userspace, but have been unused since
then.
Reviewed by: chuck, ken
Differential Revision: https://reviews.freebsd.org/D24008
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places. It would be cool if we had
more SMP-friendly statistics, but this helps too.
Sponsored by: iXsystems, Inc.
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.
v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
If not limited by write_same_max_lba option, split operation into several
2^^31 blocks chunks in a loop. For large disks it may take a while, so
setting write_same_max_lba may be useful to avoid timeouts.
While there, fix build with CAM_CTL_DEBUG.
MFC after: 2 weeks
The values to report can be set via LUN options. It can be useful for
testing, and also required for Drive Maintenance 2016 feature set.
MFC after: 2 weeks
CTL implements all defined feature sets except Drive Maintenance 2016,
which is not very applicable to such a virtual device, and implemented
only partially now. But may be it could be fixed later at least for
completeness.
MFC after: 2 weeks
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
Since in most configurations CTL serves as network service, we found
that this change improves local system interactivity under heavy load.
Priority of main threads is set slightly higher then worker taskqueues
to make them quickly sort incoming requests not creating bottlenecks,
while plenty of worker taskqueues should be less sensitive to latency.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Replace long per-LUN queue of blocked commands, scanned on each command
completion and sometimes even twice, causing up to O(n^^2) processing cost,
by much shorter per-command blocked queues, scanned only when respective
command completes, and check only commands before the previous blocker,
reducing cost to O(n).
While there, unblock aborted commands to make them "complete" ASAP to be
removed from the OOA queue and so not waste time ordering other commands
against them. Aborted commands that were not sent to execution yet should
have no visible side effects, so this is safe and easy optimization now,
comparing to commands already in processing, which are a still pain.
Together those two optimizations should fix quite pathological case, when
due to backend slowness CTL accumulated many thousands of blocked requests,
partially aborted by initiator and so supposedly not even existing, but
still wasting CTL CPU time.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
- Collapse original_sc and serializing_sc fields into one, since they
are never used simultanously, we have only one local I/O and one remote.
- Move remote_sglist and local_sglist fields into CTL_PRIV_BACKEND,
since they are used only on Originating SC in XFER mode, where requests
don't ever reach backends, so we can reuse backend's private storage.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
It was not only disabled for quite a while, but also appeared to be broken
at r325517, when maximum number of ports was made configurable.
MFC after: 1 week
of the fix/workaround for the "ctld hanging on reload" problem.
PR: 220175
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Approved by: re (kib)
MFC after: 2 weeks
Sponsored by: playkey.net
Somehow this was working even after PTI in, at least on amd64, and got
broken by something only very recently.
Reviewed by: araujo
Approved by: re (gjb)
for the "ctld hanging on reload" problem observed in same cases under
high load. I'm not 100% sure it's _the_ fix, as the issue is rather hard
to reproduce, but it was tested as part of a larger path and the problem
disappeared. It certainly shouldn't break anything.
Now, technically, it shouldn't be needed. Quoting mav@, "After
ct->ct_online == 0 there should be no new sessions attached to the target.
And if you see some problems abbout it, it may either mean that there are
some races where single cfiscsi_session_terminate(cs) call may be lost,
or as a guess while this thread was sleeping target was reenabbled and
redisabled again". Should such race be discovered and properly fixed
in the future, than this and the followup two commits can be backed out.
PR: 220175
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Discussed with: mav
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: playkey.net
race condition, due to a missing call to cfiscsi_target_release().
Discussed with: mav@
Tested by: Eugene M. Zheganin <emz at norma.perm.ru> (earlier version)
MFC after: 2 weeks
Sponsored by: playkey.net
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15383
ioctl frontend ports.
This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
Removes ctl_be_arg structure and associated logic and replaces it with
nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
ctld(8).
New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
Those device nodes respond only to CTL_IO ioctl.
New command-line options for ctladm:
# creates new ioctl frontend port with using free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4
New syntax for ctl.conf:
target ... {
port ioctl/<pp>
...
}
target ... {
port ioctl/<pp>/<vp>
...
Note: Most of this work was made by jceel@, thank you.
Submitted by: jceel
Reworked by: myself
Reviewed by: mav (earlier versions and recently during the rework)
Obtained from: FreeNAS and TrueOS
Relnotes: Yes
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D9299
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources. At
this point another thread receives a PDU. Boom.
PR: 222898, 219866
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by: mav@ (earlier version)
MFC after: 2 weeks
Sponsored by: playkey.net