Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.
Extend the offload KPI with optional PDU queue method, allowing to specify
completion callback, called when all the data referenced by above has been
transferred and won't be accessed any more (the buffers can be freed).
Implement the above functionality in software iSCSI driver using mbufs
with external storage and reference counter. Note that some NICs (ixl(4))
may keep the mbuf in TX queue for a long time, so CTL has to be ready.
Add optional method to struct ctl_scsiio for buffer reference counting.
Implement it for CTL block backend, allowing to delay free of the struct
ctl_be_block_io and memory it references as needed. In first reincarnation
of the patch I tried to delay whole I/O as it is done for FibreChannel,
that was cleaner, but due to the above callback delays I had to rewrite
it this way to not leave LUN referenced potentially for hours or more.
All together on sequential read from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
On tests with 2x Xeon Silver 4114 this allows to reach full line rate of
100GigE NIC. Tests with Gold CPUs and two 100GigE NICs are stil TBD,
but expectations to saturate them are pretty high. ;)
Discussed with: Chelsio
Sponsored by: iXsystems, Inc.
Some automation tries to detect if nvd or nda is in used, and the presence of
both confuses it. Provide a knob to turn off nvd alias creation
(kern.cam.nda.nvd_compat=0) for these situations. The default is the same:
create the nvd compat link.
cs_cmdsn can be incremented with single atomic. expcmdsn/maxcmdsn set in
cfiscsi_pdu_prepare() based on cs_cmdsn are not required to be updated
synchronously, only monotonically, that is achieved with lock there.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
We any way have per-I/O space in CTL_PRIV_FRONTEND, while for PDU private
fields I have better use ideas. Plus to me such use of PDU fields looked
a layering violation.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
I don't see a point to copy io->scsiio.kern_total_len into the request
PDU private field. The io is going to stay with us till the end, and
kern_total_len field is not changed after being first initialized.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
ha_lso is a listening socket (unless bind() has failed), so should use
solisten_upcall_set(NULL, NULL), not soupcall_clear().
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Often, in traiging core files, one only has a traceback of where a
panic occurred. We have probe* and xpt* routines that live in both the
scsi and ata layers with identical names. To make one or the other
stand out, prefix all the probe and xpt routines in ata with an
'a'. I've left the scsi ones alone since they were there first and are
more numerous. I also rejected using #define to do this as being too
confusing. I chose this method because the CAM name for the probe
device was already 'aprobe'.
Normally, this doesn't matter because file scope protects one from
interfering with the other. However, due to the indirect nature of
CAM's state machine, you don't know if the following traceback is
SCSI or ATA:
xpt_done
probedone
xpt_done_process
xpt_done_td
fork_exit
nvme and mmc already have unique names.
MFC: 1 week
Differential revision: https://reviews.freebsd.org/D24825
While there, remove ifdef around cs_target check in cfiscsi_ioctl_list().
I am not sure why this ifdef was added, but without this check code will
crash below on NULL dereference.
Submitted by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24587
run it). Make sure that we do. Simplify the flow a bit, and fix a
comment since we do need to do these things.
Noticed by: cperciva (not sure why my invariants kernel didn't trigger)
- Make ctl_add_lun() synchronous. Asynchronous addition was used by
Copan's proprietary code long ago and never for upstream FreeBSD.
- Move LUN enable/disable calls from backends to CTL core.
- Serialize LUN modification and partially removal to avoid double frees.
- Slightly unify backends code.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
- maxio should be dp->d_maxsize. This is often MAXPHYS, but not always
(especially if MAXPHYS is > 1MB).
- Unlock the periph before returning. We don't need to relock it to
release the ccb.
- Make sure we release the ccb in error paths.
Reviewed by: cperciva
With these two ioctls implemented in the nda driver, nvmecontrol now
works with nda just like it does with nvd. It eliminates the need to
jump through odd hoops to get this data.
Add the nvmeX device to the XPT_PATH_INQ nvme specific
information. while one could figure this out by looking up the
domain🚌slot:function, it's a lot easier to have the SIM set it
directly since the sim knows this.
This was changed in the review process for the flags sysctl. The
reasons for the change are no longer valid as the code changed after
that. Cast the one place where it might make a difference (but I don't
think it does). This restores the ability to see flags for softc in
gdb.
Allocate a temporary buffer in the kernel to serve as the CCB data
pointer for a pass-through transaction and use copyin/copyout to
shuffle the data to/from the user buffer.
Reviewed by: scottl, brooks
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24489
The handle_string callback for the ENCIOC_SETSTRING ioctl was passing
a user pointer to memcpy(). Fix by using copyin() instead.
For ENCIOC_GETSTRING ioctls, the handler was storing the user pointer
in a CCB's data_ptr field where it was indirected by other code. Fix
this by allocating a temporary buffer (which ENCIOC_SETSTRING already
did) and copying the result out to the user buffer after the CCB has
been processed.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24487
The handle_string callback for the ENCIOC_GET_ENCNAME and
ENCIOC_GETENCID ioctls tries to copy the size of the generated string
out to userland. However, the callback only has access to the kernel
copy of the structure populated by copyin(). The copyout() call
simply overwrites the value in the kernel's copy preventing the
subsequent overflow prevention logic from working.
Fix this by instead doing a copyout() of the updated length in the
caller after the callback returns.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24456
copyin/copyout are sufficient to guard against bad addresses. They will return
EFAULT if the user is up to no good (by choice or ignorance). There's no point
in checking, since it doesn't even improve the error messages.
Noticed by: jhb
Reviewed by: brooks, jhb
SES specifications allows the string to be NULL-terminated, while previous
code was considering it as invalid due to incorrectly ordered conditions.
MFC after: 1 week
Sponsored by: iXsystem, Inc.
Broadcom 9400-8i8e HBAs report virtual SES device, where slots representing
external connectors are reported having no phys. Since sasdev_phys is NULL
there and proto_hdr is a union, ses_paths_iter() misinterpreted them as ATA.
Add explicit protocol check to properly differentiate them.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Two arguments were reversed in calls to cam_strvis() in
nvme_da.c. This was found by a Coverity scan of this code within Dell
(Isilon). These are also marked in the FreeBSD Coverity scan as CIDs
1400526 & 1400531.
Submitted by: robert.herndon@dell.com
Reviewed by: vangyzen@, imp@
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D24117
generate for a race where a device goes away, we start to tear down
the periph state for the device, and then the device suddently
reappears. The key that makes it work is removal of periph from the
drv list before calling the deferred callback.
Hat tip to: mav@
Print the pointer to ccb so we can find it (for what good it does)
as well as the type of operation in flight when the cam_path has
been freed out from under us. This helps both core analysis as well
as automated systems that collect panic strings but little else.
declarations.
We typically don't use them elsewhere in the kernel, and they aren't
needed here: the actual functions are a few lines away and aren't
mutually recursive.
These are no longer needed now that it's embedded in cam_ccbq. They are also
unused.
Reviewed by: ken, chuck
Differential Revision: https://reviews.freebsd.org/D24008
It's used in exactly one place. In that place it's used so we can hold the lock
on the device associated with the path (since we do a xpt_path_lock and unlock
pair around the callback). Instead, inline taking and dropping the reference to
the device so we can ensure we can unlock the mutex after the callback finishes
if the path in the ccb that's queued to be processed by xpt_scanner_thread is
destroyed while being processed. We don't actually need the path itself for
anything other than dereferencing it to get the device to do the lock and
unlock.
This also makes the locking / use model for cam_path a little cleaner by
eliminating a case where we needlessly copy the object.
Reviewed by: chuck, chs, ken
Differential Revision: https://reviews.freebsd.org/D24008
These flags have been unused for some time. Some of them were in the
CAM2 specification, but CAM has moved on a bit from that. Some were
used in the old Pluto VideoSpace (and AirSpace) systems which had the
video playback I/O scheduler in userspace, but have been unused since
then.
Reviewed by: chuck, ken
Differential Revision: https://reviews.freebsd.org/D24008
of xpt_done(). Add the missing XPT_ASYNC case to xpt_action_default. xpt_async
wants to use the side-effect of the xpt_done() routine to queue this to the
camisr thread so it can be done in that context. However, this breaks the
symmetry that you create a ccb and call xpt_action() for it to be
dispatched. Restore that symmetry by having it go through that path. As far as I
can tell, this is the only CCB that we create and call xpt_done() on directly.
Consistently omit /* FALLTHROUGH */ when we have a case statement that does
nothing. Since compilers don't warn about stacked case statements, and we were
inconsistent, resolve by removing extras.