According to specs and common sense, all sense data reported in descriptor
format should be valid. But practice shows different, some devices return
descriptors with invalid data, resulting in error messages looking worse.
Decouple block/stream commands sense data and information field printing.
Looking on present specs, there are much more cases when those fields are
not related, and incomplete old code was not printing valid sense data and
leaving empty lines for invalid.
MFC after: 2 weeks
SDIO command CMD53 (IO_RW_EXTENDED) allows data transfers using blocks of 1-2048 bytes,
with a maximum of 511 blocks per request.
Extend mmc_data structure to properly describe such requests,
and initialize the new fields in kernel and userland consumers.
No actual driver changes happen yet, these will follow in the separate changes.
Reviewed by: bz
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D19779
Before this I suppose it was impossible load CAM-based NVMe as module.
Plus this appeared to be needed to build r345815 without NVMe driver.
MFC after: 2 weeks
There is some code duplication in error handling paths in a few functions.
Create a function for printing such errors in human-readable way and get rid
of duplicates.
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15912
Using DFLTPHYS/MAXPHYS is not always OK, instead make it possible for the
controller driver to provide maximum data size to MMCCAM, and use it there.
The old stack already does this.
Reviewed by: manu
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15892
CAM IOCTL interfaces traditionally mapped user-space data buffers to KVA.
It was nice originally, but now it takes too much to handle respective
TLB shootdowns, while small kernel memory allocations up to 64KB backed
by UMA and accompanied by copyin()/copyout() can be much cheaper.
For large buffers mapping still may have sense, and unmapped I/O would
be even better, but the last unfortunately is more tricky, since unmapped
I/O API is too specific to struct bio now.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
The 16GB, 32GB and 128GB versions of this product all have the same
problem. For some reason, the RC10 size is correct, while the RC16
size is larger (oddly by the capacity size / 1024 bytes). Using the
RC16 size results in illegal LBA range errors when geom tastes the
device. So, expand the quirk to cover all versions of this chip.
Ideally, we'd get both READ CAPACITY 10 and READ CAPACITY 16 sizes and
print a warnnig if they differ and use the smaller of the two numbers,
though that may be problematical as well. Furthermore, SBC-4
encourages users transition to RC16 only, which suggests that in the
future RC10 may disappear from some drives. It's unclear how to cope
with these drives generically.
PR: 234503
MFC After: 1 week
Since in most configurations CTL serves as network service, we found
that this change improves local system interactivity under heavy load.
Priority of main threads is set slightly higher then worker taskqueues
to make them quickly sort incoming requests not creating bottlenecks,
while plenty of worker taskqueues should be less sensitive to latency.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Replace long per-LUN queue of blocked commands, scanned on each command
completion and sometimes even twice, causing up to O(n^^2) processing cost,
by much shorter per-command blocked queues, scanned only when respective
command completes, and check only commands before the previous blocker,
reducing cost to O(n).
While there, unblock aborted commands to make them "complete" ASAP to be
removed from the OOA queue and so not waste time ordering other commands
against them. Aborted commands that were not sent to execution yet should
have no visible side effects, so this is safe and easy optimization now,
comparing to commands already in processing, which are a still pain.
Together those two optimizations should fix quite pathological case, when
due to backend slowness CTL accumulated many thousands of blocked requests,
partially aborted by initiator and so supposedly not even existing, but
still wasting CTL CPU time.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
- Collapse original_sc and serializing_sc fields into one, since they
are never used simultanously, we have only one local I/O and one remote.
- Move remote_sglist and local_sglist fields into CTL_PRIV_BACKEND,
since they are used only on Originating SC in XFER mode, where requests
don't ever reach backends, so we can reuse backend's private storage.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
It was not only disabled for quite a while, but also appeared to be broken
at r325517, when maximum number of ports was made configurable.
MFC after: 1 week
The panic message lead people to believe some userland CAM request had
caused a problem when in reallity it was for a kernel request (eg the
USER bit was cleared). Reword message. Also, improve a couple of
comments to reflect that the periph shouldn't be completely torn down
before we get here (so the path and sim pointers should be valid, but
aren't and the code is designed to be robust enough in the face of
that to give a specific panic message).
In `probedone()`, for the `PROBE_REPORT_LUNS` case, all paths that
fall to the bottom of the case set `lp` to `NULL`, so the test for a
non-NULL value of `lp` and call to `free()` if true is dead code as
the test can never be true. Fix by eliminating the whole if
statement. To guard against a possible future change that accidentally
violates this assumption, use a `KASSERT()` to catch if `lp` is
non-NULL.
Reviewed by: cem
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19109
Certain versions of Sandisk x400 firmware can hang under extremely
heavly load of large I/Os for prolonged periods of time. Newer /
current versions work fine, and should be used where possible. Where
not possible, this quirk ensures that I/O requests are limited to 128k
to avoids the bug, even under extreme load. Since MAXPHYS is 128k,
only users with custom kernels are at risk on the older firmware.
Once all known users of the older firmware have upgraded, this quirk
will be removed.
Sponsored by: Netflix, Inc.
Use recent best practices for Copyright form at the top of
the license:
1. Remove all the All Rights Reserved clauses on our stuff. Where we
piggybacked others, use a separate line to make things clear.
2. Use "Netflix, Inc." everywhere.
3. Use a single line for the copyright for grep friendliness.
4. Use date ranges in all places for our stuff.
Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)
r212160 tightened this from always using MSG_SIMPLE_Q_TAG to always
MSG_ORDERED_Q_TAG. Since it also marked all BIO_FLUSH requests with
BIO_ORDERED, this commit changes nothing immediately, but it returns
BIO_FLUSH callers ability to actually specify ordering they really
need, alike to other request types.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
- Add ADA_Q_NO_TRIM quirk to be used with the device that falsely advertise TRIM support
- Add ADA_Q_NO_TRIM entry for KingDian S200 SSD
PR: 222802
Submitted by: Bertrand Petit <bsdpr@phoe.frmug.org>
MFC after: 1 week
o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
pbufs are we going to have set.
In various subsystems that are going to utilize pbufs create private zones
via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
and sets a limit on created zone. After startup preallocate pbufs according
to requirements of all pbuf zones.
Subsystems that used to have a private limit with old allocator now have
private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
swap, vnode pager.
The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
aio(4). They should have their private limits, but changing that is out of
scope of this commit.
o Fetch tunable value of kern.nswbuf from init_param2() and while here move
NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
this option.
Default values aren't touched by this commit, but they probably should be
reviewed wrt to modern hardware.
This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.
Together with: gallatin
Tested by: pho
It has been reported that on some systems (with real hardware passed
through to a virtual machine) the WP detection causes USB disk probing
failures.
While here, also fix the selection of the next state in the case
of malloc failure in DA_STATE_PROBE_WP. It was DA_STATE_PROBE_RC
unconditionally even when it should have been DA_STATE_PROBE_RC16.
PR: 225794
Reported by: David Boyd <David.Boyd49@twc.com>
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D18496
In the nda(4) driver, only set DISKFLAG_CANDELETE (a.k.a. can support
BIO_DELETE) if the drive supports Dataset Management. There are reports
that without this check, VMWare Workstation does not work reliably.
Fix is to check the ONCS field in the NVMe Controller Data structure for
support. This check previously existed but did not survive the
big-endian changes.
Reported by: yuripv@yuripv.net
Reviewed by: imp, mav, jimharris
Approved by: imp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18493
Add the ability to set two goals for trims in the I/O scheduler. The
first goal is the number of BIO_DELETEs to accumulate
(kern.cam.XX.U.trim_goal). When non-zero, this many trims will be
accumulated before we start to transfer them to lower layers. This is
useful for devices that like to get lots of trims all at once in one
transaction (not all devices are like this, and some vary by workload).
The second is a number of ticks to defer trims. If you've set a trim
goal, then kern.cam.XX.U.trim_ticks controls how long the system will
defer those trims before timing out and sending them anyway. It has no
effect when trim_goal is 0.
In any event, a BIO_FLUSH will cause all the TRIMs to be released to
the periph drivers. This may be a minor overloading of what BIO_FLUSH
is supposed to mean, but it's useful to preserve other ordering
semantics that users of BIO_FLUSH reply on.
Sponsored by: Netflix, Inc
It's often useful to have a callback when an I/O takes more than a
threshold amount of time. This adds the infrastructure for periph
devices to register one.
One use-case is as a debugging aide when you need a semi-realtime
indication of an I/O outlier so you can trigger bus capture gear for
vendor analysis.
Sponsored by: Netflix, Inc
Many async events that we see are called for this specific path. When
calling an async callback for a targetted device, XTP will lock that
specific device's path lock (same as what cam_periph_lock does). For
those AC_ events, assert we have the lock rather than trying to
recusrively take it (which causes panics since it's not recursive).
Add annotations about this and about the fact that AC_SCSI_AEN events
are generated now only in the ata stack (which cannot have a scsi_da
attachment). Leave it in place in case I've overlooked something as
the code is harmless.
This is fallout from my attempts to "fix" locking for softc->flags in
r330796 that's not been triggered often enough to get my attention
until now.
Sponsored by: Netflix
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D17837
Add a counter for the LBAs, Ranges and hardware commands so that we
can provide additional color to the statistics we provide to vendors.
Sponsored by: Netflix, Inc
scsi_low was a common set of routines to do the SCSI bus sequencing
for the ncv, nsp and stg drivers. Those have been removed, so it's no
longer needed since nothing else in the tree uses it and nothing
likely ever will (it's for super-low-end 8-bit parallel SCSI cards).
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.
The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other members which require no translation.
Reviewed by: kib (prior version), jhb
Approved by: re (rgrimes)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Review: https://reviews.freebsd.org/D17378
In the probe case for SCSI SMR Host Aware or Most Managed drives, be sure
to free allocated memory.
sys/cam/scsi/scsi_da.c:
In dadone_probezone(), free the data pointer before returning.
MFC after: 3 days
Sponsored by: Spectra Logic
Approved by: re (kib)
of the fix/workaround for the "ctld hanging on reload" problem.
PR: 220175
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Approved by: re (kib)
MFC after: 2 weeks
Sponsored by: playkey.net
Somehow this was working even after PTI in, at least on amd64, and got
broken by something only very recently.
Reviewed by: araujo
Approved by: re (gjb)
for the "ctld hanging on reload" problem observed in same cases under
high load. I'm not 100% sure it's _the_ fix, as the issue is rather hard
to reproduce, but it was tested as part of a larger path and the problem
disappeared. It certainly shouldn't break anything.
Now, technically, it shouldn't be needed. Quoting mav@, "After
ct->ct_online == 0 there should be no new sessions attached to the target.
And if you see some problems abbout it, it may either mean that there are
some races where single cfiscsi_session_terminate(cs) call may be lost,
or as a guess while this thread was sleeping target was reenabbled and
redisabled again". Should such race be discovered and properly fixed
in the future, than this and the followup two commits can be backed out.
PR: 220175
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Discussed with: mav
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: playkey.net
The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).
This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.
This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.
Bump __FreeBSD_version to 1200081 due to API change
Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404