- Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
- Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
- Add initial support for namespace change async events. So far only
in CAM mode, but it allows run-time namespace arrival and departure.
- Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.
Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stay in the way so much, becoming much more
self-sufficient.
Reviewed by: imp
MFC after: 1 month
Sponsored by: iXsystems, Inc.
We're dropping the periph lock then dropping the refcount. However,
that violates the locking protocol and is racy. This seems to be
the cause of weird occasional panics with a bogus assert.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15517
For MMC cards, add partitions found on the card as separate disk(9) devices.
Don't do anything with RPMB partition for now.
Lots of code is copied almost 1:1 from the mmcsd.c in the old stack,
credits Marius Strobl (marius@FreeBSD.org)
Reviewed by: marius
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D12762
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.) This extends
the method used in da to ada, nda, and mda.
Sponsored by: Netflix
Submitted by: Chuck Silvers
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.)
Sponsored by: Netflix
Submitted by: Chuck Silvers
Differential Revision: https://reviews.freebsd.org/D15435
- Remove unused or dead store variable
- Remove unused function ctl_copyin_alloc
- Add missing curly brackets, this seems a regression in r287720
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15383
ioctl frontend ports.
This revision introduces two changes to CTL:
- Changes the way options are passed to CTL_LUN_REQ and CTL_PORT_REQ ioctls.
Removes ctl_be_arg structure and associated logic and replaces it with
nv(3)-based logic for passing in and out arguments.
- Allows creating multiple ioctl frontend ports using either ctladm(8) or
ctld(8).
New frontend ports are represented by /dev/cam/ctl<pp>.<vp> nodes, eg /dev/cam/ctl5.3.
Those device nodes respond only to CTL_IO ioctl.
New command-line options for ctladm:
# creates new ioctl frontend port with using free pp and vp=0
ctladm port -c
# creates new ioctl frontend port with pp=10 and vp=0
ctladm port -c -O pp=10
# creates new ioctl frontend port with pp=11 and vp=12
ctladm port -c -O pp=11 -O vp=12
# removes port with number 4 (it's a "targ_port" number, not pp number)
ctladm port -r -p 4
New syntax for ctl.conf:
target ... {
port ioctl/<pp>
...
}
target ... {
port ioctl/<pp>/<vp>
...
Note: Most of this work was made by jceel@, thank you.
Submitted by: jceel
Reworked by: myself
Reviewed by: mav (earlier versions and recently during the rework)
Obtained from: FreeNAS and TrueOS
Relnotes: Yes
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D9299
a 1500 line switch statement. Callers now specify a discrete completion
handler, though they're still welcome to track state via ccb_state.
Sponsored by: Netflix
pointer. It's now unhelpful and misleading for callers to continue to set
it, so bring all callers into conformance. There's no real functional change,
but it makes reading the code a lot less confusing.
Sponsored by: Netflix
When the tape position is inside the Early Warning area, the tape
drive will return a sense key of NO SENSE, and an ASC/ASCQ of
0x00,0x02, which means: End-of-partition/medium detected". If
this was in response to a control command like WRITE FILEMARKS,
we correctly translate this as informational status and return
0 from saerror().
Programmable Early Warning should be handled the same way, but
we weren't handling it that way. As a result, if a PEW status
(sense key of NO SENSE, ASC/ASCQ of 0x00,0x07, "Programmable early
warning detected") came back in response to a WRITE FILEMARKS,
we returned an error.
The impact of this was that if an application was writing to a
sa(4) device, and a PEW area was set (in the Device Configuration
Extension subpage -- mode page 0x10, subpage 1), and a filemark
needed to be written on close, we could wind up returning an error
to the user on close because of a "failure" to write the filemarks.
It actually isn't a failure, but rather just a status report from
the drive, and shouldn't be treated as a failure.
sys/cam/scsi/scsi_sa.c:
For control commands in saerror(), treat asc/ascq 0x00,0x07
the same as 0x00,{0-5} -- not an error. Return 0, since
the command actually did succeed.
Reported by: Dr. Andreas Haakh <andreas@haakh.de>
Tested by: Dr. Andreas Haakh <andreas@haakh.de>
Sponsored by: Spectra Logic
MFC after: 3 days
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
have one pending. Otherwise, we can race and send two, which is
wasteful in close proximity. It can also cause the acaquire/release
count for TUR to be > 1, which is undexpected.
PR: 226510
Differential Review: https://reviews.freebsd.org/D14792
do this right, except when there's no BP and we do a TUR by request.
In that case, we clear the flag, but don't release the reference,
leaking the reference on rare occasion.
PR: 226510
Sponsored by: Netflix
<machine/stdarg.h> is a kernel-only header. The standard header for
userland is <stdarg.h>. Using the standard header in userland avoids
weird build errors when building with external compilers that include
their own stdarg.h header.
Reviewed by: arichardson, brooks, imp
Sponsored by: DARPA / AFRL
Differential Revision: https://reviews.freebsd.org/D14776
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.
Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.
Sponsored by: Netflix
The ch(4) driver was missed in change 328918, which changed
cam_periph_acquire() to return an errno instead of cam_status.
As a result, ch(4) failed to attach.
Sponsored by: Spectra Logic
softc for an async unit attention. CAM locks, sometimes, the periph
lock and other times does not. We were taking the lock always and
running into lock recursion issues on a non-recursive lock. Now we
take it selectively. It's not clear why xpt takes the lock selectively
before calling us, though, and that's still under investigation.
Reported by: avg
PR: 226510 (same panic, differnt circumstances)
Sponsored by: Netflix
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources. At
this point another thread receives a PDU. Boom.
PR: 222898, 219866
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by: mav@ (earlier version)
MFC after: 2 weeks
Sponsored by: playkey.net
kern.cam.{,a,n}da.X.invalidate=1 forces *daX to detach by calling
cam_periph_invalidate on the underlying periph. This is for testing
purposes only. Include only with options CAM_TEST_FAILURE and rename
the former [AN]DA_TEST_FAILURE, and fix nda to compile with it set.
We're using it at work to harden geom and the buffer cache to be
resilient in the face of drive failure. Today, it far too often
results in a panic. While much work was done on SIM initiated removal
for the USB thumnb drive removal work, little has been done for periph
initiated removal. This simulates what *daerror() does for some errors
nicely: we get the same panics with it that we do with failing drives.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14581
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).
Sponsored by: Netflix
When the ccb is NULL to cam_iosched_bio_complete, just update the
other statistics, but not the time. If many operations are collapsed
together, this is needed to keep stats properly for the grouped bp.
This should fix trim accounting.
Sponsored by: Netflix
The ioctl objects contain pointers and require translation and some
refactoring of the infrastructure to work. For now prevent opertion
on garbage values. This is very slightly overbroad in that ENCIOC_INIT
is safe.
Reviewed by: imp, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14671
These take a union ccb argument which is full of kernel pointers.
Substantial translation efforts would be required to make this work.
By rejecting the request we avoid processing or returning entierly
wrong data.
Reviewed by: imp, ken, markj, cem
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14654
Make sure the periph lock is held around rmw access to softc data,
espeically flags, including work flags in iosched.
Add asserts for the periph lock where it should be held.
PR: 226510
Sponsored by: Netflix
Differential Review: https://reviews.freebsd.org/D14456
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.
NVMe is now working on powerpc64 host.
Submitted by: Michal Stanek <mst@semihalf.com>
Obtained from: Semihalf
Reviewed by: imp, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
Now that we're queueing BIO_DELETE requests in the CAM I/O scheduler,
it make sense to try to combine as many as possible into a single
request to send down to hardware. Hopefully, lots of larger requests
like this are better than lots of individual transactions.
Note for future: need to limit based on total size of the trim
request. Should also collapse adjacent ranges where possible to
increase the size of the max payload.
Sponsored by: Netflix
Introduce flags word to describe the capacities of the peripheral.
First bit will describe if the periph driver allows multiple
outstanding TRIMS to be active in a device.
Modify the I/O scheduler so that the nda driver can queue trims
for a while after the first one arrives. We'll queue until we see
a I/O scheduler tick, then we'll schedule as many TRIMs as allowed
by other factors (currently this is slocts in the NVMe controller).
This mariginally helps the read latency issues we see with reads,
but sets the stage for the nda driver to do TRIM collapsing like the
da and ada drivers do today.
Sponsored by: Netflix
To help implement a policy of 'queue all trims until next I/O sched
tick' policy to help coalesce them, note when we tick so we can do
something special on the first call after the tick to get more work.
Sponsored by: Netflix
While the code for ada and da both assume that the trim list is
ordered when doing the coaleascing the TRIMs, it turns out that
creating the sorted list uses more resources than are saved by having
slightly fewer trims sent to the device.
Sponsored by: Netflix