generate for a race where a device goes away, we start to tear down
the periph state for the device, and then the device suddently
reappears. The key that makes it work is removal of periph from the
drv list before calling the deferred callback.
Hat tip to: mav@
Print the pointer to ccb so we can find it (for what good it does)
as well as the type of operation in flight when the cam_path has
been freed out from under us. This helps both core analysis as well
as automated systems that collect panic strings but little else.
declarations.
We typically don't use them elsewhere in the kernel, and they aren't
needed here: the actual functions are a few lines away and aren't
mutually recursive.
These are no longer needed now that it's embedded in cam_ccbq. They are also
unused.
Reviewed by: ken, chuck
Differential Revision: https://reviews.freebsd.org/D24008
It's used in exactly one place. In that place it's used so we can hold the lock
on the device associated with the path (since we do a xpt_path_lock and unlock
pair around the callback). Instead, inline taking and dropping the reference to
the device so we can ensure we can unlock the mutex after the callback finishes
if the path in the ccb that's queued to be processed by xpt_scanner_thread is
destroyed while being processed. We don't actually need the path itself for
anything other than dereferencing it to get the device to do the lock and
unlock.
This also makes the locking / use model for cam_path a little cleaner by
eliminating a case where we needlessly copy the object.
Reviewed by: chuck, chs, ken
Differential Revision: https://reviews.freebsd.org/D24008
These flags have been unused for some time. Some of them were in the
CAM2 specification, but CAM has moved on a bit from that. Some were
used in the old Pluto VideoSpace (and AirSpace) systems which had the
video playback I/O scheduler in userspace, but have been unused since
then.
Reviewed by: chuck, ken
Differential Revision: https://reviews.freebsd.org/D24008
of xpt_done(). Add the missing XPT_ASYNC case to xpt_action_default. xpt_async
wants to use the side-effect of the xpt_done() routine to queue this to the
camisr thread so it can be done in that context. However, this breaks the
symmetry that you create a ccb and call xpt_action() for it to be
dispatched. Restore that symmetry by having it go through that path. As far as I
can tell, this is the only CCB that we create and call xpt_done() on directly.
Consistently omit /* FALLTHROUGH */ when we have a case statement that does
nothing. Since compilers don't warn about stacked case statements, and we were
inconsistent, resolve by removing extras.
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
handler to accept a poitner to a u_int. To make the type of the softc flags
stable and defined, make it a u_int. Cast the enum types to u_int for arg2 so
when passing to dabitsysctl it's a u_int.
Noticed by: emax@
Differential Revision: https://reviews.freebsd.org/D23785
It's valid for a periph to be removed with outstanding transactions on the
device. In CAM, multiple periphs attach to a single device. There's no interlock
to prevent one of these going away while other periphs have outstanding CCBs and
it's not an error either. Remove this overly agressive KASSERT to prevent
false-positive panics when devices depart.
know that if there are any outstanding CCBs, then when they dereference the path
that's freed at the bottom of camperiphfree there will be some flavor of
panic. This moves that eventual panic to a traceback of when we free the last
reference on the device, which is earlier but may not be early enough.
Rotating and unmapped_io are really da flags. Convert them to a flag so it will
be reported with the other flags for the device. Deprecate the .rotating and
.unmapped_io sysctls in FreeBSD 14 and remove the softc ints.
Differential Revision: https://reviews.freebsd.org/D23417
Export the current flags. They can be useful to other programs wanting to do
special thigns for removable or similar devices.
Differential Revision: https://reviews.freebsd.org/D23417
BIO_READ and BIO_WRITE, we've handled this expanded syntax poorly in
drivers when the driver doesn't support a particular command. Do a
sweep and fix that.
Reported by: imp
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places. It would be cool if we had
more SMP-friendly statistics, but this helps too.
Sponsored by: iXsystems, Inc.
Since we are already using malloc()+copyin()/copyout() for smaller data
blocks, and since new asynchronous API does it always, I see no reason
to keep this ugly artificial size/alignment limitation in old API.
Tape applications suffer enough from the MAXPHYS limitations by itself,
and additional alignment requirement, often halving effectively usable
block size, does not help.
It would be good to use unmapped I/O here instead, but it require some
HBA drivers polishing first to support non-BIO unmapped buffers.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
While it works on nda, it fails on ada and/or da for at least zfs with a modify
after free issue on a trim BIO. Revert while I rework it to fix those devices.
React to the BIO_SPEED command in the cam io scheduler by completing
as successful BIO_DELETE commands that are pending, up to the length
passed down in the BIO_SPEEDUP cmomand. The length passed down is a
hint for how much space on the drive needs to be recovered. By
completing the BIO_DELETE comomands, this allows the upper layers to
allocate and write to the blocks that were about to be trimmed. Since
FreeBSD implements TRIMSs as advisory, we can eliminliminate them and
go directly to writing.
The biggest benefit from TRIMS coomes ffrom the drive being able t
ooptimize its free block pool inthe log run. There's little nto no
bene3efit in the shoort term. , sepeciall whn the trim is followed by
a write. Speedup lets us make this tradeoff.
Reviewed by: kirk, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18351
Rather than a trim active flag, have a counter that can be used to
have a absolute limit on the number of trims in flight independent of
any I/O limiting factors.
Sponsored by: Netflix
For each of the different queue types, list the name of the
queue. While it can be worked out from context, this makes it more
useful and clearer.
Sponsored by: Netflix
Add rate limiters to trims. Trims are a bit different than reads or
writes in that they can be combined, so some care needs to be taken
where we rate limit them. Additional work will be needed to push the
working rate limit below the I/O quanta rate for things like IOPS.
Sponsored by: Netflix
Add two sysctls to control pacing of nvme
trims. kern.cam.nda.X.goal_trim is the number of upper layer
BIO_DEELETE requests to try to collecet before sending TRIM down too
the nvme drive. trim_ticks is the number of ticks, at mosot, to wait
for at least goal_trim BIOS_DELEETE requests to come in.
Trim pacing is useful when a large number off disjoint trims are
comoing in from the upper layers. Since we have no way to chain
toogether trims from the upper layers that are sent down, this acts as
a hueristic to group trims into reasonable sized chunks. What's
reasonable varies from drive to drive.
Sponsored by: Netflix
Excesively large TRIMs can result in timeouts, which cause big
problems. Limit trims to 1GB to mititgate these issues.
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D22809
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.
v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
The SES4r3 standard requires that element descriptors may only contain ASCII
characters in the range 0x20 to 0x7e. Some SuperMicro expanders violate
that rule. This patch adds a sanity check to ses(4). Descriptors in
violation will be replaced by "<invalid>".
This patch fixes "sesutil --libxo xml" on such systems. Previously it would
generate non-well-formed XML output.
PR: 241929
Reviewed by: allanjude
MFC after: 2 weeks
Sponsored by: Axcient
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
All Rights Reserved on same line as other copyright holders (but not
me). Other such holders are also listed last where it's clear.
My changes in 351599 (kindly committed by avg) made the cd(4) media check
asynchronous to avoid a sleep while holding a mutex.
There was a difficult to reproduce bug with those changes that caused a
hang on boot on some single processor machines/VMs. Leandro Lupori
managed to reproduce the bug, diagnose it, and supplied a patch! Here is
his analysis, from the PR:
======
I was able to reproduce the problem described in comment#14.
Actually, I wasn't trying to reproduce it, I just started seeing it a few
weeks ago, in CURRENT.
I can reproduce it consistently, by using QEMU to run a PowerPC64 VM with a
single core/thread (-smp 1).
It happens only when there is no media in the emulated CD-ROM, a device
that QEMU adds by default, unless -nodefaults is specified in command line.
I've debugged it and this is what I've found:
1- After the CD probe is successful, GEOM will try to open the device,
which will end up calling cdcheckmedia(), that sets CD state to
CD_STATE_MEDIA_PREVENT.
2- Next, scsi_prevent() is executed and succeeds, the CD_FLAG_DISC_LOCKED
flag is set and CD state moves to CD_STATE_MEDIA_SIZE.
3- Next, scsi_read_capacity() is executed and fails, state is set to
CD_STATE_MEDIA_ALLOW, cdmediaprobedone() is called and wakes up
cdcheckmedia().
4- Then, when cdstart() is invoked to process CD_STATE_MEDIA_ALLOW, it
first checks if CD_FLAG_DISC_LOCKED is set, and if so skips directly to
CD_STATE_MEDIA_SIZE state. This will repeat the steps of bullet 3, entering
an infinite MEDIA_SIZE command loop.
When there is a least another core/thread, the GEOM thread that performed
the initial cdopen() will get scheduled again, closing the CD device, that
will call cdprevent(PR_ALLOW) that clears the CD_FLAG_DISC_LOCKED flag and
breaks the loop.
So, apparently, the problem is CD_STATE_MEDIA_ALLOW being skipped when
CD_FLAG_DISC_LOCKED is set. If I understand correctly, in this case, the
state should be advanced to CD_STATE_MEDIA size only when the current state
is CD_STATE_MEDIA_PREVENT.
=====
PR: kern/219857
Submitted by: Leandro Lupori <leandro.lupori@gmail.com>
MFC after: 1 week