Commit Graph

302 Commits

Author SHA1 Message Date
Alexander Motin
90bcc81bc3 Delay GEOM disk_create() until CAM periph probe completes.
Before this patch CAM periph drivers called both disk_alloc() and
disk_create() same time on periph creation.  But then prevented disks
from opening until the periph probe completion with cam_periph_hold().
As result, especially if disk misbehaves during the probe, GEOM event
thread, triggered to taste the disk, got blocked on open attempt,
potentially for a long time, unable to process other events.

This patch moves disk_create() call from periph creation to the end of
the probe. To allow disk_create() calls from non-sleepable CAM contexts
some of its duties requiring memory allocations are moved either back
to disk_alloc() or forward to g_disk_create(), so now disk_alloc() and
disk_add_alias() are the only disk methods that require sleeping.  If
disk fails during the probe disk_create() may just be skipped, going
directly to disk_destroy().  Other method calls during that time are
just ignored.  Since GEOM may now see the disks after CAM bus scan is
already completed, introduce per-periph boot hold functions. Enclosure
driver already had such mechanism, so just generalize it.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D35784
2022-07-14 16:17:36 -04:00
Mitchell Horne
489ba22236 kerneldump: remove physical argument from d_dumper
The physical address argument is essentially ignored by every dumper
method. In addition, the dump routines don't actually pass a real
address; every call to dump_append() passes a value of zero for
physical.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35173
2022-05-13 10:42:48 -03:00
Warner Losh
1907e1c07c ada: Move comment
Move the comment about releasing ccb before periph to adaprobedone()
where it belongs.

Sponsored by:		Netflix
2022-05-04 16:54:38 -06:00
Warner Losh
6c8ab086fe ada: Retry commands with retries left on CAM_SEL_TIMEOUT
The AHCI and ATA SIMs will return CAM_SEL_TIMEOUT when an underlying
device has stopped responding. This is usually seen after a timeouted
out command and can be a transient event. Rather than fail the
peripheral immediately after seeing this, queue a retry. For transient
events, this allows drives to continue to provide data, though with some
added latency, just like we do when we have some other kind of retriable
error. If the error isn't transient (the drive is truly gone), then
we'll discover that eventually and fail the transaction and invalidate
the drive like we do today.

This helps us avoid a panic at the end of camperiphfree when
CAM_PERIPH_NEW_DEV_FOUND is set. However, the deferred callback should
be queued to xpt_async_td instead of being made inline there. This issue
will be solved in a different patch that does that. PR 263703.

This also helps us avoid another bug where we can drop all references to
the device (causing us to go through camperiphfree and destroy the path)
while we have an I/O pending in the ata_da state machine (usually in
state ADA_STATE_RAHEAD with ATA_SETFEATURES ATA_SF_ENAB_RCACHE
command). It's not clear why the reference that we take out to do the
reprobe isn't effective at blocking this. By retrying this condition,
though we avoid this bug (at least more often, I don't have a good
reproduction test case, I just see this panic a few times a month at
work on systems that have transient disk errors on ahci connected SATA
SSDs). PR 263704. It's too soon to know how much this helps us avoid
this bug.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D34977
2022-05-01 11:08:56 -06:00
Warner Losh
ae1955cd67 adaasync: Harmonize with daasync
We should call cam_periph_async() always, like SCSI does. This routine
is supposed to be more of a catch-all.

cam_periph_async() only does actions for AC_LOST_DEVICE. It ignores all
other events (today), but this may not always be true. So this is a nop
change.

Drop in a 'break' so we don't fall through unnecessarily.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35057
2022-04-26 11:01:39 -06:00
Warner Losh
ccaec73d0b ada: Eliminate dead code
We never use the cgd that we get from the XPT_GDEV_TYPE call. Prior to
9a6844d55f we used it to determine if READ AHEAD or WRITE CACHING was
supported. However, all that information was moved into adasetflags so
we no longer need to this since it's cached in the softc and updated
with the IDENTIFY data changes automatically.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35039
2022-04-25 12:55:04 -06:00
Warner Losh
9d899bbcb7 cam: Small reorg of ata xpt async code
Use a switch rather than a nested if to simplify the async event
processing code. No functional changes intended.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D35038
2022-04-25 12:55:04 -06:00
Warner Losh
b43cfe7171 ada/da: Borrow comment from nda about cleanup
Remove a XXX comment and replace it with a more accurate comment about
what happens to I/O queued to the hardware.

Sponsored by:		Netflix
2022-04-24 15:11:56 -06:00
Warner Losh
48ae3f4f64 ata/nvme: Add comment
Steal the comment from daonninvalidate about the call to disk_gone().

Sponsored by:		Netflix
2022-04-24 15:11:52 -06:00
Alexander Motin
38f8addaab CAM: Replicate e0ceec676d from da to ada and nda.
MFC after:	1 week
2022-04-23 20:15:17 -04:00
Gordon Bergling
49dace1d46 cam: Fix typos in source code comments
- s/paniced/panicked/

MFC after:	3 days
2022-04-02 10:13:35 +02:00
Warner Losh
4762aa57ad ata_xpt: Rename probe_softc to aprobe_softc
Since both scsi_xpt and ata_xpt use the same name for the softc, this
can lead to problems in gdb. Avoid the issue by renaming the ata
probe_softc to aprobe_softc as has been done for the aprobe in
0f280cbd0a. This was overlooked at the time.

Sponsored by:		Netflix
MFC After:		2 weeks
2022-01-14 17:21:09 -07:00
Andriy Gapon
75bc7150f4 add and use defintions for ATA power modes
Those can be returned by CHECK POWER MODE command (0xe5).
Note that some of the definitions duplicate definitions for Extended
Power Conditions.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33646
2022-01-11 15:41:38 +02:00
Alexander Motin
0e5c50bf60 cam: Relax callouts precisions.
On large systems even relatively rare callouts may fire many times
per second.  This should allow them to aggregate better, since we do
not require any precision when polling for media change, etc.

MFC after:	2 weeks
2022-01-07 12:59:16 -05:00
Andriy Gapon
15910dc0bc adaspindown: check disk power mode before sending IDLE command
If a disk is already in STANDBY mode, then setting IDLE mode can
actually spin it up.

Reviewed by:	mav
MFC after:	4 weeks
Differential Revision:	https://reviews.freebsd.org/D33588
2021-12-24 11:02:22 +02:00
Jessica Clarke
f350bc1dd3 ada: Fix intra-object buffer overread of identify strings
In the ATA/ATAPI spec these are space-padded fixed-length strings with
no NUL-terminator (and byte swapped). When performing the identify we
call ata_param_fixup to swap the bytes back to be in order, strip any
leading/trailing spaces and coalesce consecutive spaces, padding with
NULs. However, if the input has no padding spaces, the fixed-up strings
are still not NUL-terminated. This causes two issues. The first is that
strlcpy will truncate the string by replacing the final byte with a NUL.
The second is that strlcpy will keep reading src until it finds a NUL in
order to calculate the return value, which is defined as the length of
src (so that callers can then compare it with the dsize input to see if
the input string was truncated), thereby reading past the end of the
buffer and into whatever adjacent fields are in the structure. In
practice there's a NUL byte somewhere in the structure, but on CHERI
with subobject bounds enabled in the compiler this overread will be
detected and trap as a bounds violation.

Note this matches ata_xpt's aprobedone, which does a bcopy to a
malloc'ed buffer and manually NUL-terminates it for the CAM path's
device's serial_num.

Found by:	CHERI
Reviewed by:	imp, scottl
Differential Revision:	https://reviews.freebsd.org/D32567
2021-10-27 18:38:37 +01:00
Alexander Motin
303477d325 cam(4): Mark all sysctls as CTLFLAG_MPSAFE.
This code does not use Giant lock for very long time.

MFC after:	2 weeks
2021-08-10 20:07:19 -04:00
Edward Tomasz Napierala
b0cf8194c2 cam: revert half of 75b5caa08e
This turns debugging printf() into a KASSERT().
It's for ATA for now; SCSI will came later.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31380
2021-08-08 13:24:19 +00:00
Young Xiao
431ddd9436 Fix potential NULL pointer dereference of device physical path
In ata_dev_advinfo() and nvme_dev_advinfo(), if the physical path is
being stored and there is a malloc failure (malloc(9) is called with
M_NOWAIT), we could wind up in a situation where the device's
physpath_len is set to the length the user provided, but the physpath
itself is NULL.

If another context then comes in to fetch the physical path value, we
would wind up trying to memcpy a NULL pointer into the caller's buffer.

So, set the physpath_len to 0 when we free the physpath on entry into
the store case for the physical path.  Reset the length to a non-zero
value only after we've successfully malloced a buffer to hold it.

This code mirrors scsi_xpt.c does already as well.

Signed-off-by:	Young Xiao <92siuyang@gmail.com>
Reviewed by:	imp
PR:		238014
2021-07-13 14:13:21 -06:00
Edward Tomasz Napierala
6f147a0734 cam: enable kern.cam.ada.enable_uma_ccbs by default
This makes the ada(4) driver use UMA for its CCBs.  While it's
da(4) counterpart needs some more testing, this one seems to be
safe now.

Please let me know via email if you notice any suspicious kernel
messages,

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D30567
2021-07-07 09:40:34 +01:00
Edward Tomasz Napierala
75b5caa08e cam: turn KASSERTs into printfs for now
It looks like I've missed a couple of places where we don't clear
stack-allocated CCBs.  Don't panic when that happens, just print
a warning.

This is a temporary measure until I get those cases fixed.

Reviewed By:	markj
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D30296
2021-05-16 20:19:19 +01:00
Edward Tomasz Napierala
0f206cc912 cam: add missing zeroing of a stack-allocated CCB.
This could cause a panic at boot.

Reported By:	Shawn Webb <shawn.webb AT hardenedbsd.org>
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2021-05-16 11:38:26 +01:00
Edward Tomasz Napierala
3394d4239b cam: allocate CCBs from UMA for SCSI and ATA IO
This patch makes it possible for CAM to use small CCBs allocated
from an periph-specific UMA zone instead of the usual, huge ones.
The end result is that CCBs issued via da(4) take 544B (size of
ccb_scsiio) instead of the usual 2kB (size of 'union ccb', ~1.5kB,
rounded up by malloc(9)).  For ATA it's 272B.  We waste less
memory, we avoid zeroing the unused 1kB, and it should be easier
to allocate those CCBs in low memory conditions.  It should also
be possible to use uma_zone_reserve(9) to improve behaviour
in low memory conditions even further.

Note that this does not change the size, or the layout, of CCBs
as such.  CCBs get allocated in various different ways, in particular
on the stack, and I don't want to redo all that.  Instead, this
provides an opt-in mechanism for the periph to declare "my start()
callback is fine with receiving a CCB allocated from this UMA zone".
In other words, most of the code works exactly as it used to; the
change only happens to IOs issued by xpt_run_allockq(), which
is - conveniently - pretty much all that matters for performance.

The reason for doing it this way is that it's pretty small, localized
change, and can be implemented gradually and iteratively: take a
periph, make sure its start() callback only casts the CCBs it takes
to a particular type of CCB, for example ccb_scsiio, and that it only
casts CCBs returned by cam_periph_getccb() to that type, then add UMA
zone for that size, and declare it safe to XPT.

This is disabled by default.  Set 'kern.cam.ada.enable_uma_ccbs=1'
and 'kern.cam.da.enable_uma_ccbs=1' tunables to enable it.  Testing
is welcome; I will flip the default to enable in two weeks from now.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D28674
2021-05-15 12:03:49 +01:00
Edward Tomasz Napierala
ec5325dbca cam: make sure to clear even more CCBs allocated on the stack
This is my second pass, this time over all of CAM except
for the SCSI target bits.  There should be no functional
changes.

Reviewed By:	imp
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29549
2021-04-11 15:24:22 +01:00
John Baldwin
e07ac3f2fd cam: Don't permit crashdumps on non-pollable devices.
If a disk's SIM doesn't support polling, then it can't be used to
store crashdumps.  Leave d_dump NULL in that case so that dumpon(8)
fails gracefully rather than having dumps fail at crash time.

Reviewed by:	scottl, mav, imp
MFC after:	2 weeks
Sponsored by:	Chelsio
Differential Revision:	https://reviews.freebsd.org/D28454
2021-02-11 13:52:18 -08:00
Marius Strobl
eae35125e9 ada(4): remove remainder of MD geometry translation support
This was missed in 9cf738228d and
r359718 respectively.
2020-12-25 20:20:54 +01:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Alexander Motin
cd500da924 Fix sbuf_finish() error code check in user-space.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2020-10-13 23:29:06 +00:00
Mateusz Guzik
27dcd3d90b cam: clean up empty lines in .c and .h files 2020-09-01 22:13:48 +00:00
Adrian Chadd
69c41b7071 [ata_da] remove duplicate definition; it trips up ye olde gcc-6 on mips32
Checked first with: irc
2020-05-27 02:10:09 +00:00
Conrad Meyer
9982b3ee29 cam: ANSIfy 0-argument function definitions
No functional change.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D24854
2020-05-16 14:33:08 +00:00
Warner Losh
0f280cbd0a Make the ata probe* and xpt* routines aprobe* and axpt* respectively.
Often, in traiging core files, one only has a traceback of where a
panic occurred. We have probe* and xpt* routines that live in both the
scsi and ata layers with identical names. To make one or the other
stand out, prefix all the probe and xpt routines in ata with an
'a'. I've left the scsi ones alone since they were there first and are
more numerous. I also rejected using #define to do this as being too
confusing. I chose this method because the CAM name for the probe
device was already 'aprobe'.

Normally, this doesn't matter because file scope protects one from
interfering with the other. However, due to the indirect nature of
CAM's state machine, you don't know if the following traceback is
SCSI or ATA:
	xpt_done
	probedone
	xpt_done_process
	xpt_done_td
	fork_exit

nvme and mmc already have unique names.

MFC: 1 week
Differential revision: https://reviews.freebsd.org/D24825
2020-05-13 00:18:44 +00:00
Warner Losh
96eb32bf0f Convert rotating to a flag bit.
Move rotating to a flag bit. Add bit definitions for it. Create a
compat sysctl for it.
2020-04-27 23:43:12 +00:00
Warner Losh
cf3ff63e55 Convert unmappedio over to a flag.
Make unmappedio a flag. Move it to the flags definition. Add compat
sysctl for it.
2020-04-27 23:43:08 +00:00
Warner Losh
aeab0812e6 Add flags sysctl to ada
Report the ada device flags like we do the da devices. No booleans
have (yet) been converted, but iomapped and rotating are planned.
2020-04-27 23:43:04 +00:00
Warner Losh
9cf738228d Now that we don't have special-case geom hacking defined in md_var.h, stop
including it. sparc64 was the last straggler here, but these weren't removed at
the time.
2020-04-07 22:23:22 +00:00
Scott Long
ecca2aa545 Add a quirk for the WDC Green series of SSDs to disable NCQ TRIM, as this
avoids silent data corruption.

PR:		225666
Submitted by:	anders lundgren
MFC after:	3 days
2020-02-27 05:00:21 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Scott Long
1353215314 Add rudamentary support for UFS to probe whether a block device supports the
BIO_SPEEDUP command.  Add complimentary support to the CAM periphs that
support it.  This is a redo of r357710.
2020-02-16 23:10:59 +00:00
Scott Long
85eb41f751 Revert r357710 and 357711 until they can be debugged 2020-02-10 14:27:28 +00:00
Scott Long
7d99bda79e Add rudamentary support for UFS to probe whether a block device supports the
BIO_SPEEDUP command.  Add complimentary support to the CAM periphs that
support it.
2020-02-10 00:23:20 +00:00
Scott Long
d176b8039e Ever since the block layer expanded its command syntax beyond just
BIO_READ and BIO_WRITE, we've handled this expanded syntax poorly in
drivers when the driver doesn't support a particular command.  Do a
sweep and fix that.

Reported by:	imp
2020-02-07 09:22:08 +00:00
Warner Losh
83b75bb3cc Revert r355813
It was extracted from a larger tree and is incomplete. Will resubmit after
reworking.
2019-12-16 19:16:26 +00:00
Warner Losh
68e1c49a96 Implement a system-wide limit or da and ada devices for delete.
Excesively large TRIMs can result in timeouts, which cause big
problems. Limit trims to 1GB to mititgate these issues.

Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D22809
2019-12-16 18:16:44 +00:00
John Baldwin
5773ac113c Use callout_func_t instead of the deprecated timeout_t.
Reviewed by:	kib, imp
Differential Revision:	https://reviews.freebsd.org/D22752
2019-12-10 22:06:53 +00:00
Edward Tomasz Napierala
b5961be1ab Add GEOM attribute to report physical device name, and report it
via 'diskinfo -v'.  This avoids the need to track it down via CAM,
and should also work for disks that don't use CAM.  And since it's
inherited thru the GEOM hierarchy, in most cases one doesn't need
to walk the GEOM graph either, eg you can use it on a partition
instead of disk itself.

Reviewed by:	allanjude, imp
Sponsored by:	Klara Inc
Differential Revision:	https://reviews.freebsd.org/D22249
2019-11-09 17:30:19 +00:00
Alexander Motin
6a216c0bb5 Take proper lock in ses_setphyspath_callback().
XPT_DEV_ADVINFO call should be protected by the lock of the specific
device it is addressed to, not the lock of SES device.  In some weird
case, probably with hardware violating standards, it sometimes caused
NULL dereference due to race.

To protect from it further, add lock assertion to *_dev_advinfo().

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-08-29 17:02:02 +00:00
Alexander Motin
c15a591cbd Make camcontrol sanitize support also ATA devices.
ATA sanitize is functionally identical to SCSI, just uses different
initiation commands and status reporting mechanism.

While there, make kernel better handle sanitize commands and statuses.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-07-25 18:48:31 +00:00
Alexander Motin
76d843dab2 Make CAM ATA stack handle disk resizes.
While for ATA disks resize is even more rare situation than for SCSI, it
may happen in case of HPA or AMA being used.  Make ATA XPT report minor
IDENTIFY DATA change to upper layers with AC_GETDEV_CHANGED, and ada(4)
periph driver handle that event, recalculating all the disk properties and
signalling resize to GEOM.  Since ATA has no mechanism of UNIT ATTENTIONs,
like SCSI, it has no way to detect that something has changed.  That is why
this functionality depends on explicit reprobe via XPT_REPROBE_LUN call.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2019-07-23 02:11:14 +00:00
Brooks Davis
c7bacdcc32 ata_xpt: Use the correct union member when accessing valid.
In principle this should not matter as it's a union and they point to
the same memory location but based on the code above we should be
accessing .sata and not .ata.

Submitted by:	arichardson
Reviewed by:	scottl, imp
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D21002
2019-07-22 21:07:58 +00:00