Element Descriptor page if it is not supported. This removes one error
message from verbose logs during boot on systems with some enclosures.
Sponsored by: iXsystems, Inc.
safe in some cases to reduce CCB priority after it was scheduled with high
priority. This fixes reproducible deadlock when command sent through the
pass interface while ATA XPT recovers from command timeout.
Instead of that enforce priority at passioctl(). libcam provides no obvious
interface to specify CCB priority and so much (all?) code specifies zero
(highest) priority. This change limits pass CCBs priority to NORMAL run
level, allowing XPT to complete bus and device recovery after reset before
running any payload.
without holding SIM lock. It really doesn't need that lock, but adding it
removes that specific exception, allowing to assert locking there later.
Submitted by: ken@ (earlier version)
returns zero while request status is not CAM_REQ_CMP. That could cause
partial device attach or other unexpected results.
Found by: Clang Static Analyzer
drivers:
- Remove scsi_low_pisa.*, they were unused.
- Remove <compat/netbsd/physio_proc.h> and calls to the stubs in that
header. They were empty nops.
- Retire sl_xname and use device_get_nameunit() and device_printf() with
the underlying device_t instead.
- Remove unused {ct,ncv,nsp,stg}print() functions.
- Remove empty SOFT_INTR_REQUIRED() macro and the unused sl_irq member.
NetBSD/pc98 was never merged into the main NetBSD tree and is no longer
developed. Adding locking to these drivers would have made the compat
shims hard to impossible to maintain, so remove the shims to ease
future changes.
These changes were verified by md5. Some additional shims can be removed
that do affect the compiled results that I will probably do in another
round.
Approved by: nyan (tentatively)
of this hardware still running (close to twenty years now).
2. Quiesece and use ENC_VLOG instead of ENC_LOG for most
complaints. That is, they're visible with bootverbose, but
otherwise quiesced and not repeatedly spamming messages
with constant reminders that hardware in this space is
rarely fully compliant.
MFC after: 1 month
'encapsulating interface' used with IPsec and has nothing to do with
storage 'enclosure' services.
MFC after: 3 days
Noticed while: debugging why enc(4) is no longer automatically created
It includes three parts:
1) Modifications to CAM to detect media media changes and report them to
disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes
Asynchronous Notification mechanism to receive events from hardware.
Active polling with TEST UNIT READY commands with 3 seconds period is used
for incapable hardware. After that both CD and DA drivers work the same way,
detecting two conditions: "NOT READY: Medium not present" after medium was
detected previously, and "UNIT ATTENTION: Not ready to ready change, medium
may have changed". First one reported to disk(9) as media removal, second
as media insert/change. To reliably receive second event new
AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by
generic error handling code in cam_periph_error().
2) Modifications to GEOM core to handle media remove and change events.
Media removal handled by spoiling all consumers attached to the provider.
Media change event also schedules provider retaste after spoiling to probe
new media. New flag G_CF_ORPHAN was added to consumers to reflect that
consumer is in process of destruction. It allows retaste to create new
geom instance of the same class, while previous one is still dying.
3) Modifications to some GEOM classes: DEV -- to report media change
events to devd; VFS -- to handle spoiling same as orphan to prevent
accessing replaced media. PART class already handles spoiling alike to
orphan.
Reviewed by: silence on geom@ and scsi@
Tested by: avg
Sponsored by: iXsystems, Inc. / PC-BSD
MFC after: 2 months
Just free inclomplete daemon cache instead to let it retry next time.
Premature ses_softc_cleanup() caused NULL dereference when freed softc
was accessed later.
kern.cam.da.send_ordered, more in line with the other da sysctls/tunables.
PR: 169765
Submitted by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: mav
a CD or DVD drive with a damaged disc often benefit from a shorter
timeout. Also, when retries are set to 0, an application is expecting
errors and recovering them so do not print the error into the log.
The number of expected errors can literally be in the hundreds of
thousands which significantly slows data recovery.
Reviewed by: ken@ (but quite some time ago).
a da(4) instance going away while GEOM is still probing it.
In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.
While that event is queued, the da(4) instance goes away. When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.
The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up. This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.
scsi_cd.c,
scsi_da.c: In the register routine for the cd(4) and da(4)
routines, acquire a reference to the CAM peripheral
instance just before we call disk_create().
Use the new GEOM disk d_gone() callback to register
a callback (dadiskgonecb()/cddiskgonecb()) that
decrements the peripheral reference count once GEOM
has finished cleaning up its resources.
In the cd(4) driver, clean up open and close
behavior slightly. GEOM makes sure we only get one
open() and one close call, so there is no need to
set an open flag and decrement the reference count
if we are not the first open.
In the cd(4) driver, use cam_periph_release_locked()
in a couple of error scenarios to avoid extra mutex
calls.
geom.h: Add a new, optional, providergone callback that
is called when a provider is about to be deleted.
geom_disk.h: Add a new d_gone() callback to the GEOM disk
interface.
Bump the DISK_VERSION to version 2. This probably
should have been done after a couple of previous
changes, especially the addition of the d_getattr()
callback.
geom_disk.c: Add a providergone callback for the disk class,
g_disk_providergone(), that calls the user's
d_gone() callback if it exists.
Bump the DISK_VERSION to 2.
geom_subr.c: In g_destroy_provider(), call the providergone
callback if it has been provided.
In g_new_geomf(), propagate the class's
providergone callback to the new geom instance.
blkfront.c: Callers of disk_create() are supposed to pass in
DISK_VERSION, not an explicit disk API version
number. Update the blkfront driver to do that.
disk.9: Update the disk(9) man page to include information
on the new d_gone() callback, as well as the
previously added d_getattr() callback, d_descr
field, and HBA PCI ID fields.
MFC after: 5 days
defect information it has before grabbing the full defect list.
This works around a bug with some Hitachi drives that generate data overrun
errors when they are asked for more defect data than they have.
The change is done in a spec-compliant way, so it should have no negative
impact on drives that don't have this issue.
This is based on work originally done at Sandvine.
scsi_da.h: Add a define for the maximum amount of data that can be
contained in a defect list.
camcontrol.c: Update the readdefects() function to issue an initial
command to determine the length of the defect list, and
then use that length in the request for the full defect
list.
camcontrol.8: Add a note that some drives will report 0 defects available
if you don't request either the PLIST or GLIST.
Submitted by: Mark Johnston <markjdb@gmail.com> (original version)
MFC after: 3 days
invalidated while open, cam_periph_hold() will return error and won't
get the reference. Following reference release will crash the system.
Sponsored by: iXsystems, Inc.
MFC after: 3 days
the pass(4) and enc(4) drivers and devfs.
The pass(4) driver uses the destroy_dev_sched() routine to
schedule its device node for destruction in a separate thread
context. It does this because the passcleanup() routine can get
called indirectly from the passclose() routine, and that would
cause a deadlock if the close routine tried to destroy its own
device node.
In any case, once a particular passthrough driver number, e.g.
pass3, is destroyed, CAM considers that unit number (3 in this
case) available for reuse.
The problem is that devfs may not be done cleaning up the previous
instance of pass3, and will panic if isn't done cleaning up the
previous instance.
The solution is to get a callback from devfs when the device node
is removed, and make sure we hold a reference to the peripheral
until that happens.
Testing exposed some other cases where we have reference counting
issues, and those were also fixed in the pass(4) driver.
cam_periph.c: In camperiphfree(), reorder some of the operations.
The peripheral destructor needs to be called before
the peripheral is removed from the peripheral is
removed from the list. This is because once we
remove the peripheral from the list, and drop the
topology lock, the peripheral number may be reused.
But if the destructor hasn't been called yet, there
may still be resources hanging around (like devfs
nodes) that haven't been fully cleaned up.
cam_xpt.c: Add an argument to xpt_remove_periph() to indicate
whether the topology lock is already held.
scsi_enc.c: Acquire an extra reference to the peripheral during
registration, and release it once we get a callback
from devfs indicating that the device node is gone.
Call destroy_dev_sched_cb() in enc_oninvalidate()
instead of calling destroy_dev() in the cleanup
routine.
scsi_pass.c: Add reference counting to handle peripheral and
devfs object lifetime issues.
Add a reference to the peripheral and the devfs
node in the peripheral registration.
Don't attempt to add a physical path alias if the
peripheral has been marked invalid.
Release the devfs reference once the initial
physical path alias taskqueue run has completed.
Schedule devfs node destruction in the
passoninvalidate(), and release our peripheral
reference in a new routine, passdevgonecb() once
the devfs node is gone. This allows the peripheral
to fully go away, and the peripheral destructor,
passcleanup(), will get called.
MFC after: 3 days
Sponsored by: Spectra Logic
reporting. It includes:
- removing of error messages controlled by bootverbose, replacing them
with more universal and informative debugging on CAM_DEBUG_INFO level,
that is now built into the kernel by default;
- more close following to the arguments submitted by caller, such as
SF_PRINT_ALWAYS, SF_QUIET_IR and SF_NO_PRINT; consumer knows better which
errors are usual/expected at this point and which are really informative;
- adding two new flags SF_NO_RECOVERY and SF_NO_RETRY to allow caller
specify how much assistance it needs at this point; previously consumers
controlled that by not calling cam_periph_error() at all, but that made
behavior inconsistent and debugging complicated;
- tuning debug messages and taken actions order to make debugging output
more readable and cause-effect relationships visible;
- making camperiphdone() (common device recovery completion handler) to
also use cam_periph_error() in most cases, instead of own dumb code;
- removing manual sense fetching code from cam_periph_error(); I was told
by number of people that it is SIM obligation to fetch sense data, so this
code is useless and only significantly complicates recovery logic;
- making ada, da and pass driver to use cam_periph_error() with new limited
recovery options to handle error recovery and debugging in common way;
as one of results, CAM_REQUEUE_REQ and other retrying statuses are now
working fine with pass driver, that caused many problems before.
- reverting r186891 by raj@ to avoid burning few seconds in tight DELAY()
loops on device probe, while device simply loads media; I think that problem
may already be fixed in other way, and even if it is not, solution must be
different.
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks
CAM_DEBUG_CDB, CAM_DEBUG_PERIPH and CAM_DEBUG_PROBE) by default.
List of these flags can be modified with CAM_DEBUG_COMPILE kernel option.
CAMDEBUG kernel option still enables all possible debug, if not overriden.
Additional 50KB of kernel size is a good price for the ability to debug
problems without rebuilding the kernel. In case where size is important,
debugging can be compiled out by setting CAM_DEBUG_COMPILE option to 0.
are handled in most CAM peripheral drivers that are not handled by
GEOM's disk class.
The usual character driver open and close semantics are that the
driver gets N open calls, but only one close, when the last caller
closes the device.
CAM peripheral drivers expect that behavior to be honored to the
letter, and the CAM peripheral driver code (specifically
cam_periph_release_locked_busses()) panics if it is done incorrectly.
Since devfs has to drop its locks while it calls a driver's close
routine, and it does not have a way to delay or prevent open calls
while it is calling the close routine, there is a race.
The sequence of events, simplified a bit, is:
- devfs acquires a lock
- devfs checks the reference count, and if it is 1, continues to close.
- devfs releases the lock
- 2nd process open call on the device happens here
- devfs calls the driver's close routine
- devfs acquires a lock
- devfs decrements the reference count
- devfs releases the lock
- 2nd process close call on the device happens here
At the second close, we get a panic in
cam_periph_release_locked_busses(), complaining that peripheral
has been released when the reference count is already 0. This is
because we have gotten two closes in a row, which should not
happen.
The fix is to add the D_TRACKCLOSE flag to the driver's cdevsw, so
that we get a close() call for each open(). That does happen
reliably, so we can make sure that our reference counts are
correct.
Note that the sa(4) and pt(4) drivers only allow one context
through the open routine. So these drivers aren't exposed to the
same race condition.
scsi_ch.c,
scsi_enc.c,
scsi_enc_internal.h,
scsi_pass.c,
scsi_sg.c:
For these drivers, change the open() routine to
increment the reference count for every open, and
just decrement the reference count in the close.
Call cam_periph_release_locked() in some scenarios
to avoid additional lock and unlock calls.
scsi_pt.c: Call cam_periph_release_locked() in some scenarios
to avoid additional lock and unlock calls.
MFC after: 3 days