are handled in most CAM peripheral drivers that are not handled by
GEOM's disk class.
The usual character driver open and close semantics are that the
driver gets N open calls, but only one close, when the last caller
closes the device.
CAM peripheral drivers expect that behavior to be honored to the
letter, and the CAM peripheral driver code (specifically
cam_periph_release_locked_busses()) panics if it is done incorrectly.
Since devfs has to drop its locks while it calls a driver's close
routine, and it does not have a way to delay or prevent open calls
while it is calling the close routine, there is a race.
The sequence of events, simplified a bit, is:
- devfs acquires a lock
- devfs checks the reference count, and if it is 1, continues to close.
- devfs releases the lock
- 2nd process open call on the device happens here
- devfs calls the driver's close routine
- devfs acquires a lock
- devfs decrements the reference count
- devfs releases the lock
- 2nd process close call on the device happens here
At the second close, we get a panic in
cam_periph_release_locked_busses(), complaining that peripheral
has been released when the reference count is already 0. This is
because we have gotten two closes in a row, which should not
happen.
The fix is to add the D_TRACKCLOSE flag to the driver's cdevsw, so
that we get a close() call for each open(). That does happen
reliably, so we can make sure that our reference counts are
correct.
Note that the sa(4) and pt(4) drivers only allow one context
through the open routine. So these drivers aren't exposed to the
same race condition.
scsi_ch.c,
scsi_enc.c,
scsi_enc_internal.h,
scsi_pass.c,
scsi_sg.c:
For these drivers, change the open() routine to
increment the reference count for every open, and
just decrement the reference count in the close.
Call cam_periph_release_locked() in some scenarios
to avoid additional lock and unlock calls.
scsi_pt.c: Call cam_periph_release_locked() in some scenarios
to avoid additional lock and unlock calls.
MFC after: 3 days
process exit. Instead use CAM's standard reference counting to prevent
periph going away until process won't complete. I think that sleep in
single CAM SWI thread is not a good idea and may lead to deadlocks if
daemon process waits for some command completion. Combined with recent
patch avoiding use of CAM SWI for ATA it just causes panics because of
sleeps prohibited in interrupt thread context.
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information\for disk elements into the
CAM device database.
Sponsored by: Spectra Logic Corporation
Sponsored by: iXsystems, Inc.
Submitted by: gibbs, will, mav