freebsd-nq

d/freebsd-nq

Fork 0

Commit Graph

Author	SHA1	Message	Date
Kenneth D. Merry	ee9c90c75c	Fix a problem with the way we handled device invalidation when attaching to a device failed. In theory, the same steps that happen when we get an AC_LOST_DEVICE async notification should have been taken when a driver fails to attach. In practice, that wasn't the case. This only affected the da, cd and ch drivers, but the fix affects all peripheral drivers. There were several possible problems: - In the da driver, we didn't remove the peripheral's softc from the da driver's linked list of softcs. Once the peripheral and softc got removed, we'd get a kernel panic the next time the timeout routine called dasendorderedtag(). - In the da, cd and possibly ch drivers, we didn't remove the peripheral's devstat structure from the devstat queue. Once the peripheral and softc were removed, this could cause a panic if anyone tried to access device statistics. (one component of the linked list wouldn't exist anymore) - In the cd driver, we didn't take the peripheral off the changer run queue if it was scheduled to run. In practice, it's highly unlikely, and maybe impossible that the peripheral would have been on the changer run queue at that stage of the probe process. The fix is: - Add a new peripheral callback function (the "oninvalidate" function) that is called the first time cam_periph_invalidate() is called for a peripheral. - Create new foooninvalidate() routines for each peripheral driver. This routine is always called at splsoftcam(), and contains all the stuff that used to be in the AC_LOST_DEVICE case of the async callback handler. - Move the devstat cleanup call to the destructor/cleanup routines, since some of the drivers do I/O in their close routines. - Make sure that when we're flushing the buffer queue, we traverse it at splbio(). - Add a check for the invalid flag in the pt driver's open routine. Reviewed by: gibbs	1998-10-22 22:16:56 +00:00
Kenneth D. Merry	60a899a075	Fix a bug in the error recovery code. It was possible to have more than one error recovery action oustanding for a given peripheral. This is bad for several reasons. The first problem is that the error recovery actions would likely be to fix the same problem. (e.g., we queue 5 CCBs to a disk, and the first one comes back with 0x04,0x02. We start error recovery, and the second one comes back with the same status. Then the third one comes back, and so on. Each one causes the drive to get nailed with a start unit, when we really only need one.) The other problem is that we only have space to store one CCB while we're doing error recovery. The subsequent error recovery actions that got started were over-writing the CCBs from previous error recovery actions, but we still tried to call the done routine N times for N error recovery actions. Each call to dadone() was done with the same CCB, though. So on the second one, we got a "biodone: buffer not busy" panic, since the buffer in question had already been through biodone(). In any case, this fixes things so that any any given time, there's only one error recovery action outstanding for any given peripheral driver. Reviewed by: gibbs Reported by: Philippe Regnauld <regnauld@deepo.prosa.dk> [ Philippe wins the "bug finder of the week" award ]	1998-10-13 21:41:32 +00:00
Justin T. Gibbs	8b8a9b1d3e	CAM Transport Layer (XPT). Submitted by: The CAM Team	1998-09-15 06:33:23 +00:00

Author

SHA1

Message

Date

Kenneth D. Merry

ee9c90c75c

Fix a problem with the way we handled device invalidation when attaching

to a device failed.

In theory, the same steps that happen when we get an AC_LOST_DEVICE async
notification should have been taken when a driver fails to attach.  In
practice, that wasn't the case.

This only affected the da, cd and ch drivers, but the fix affects all
peripheral drivers.

There were several possible problems:
 - In the da driver, we didn't remove the peripheral's softc from the da
   driver's linked list of softcs.  Once the peripheral and softc got
   removed, we'd get a kernel panic the next time the timeout routine
   called dasendorderedtag().
 - In the da, cd and possibly ch drivers, we didn't remove the
   peripheral's devstat structure from the devstat queue.  Once the
   peripheral and softc were removed, this could cause a panic if anyone
   tried to access device statistics.  (one component of the linked list
   wouldn't exist anymore)
 - In the cd driver, we didn't take the peripheral off the changer run
   queue if it was scheduled to run.  In practice, it's highly unlikely,
   and maybe impossible that the peripheral would have been on the
   changer run queue at that stage of the probe process.

The fix is:
 - Add a new peripheral callback function (the "oninvalidate" function)
   that is called the first time cam_periph_invalidate() is called for a
   peripheral.

 - Create new foooninvalidate() routines for each peripheral driver.  This
   routine is always called at splsoftcam(), and contains all the stuff
   that used to be in the AC_LOST_DEVICE case of the async callback
   handler.

 - Move the devstat cleanup call to the destructor/cleanup routines, since
   some of the drivers do I/O in their close routines.

 - Make sure that when we're flushing the buffer queue, we traverse it at
   splbio().

 - Add a check for the invalid flag in the pt driver's open routine.

Reviewed by:	gibbs

1998-10-22 22:16:56 +00:00

Kenneth D. Merry

60a899a075

Fix a bug in the error recovery code. It was possible to have more than

one error recovery action oustanding for a given peripheral.

This is bad for several reasons.  The first problem is that the error
recovery actions would likely be to fix the same problem.  (e.g., we
queue 5 CCBs to a disk, and the first one comes back with 0x04,0x02.  We
start error recovery, and the second one comes back with the same status.
Then the third one comes back, and so on.  Each one causes the drive to get
nailed with a start unit, when we really only need one.)

The other problem is that we only have space to store one CCB while we're
doing error recovery.  The subsequent error recovery actions that got
started were over-writing the CCBs from previous error recovery actions,
but we still tried to call the done routine N times for N error recovery
actions.  Each call to dadone() was done with the same CCB, though.  So on
the second one, we got a "biodone: buffer not busy" panic, since the buffer
in question had already been through biodone().

In any case, this fixes things so that any any given time, there's only one
error recovery action outstanding for any given peripheral driver.

Reviewed by:	gibbs
Reported by:	Philippe Regnauld <regnauld@deepo.prosa.dk>
[ Philippe wins the "bug finder of the week" award ]

1998-10-13 21:41:32 +00:00

Justin T. Gibbs

8b8a9b1d3e

CAM Transport Layer (XPT).

Submitted by:	The CAM Team

1998-09-15 06:33:23 +00:00

3 Commits