Clear residual counts after a successful samount (the user doesn't
care that we got an N-kbyte residual on our test read).
Change a lot of error handling code.
1. If we end up in saerror, check more carefully about the kind of
error. If it is a CAM_SCSI_STATUS_ERROR and it is a read/write
command, we'll be handling this in saerror. If it isn't a read/write
command, check to see whether this is just an EOM/EOP check condition-
if it is, just set residual and return normally. A residual and
then a NO SENSE check condition with the ASC of 0 and ASCQ of
between 1 and 4 are normal 'signifying' events, not errors per se,
and we shouldn't give the command to cam_periph_error to do something
relatively unpredictable with.
2. If we get a Bus Reset, had a BDR sent, or get the cam status of
CAM_REQUEUE_REQ, check the retry count on the command. The default
error handler, cam_periph_error, doesn't honor retry count in these
cases. This may change in the future, but for now, make sure we
set EIO and return without calling cam_periph_error if the retry
count for the command with an error is zero.
3. Clean up the pending error case goop and handle cases more
sensibly.
The rules are:
If command was a Write:
If we got a SSD_KEY_VOLUME_OVERFLOW, the resid is
propagated and we set ENOSPC as the error.
Else if we got an EOM condition- just mark EOM pending.
And set a residual of zero. For the longest time I was just
propagating residual from the sense data- but my tape
comparison tests were always failing because all drives I
tested with actually *do* write the data anyway- the EOM
(early warning) condition occurred *prior* to all of the
data going out to media- that is, it was still buffered by
the drive. This case is described in SCSI-2, 10.2.14,
paragraph #d for the meaning of 'information field'. A
better fix for this would be to issue a WFM command of zero
to cause the drive to flush any buffered data, but this
would require a fairly extensive rewrite.
Else if the command was a READ:
If we got a SSD_KEY_BLANK_CHECK-
If we have a One Filemark EOT model- mark EOM as pending,
otherwise set EIO as the error.
Else if we found a Filemark-
If we're in Fixed Block mode- mark EOF pending.
If we had an ILI (Incorrect Length Indicator)-
If the residual is less than zero, whine about tape record
being too big for user's buffer, otherwise if we were in
Fixed Block mode, mark EIO as pending.
All 'pending' conditions mean that the command in question completes
without error indication. It had succeeded, but a signifying event
occurred during its execution which will apply to the *next* command
that would be executed. Except for the one EOM case above, we always
propagate residual.
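
The rules above can be read as a small decision function. What follows is a minimal sketch, not the driver code: `sa_event`, `sa_result`, `sa_classify` and the `PEND_*` bits are hypothetical stand-ins for the sense-data checks and the SA_FLAG_*_PENDING bits named in this log. Treating the read-side checks as one if/else chain is an assumption about how the rules nest, and since the log only says the driver "whines" about an oversized record, the sketch reports that case as a flag rather than guessing an errno.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pending bits, modeled on the SA_FLAG_*_PENDING flags. */
enum { PEND_EOM = 1 << 0, PEND_EOF = 1 << 1, PEND_EIO = 1 << 2 };

/* Hypothetical digest of the completed command and its sense data. */
struct sa_event {
	bool is_write;        /* command was a write, otherwise a read */
	bool volume_overflow; /* sense key was VOLUME OVERFLOW */
	bool blank_check;     /* sense key was BLANK CHECK */
	bool eom;             /* EOM condition reported */
	bool filemark;        /* filemark found */
	bool ili;             /* Incorrect Length Indicator set */
	bool fixed_mode;      /* drive is in Fixed Block mode */
	bool one_fm_eot;      /* One Filemark EOT model in use */
	long sense_resid;     /* residual taken from the sense data */
};

struct sa_result {
	int  error;    /* errno for this command, 0 if it completes cleanly */
	int  pending;  /* PEND_* bits that apply to the next command */
	long resid;    /* residual reported for this command */
	bool too_big;  /* tape record was larger than the user's buffer */
};

static struct sa_result
sa_classify(const struct sa_event *ev)
{
	/* Default: no error, nothing pending, residual propagated. */
	struct sa_result r = { 0, 0, ev->sense_resid, false };

	if (ev->is_write) {
		if (ev->volume_overflow) {
			r.error = ENOSPC;          /* residual is propagated */
		} else if (ev->eom) {
			r.pending |= PEND_EOM;
			r.resid = 0;               /* drive buffered the data */
		}
		return r;
	}

	if (ev->blank_check) {
		if (ev->one_fm_eot)
			r.pending |= PEND_EOM;
		else
			r.error = EIO;
	} else if (ev->filemark) {
		if (ev->fixed_mode)
			r.pending |= PEND_EOF;
	} else if (ev->ili) {
		if (ev->sense_resid < 0)
			r.too_big = true;          /* the driver complains here */
		else if (ev->fixed_mode)
			r.pending |= PEND_EIO;
	}
	return r;
}
```

A write that hits early warning comes back with no error, PEND_EOM set and a residual of zero, which is the one case where the residual is not propagated.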
Now, way back in sastart- if we notice any of the PENDING bits set,
we don't run the command we've just pulled off the wait queue. Instead,
we then figure out its disposition based upon a previous command's
association with a signifying event.
If SA_FLAG_EOM_PENDING is set, we don't set an error. We just complete
the command with residual set to the request count (not data moved,
but no error). We continue on.
If SA_FLAG_EOF_PENDING- if we have this, it's only because we're in
Fixed Block mode- in which case we traverse all waiting buffers (which
we can get in fixed block mode because physio has split things up) and
mark them all as no error, but no data moved and complete them.
If SA_FLAG_EIO_PENDING, just mark the buffer with an EIO error
and complete it.
Then we clear all of the pending state bits- we're done.
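
The sastart side of this can be sketched as follows. This is an illustration under stated assumptions, not the driver code: `fake_buf` stands in for the kernel buffer structure, `sa_dispose_pending` and the `PEND_*` bits are hypothetical names, and the order in which the bits are tested simply follows the order of the description above. The log does not say what residual an EIO buffer gets, so the sketch leaves it untouched.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical pending bits, modeled on the SA_FLAG_*_PENDING flags. */
enum { PEND_EOM = 1 << 0, PEND_EOF = 1 << 1, PEND_EIO = 1 << 2 };
#define PEND_MASK (PEND_EOM | PEND_EOF | PEND_EIO)

/* Hypothetical stand-in for a queued I/O buffer. */
struct fake_buf {
	long bcount;            /* bytes requested */
	long resid;             /* bytes not transferred */
	int  error;             /* errno, 0 for none */
	int  done;              /* set once the buffer is completed */
	struct fake_buf *next;  /* next buffer on the wait queue */
};

/* Complete a buffer as "no error, but no data moved". */
static void
complete_unmoved(struct fake_buf *bp)
{
	bp->resid = bp->bcount;
	bp->error = 0;
	bp->done = 1;
}

/*
 * If a pending bit is set, dispose of queued buffers instead of running
 * a command. Returns the number of buffers completed; 0 means nothing
 * was pending and the caller should run the command as usual.
 */
static int
sa_dispose_pending(int *flags, struct fake_buf **queue)
{
	struct fake_buf *bp = *queue;
	int completed = 0;

	if (bp == NULL || (*flags & PEND_MASK) == 0)
		return 0;

	*queue = bp->next;                 /* pull it off the wait queue */
	if (*flags & PEND_EOM) {
		complete_unmoved(bp);
		completed = 1;
	} else if (*flags & PEND_EOF) {
		/* Fixed Block mode: drain every waiting buffer. */
		complete_unmoved(bp);
		completed = 1;
		while ((bp = *queue) != NULL) {
			*queue = bp->next;
			complete_unmoved(bp);
			completed++;
		}
	} else {
		bp->error = EIO;
		bp->done = 1;
		completed = 1;
	}
	*flags &= ~PEND_MASK;              /* all pending state is consumed */
	return completed;
}
```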
MFC after: 4 weeks
Handle both old and new TARGIOALLOCUNIT/TARGIOFREEUNIT cases- the new
one allows us to specify inquiry data we want to use.
Handle more of the CAM_DIS_DISCONNECT case.
Move TARGCTLIOALLOCUNIT to OTARGCTLIOALLOCUNIT, TARGCTLIOFREEUNIT
to OTARGCTLIOFREEUNIT and redefine old associated structure to be
old_ioc_alloc_unit- deprecation but preservation of binaries.
Add new structure for same- but this one contains a pointer to
user defined INQUIRY data so you can define what the target
device looks like to the outside world.
1. If we get frozen, unfreeze for disable disconnects.
2. Put CAM_DIS_DISCONNECT commands at the head of the work queue
(we have a target still connected and we can't run anything else
until this command completes).
If we had an error sending the last CTIO, unfreeze the queue anyway.
o Separate the NetBSD(XS) / FreeBSD(CAM) code much more cleanly.
o Improve tagged queuing support (full QTAG).
o Improve quirk support.
o Improve parity error retry.
o Implement wide negotiation.
o Cmd link support.
o Add copyright of CAM part.
o Change for CAM_NEW_TRAN_CODE.
o Work around for buggy KME UJDCD450.
o stg: add disconnect condition.
o nsp: use suspend I/O.
and more. I thank Honda-san.
conf/options.pc98: add CT_USE_RELOCATE_OFFSET and CT_BUS_WEIGHT
dev/{ct,ncv,nsp,stg}/*_{pccard,isa}.c: add splcam() before calling
attach/detach functions.
Tested by: bsd-nomads
Obtained from: NetBSD/pc98
This is useful if you want to dynamically move into a Fibre Channel
or Multi-initiator environment that happens to be particularly noisy
and ugly that requires a lot of retries (with shorter I/O timeouts)
for commands destroyed by LIPs or Bus Resets.
Reviewed by: deafening silence on audit && scsi on the retry counts
MFC after: 2 weeks
1. Add SA_IO_TIMEOUT as an option (4 minutes default) to cover reads,
writes, wfm, test unit ready.
2. Add internal SCSIOP_TIMEOUT (e.g., for mode sense) at 1 minute. This
should not require an option, but is cleaner to parameterize.
MFC after: 1 week
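
The two timeouts described above could be parameterized roughly as follows. This is a sketch under assumptions: the log gives the durations in minutes, the millisecond units, the `IO_TIMEOUT` name, the `sa_op` enum and `sa_timeout_ms()` are illustrative and not taken from the driver.

```c
/* Minutes; overridable at build time in the manner of a kernel option. */
#ifndef SA_IO_TIMEOUT
#define SA_IO_TIMEOUT	4
#endif

/* Assumed units: milliseconds. */
#define IO_TIMEOUT	(SA_IO_TIMEOUT * 60 * 1000)
#define SCSIOP_TIMEOUT	(60 * 1000)	/* internal, not an option */

/* Hypothetical operation classes, for illustration only. */
enum sa_op {
	SA_OP_READ,
	SA_OP_WRITE,
	SA_OP_WFM,
	SA_OP_TUR,
	SA_OP_MODE_SENSE
};

/* Pick the timeout for an operation. */
static int
sa_timeout_ms(enum sa_op op)
{
	switch (op) {
	case SA_OP_READ:
	case SA_OP_WRITE:
	case SA_OP_WFM:
	case SA_OP_TUR:
		return IO_TIMEOUT;	/* media operations can be slow */
	default:
		return SCSIOP_TIMEOUT;	/* e.g. mode sense */
	}
}
```

Keeping the internal value as a named constant rather than a bare number is the "cleaner to parameterize" point: every call site names the timeout class it wants.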
drivers.
- change daprevent() to set CAM_RETRY_SELTO and SF_RETRY_UA when it calls
cam_periph_runccb().
- change the pt(4) driver to ignore unit attentions
- change the targ(4) driver to retry selection timeouts
- clean up a few formatting glitches in the targ(4) driver
Reviewed by: gibbs
prevent scsi_sense_desc() from dereferencing a NULL pointer when a drive
happens to return one of these sense keys.
Reported by: Michael Samuel <michael@miknet.net>
With the recent changes in the CAM error handling, some problems in
the error handling of sa(4) have been uncovered. Basically, a number
of conditions that are not actually errors have been mistreated as
genuine errors. In particular:
. Trying to read in variable length mode with a mismatched blocksize
between the on-tape (virtual) blocks and the read(2) supplied buffer
size, causing an ILI SCSI condition, has caused an attempt to retry
the supposedly `errored' transfer, causing the tape to be read
continuously until it eventually hit EOM. Since by default any
simple mt(1) operation does an initial test read, an `mt stat' was
sufficient to trigger this bug.
Note that it's Justin's opinion that treating a NO SENSE as an EIO
is another bug in CAM. I do not feel authorized to fix cam_periph.c
without further confirmation that I'm on the right track, however.
. Hitting a filemark caused the read(2) syscall to return EIO, instead
of returning a `short read'. Note that the current fix only solves
this problem in variable length mode. Fixed length mode uses a
different code path, and since i didn't grok all the intentions behind
that handling, i did not touch it (IOW: it's still broken, and you get
an EIO upon hitting a filemark).
The solution is to keep track of those conditions inside saerror(),
and upon completion to not call cam_periph_error() in that case. We
need to make sure that the device gets unfrozen if needed though (in
case of actual errors, cam_periph_error() does this on our behalf).
Not objected by: mjacob (who currently doesn't have the time to
review the patch)
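
The shape of the fix described above can be sketched like this. It is an illustration only: `dev_state`, `generic_error()` (standing in for cam_periph_error()) and `tape_read_error()` (standing in for saerror()) are hypothetical, and the real driver works on CCBs and sense data rather than booleans.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state for the sketch. */
struct dev_state {
	bool frozen;         /* device queue is frozen after the error */
	int  generic_calls;  /* times the generic handler was invoked */
};

/* Stand-in for cam_periph_error(): handles a real error and unfreezes. */
static int
generic_error(struct dev_state *d)
{
	d->generic_calls++;
	d->frozen = false;
	return EIO;
}

/*
 * Stand-in for saerror(): in variable length mode an ILI or a filemark
 * is a signifying event, not an error. Skip the generic handler, but
 * release the device ourselves since nobody else will.
 */
static int
tape_read_error(struct dev_state *d, bool variable_mode, bool ili,
    bool filemark)
{
	bool benign = variable_mode && (ili || filemark);

	if (!benign)
		return generic_error(d);
	d->frozen = false;
	return 0;            /* the caller sees a short read */
}
```

The fixed length case falls through to the generic handler here, matching the note above that this mode still returns EIO on a filemark.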
Some of the major changes include:
- The SCSI error handling portion of cam_periph_error() has
been broken out into a number of subfunctions to better
modularize the code that handles the hierarchy of SCSI errors.
As a result, the code is now much easier to read.
- String handling and error printing have been significantly
revamped. We now use sbufs to do string formatting instead
of using printfs (for the kernel) and snprintf/strncat (for
userland) as before.
There is a new catchall error printing routine,
cam_error_print() and its string-based counterpart,
cam_error_string() that allow the kernel and userland
applications to pass in a CCB and have errors printed out
properly, whether or not they're SCSI errors. Among other
things, this helped eliminate a fair amount of duplicate code
in camcontrol.
We now print out more information than before, including
the CAM status and SCSI status and the error recovery action
taken to remedy the problem.
- sbufs are now available in userland, via libsbuf. This
change was necessary since most of the error printing code
is shared between libcam and the kernel.
- A new transfer settings interface is included in this checkin.
This code is #ifdef'ed out, and is primarily intended to aid
discussion with HBA driver authors on the final form the
interface should take. There is example code in the ahc(4)
driver that implements the HBA driver side of the new
interface. The new transfer settings code won't be enabled
until we're ready to switch all HBA drivers over to the new
interface.
src/Makefile.inc1,
lib/Makefile: Add libsbuf. It must be built before libcam,
since libcam uses sbuf routines.
libcam/Makefile: libcam now depends on libsbuf.
libsbuf/Makefile: Add a makefile for libsbuf. This pulls in the
sbuf sources from sys/kern.
bsd.libnames.mk: Add LIBSBUF.
camcontrol/Makefile: Add -lsbuf. Since camcontrol is statically
linked, we can't depend on the dynamic linker
to pull in libsbuf.
camcontrol.c: Use cam_error_print() instead of checking for
CAM_SCSI_STATUS_ERROR on every failed CCB.
sbuf.9: Change the prototypes for sbuf_cat() and
sbuf_cpy() so that the source string is now a
const char *. This is more in line with the
standard system string functions, and helps
eliminate warnings when dealing with a const
source buffer.
Fix a typo.
cam.c: Add description strings for the various CAM
error status values, as well as routines to
look up those strings.
Add new cam_error_string() and
cam_error_print() routines for userland and
the kernel.
cam.h: Add a new CAM flag, CAM_RETRY_SELTO.
Add enumerated types for the various options
available with cam_error_print() and
cam_error_string().
cam_ccb.h: Add new transfer negotiation structures/types.
Change inq_len in the ccb_getdev structure to
be "reserved". This field has never been
filled in, and will be removed when we next
bump the CAM version.
cam_debug.h: Fix typo.
cam_periph.c: Modularize cam_periph_error(). The SCSI error
handling part of cam_periph_error() is now
in camperiphscsistatuserror() and
camperiphscsisenseerror().
In cam_periph_lock(), increase the reference
count on the periph while we wait for our lock
attempt to succeed so that the periph won't go
away while we're sleeping.
cam_xpt.c: Add new transfer negotiation code. (ifdefed
out)
Add a new function, xpt_path_string(). This
is a string/sbuf analog to xpt_print_path().
scsi_all.c: Revamp string handling and error printing code.
We now use sbufs for much of the string
formatting code. More of that code is shared
between userland and the kernel.
scsi_all.h: Get rid of SS_TURSTART, it wasn't terribly
useful in the first place.
Add a new error action, SS_REQSENSE. (Send a
request sense and then retry the command.)
This is useful when the controller hasn't
performed autosense for some reason.
Change the default actions around a bit.
scsi_cd.c,
scsi_da.c,
scsi_pt.c,
scsi_ses.c: SF_RETRY_SELTO -> CAM_RETRY_SELTO. Selection
timeouts shouldn't be covered by a sense flag.
scsi_pass.[ch]: SF_RETRY_SELTO -> CAM_RETRY_SELTO.
Get rid of the last vestiges of a read/write
interface.
libkern/bsearch.c,
sys/libkern.h,
conf/files: Add bsearch.c, which is needed for some of the
new table lookup routines.
aic7xxx_freebsd.c: Define AHC_NEW_TRAN_SETTINGS if
CAM_NEW_TRAN_CODE is defined.
sbuf.h,
subr_sbuf.c: Add the appropriate #ifdefs so sbufs can
compile and run in userland.
Change sbuf_printf() to use vsnprintf()
instead of kvprintf(), which is only available
in the kernel.
Change the source string for sbuf_cpy() and
sbuf_cat() to be a const char *.
Add __BEGIN_DECLS and __END_DECLS around
function prototypes since they're now exported
to userland.
kdump/mkioctls: Include stdio.h before cam.h since cam.h now
includes a function with a FILE * argument.
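
As an illustration of the table lookup routines that bsearch.c supports, here is a minimal sketch of a status-to-description lookup using the standard bsearch(3) interface. The table contents, the status values and the names `status_entry`, `status_table` and `status_describe` are made up for the example and are not the CAM tables.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry: a status code and its description. */
struct status_entry {
	int         status;
	const char *text;
};

/* Must stay sorted by status, since bsearch() assumes sorted input. */
static const struct status_entry status_table[] = {
	{ 1, "request completed" },
	{ 4, "request completed with error" },
	{ 7, "device busy" },
	{ 9, "selection timeout" },
};

/* Compare a key (pointer to int) against a table member. */
static int
status_cmp(const void *key, const void *member)
{
	int k = *(const int *)key;
	int m = ((const struct status_entry *)member)->status;

	return (k > m) - (k < m);
}

/* Return the description for a status, or NULL if it is not listed. */
static const char *
status_describe(int status)
{
	const struct status_entry *e;

	e = bsearch(&status, status_table,
	    sizeof(status_table) / sizeof(status_table[0]),
	    sizeof(status_table[0]), status_cmp);
	return e != NULL ? e->text : NULL;
}
```

Returning NULL for an unknown code leaves it to the caller to decide how to print a status the table does not cover.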
Submitted by: gibbs (mostly)
Reviewed by: jdp, marcel (libsbuf makefile changes)
Reviewed by: des (sbuf changes)
Reviewed by: ken
inq_len member of the ccb_getdev structure, but we've never filled that
value in.
So we now get the length from the inquiry data returned by the drive.
(Since we will fetch as much inquiry data as the drive claims to support.)
Reviewed by: mjacob
Reported by: Andrzej Tobola <san@iem.pw.edu.pl>
o Offset and period in synch messages and width negotiation should be
handled per target, not per lun. Move these from *lun_info to
*targ_info.
o Change in handling XPT_RESET_DEV and XPT_GET_TRAN_SETTINGS .
o Change CAM_* xpt_done return values.
o The busy loop did not time out. Change it to time out, as in the original NetBSD/pc98 code.
Reviewed by: bsd-nomads ML
not be retried. It is an indication that there was an error that was
corrected during the execution of the command. This is per ANSI SCSI2
spec.
It's possible that these should also be noted to the console (as indicative,
perhaps, of growing media defect lists in drives), but the default of
printing errors out if bootverbose in this case is probably enough.
Also, there'd been a missing ERESTART for that clause anyway.
2. If you have an ABORTED COMMAND, it's almost invariably a SCSI parity
error. You should never be silent about these since users should do something
about this if it occurs (moving that power cord *away* from the SCSI cable is
always a good first start). This should print irrespective of bootverbose
because it's an actual real error even if we retry a transmission.
Reviewed by: audit@freebsd.org, gibbs@freebsd.org
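
The two policies above can be summarized in a small sketch. It assumes that the truncated first item refers to the RECOVERED ERROR sense key, which is what the description of an error corrected during execution suggests. The sense key values are the SCSI-2 ones; `sense_policy` and `sense_policy_for()` are hypothetical names, and the default branch is only a placeholder for cases this log entry does not cover.

```c
#include <stdbool.h>

/* SCSI-2 sense key values. */
enum {
	KEY_RECOVERED_ERROR = 0x1,
	KEY_ABORTED_COMMAND = 0xB
};

/* Hypothetical policy: whether to retry and whether to print. */
struct sense_policy {
	bool retry;
	bool print;
};

static struct sense_policy
sense_policy_for(int key, bool bootverbose)
{
	struct sense_policy p = { false, bootverbose };

	switch (key) {
	case KEY_RECOVERED_ERROR:
		/* The command worked; do not retry, print only if verbose. */
		p.retry = false;
		p.print = bootverbose;
		break;
	case KEY_ABORTED_COMMAND:
		/* A real error even if the retry succeeds: always print. */
		p.retry = true;
		p.print = true;
		break;
	default:
		/* Placeholder: other keys are outside this sketch. */
		break;
	}
	return p;
}
```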
we *really* are.
It should be noted that there is a degenerate case where soft tape
location will be lost (not causing a frozen state- but causing
the loss of reporting fileno/blockno)- that's where you backspace
over a filemark- you stop backspacing as soon as you cross the
filemark, but you have no idea what the record number now is because
you have no idea how many records you are into the file you just
backed into. Such is life.
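
The degenerate case can be made concrete with a small model of the soft position. This is a sketch, not the driver code: `soft_pos` and the three helpers are hypothetical, and using -1 as the marker for an unknown value is an assumption of the example.

```c
/* Hypothetical soft tape position; -1 means "unknown". */
struct soft_pos {
	long fileno;
	long blkno;
};

/* Reading or spacing n records forward within the current file. */
static void
pos_forward_records(struct soft_pos *p, long n)
{
	if (p->blkno >= 0)
		p->blkno += n;
}

/* Crossing a filemark forward: we land at record 0 of the next file. */
static void
pos_forward_filemark(struct soft_pos *p)
{
	if (p->fileno >= 0)
		p->fileno++;
	p->blkno = 0;
}

/*
 * Crossing a filemark backward: the file number is still known, but we
 * are now at the end of the previous file and cannot tell how many
 * records precede us, so the record number is lost.
 */
static void
pos_backward_filemark(struct soft_pos *p)
{
	if (p->fileno > 0)
		p->fileno--;
	p->blkno = -1;
}
```

Once the record number is unknown it stays unknown until the position is re-established, for example by crossing a filemark in the forward direction.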
While I'm at it, also pick up residuals from writing filemarks.
PR: 24222