Commit Graph

64 Commits

Author SHA1 Message Date
gibbs
95dc85b099 Add the XPT_PATH_STATS and XPT_GDEV_STATS function codes. These ccb
types allow the reporting of error counts and other statistics.  Currently
we provide information on the last BDR or bus reset as well as active
transaction inforamtion, but this will be expanded as more information is
added to aid in error recovery.

Use the 'last reset' information to better handle bus settle delays.
Peripheral drivers now control whether a bus settle delay occurs and
for how long.  This allows target mode peripheral drivers to avoid
having their device queue frozen by the XPT for what shoudl only be
initiator type behavior.

Don't perform a bus reset if the target device is incapable of performing
transfer negotiation (e.g. Fiber Channel).

If we don't perform a bus reset but the controller is capable of transfer
negotiations, force negotiations on the first transaction to go to the
device.  This ensures that we aren't tripped up by a left over negotiation
from the prom, BIOS, loader, etc.

Add a default async handler funstion to cam_periph.c to remove duplicated
code in all initiator type peripheral drivers.

Allow mapping of XPT_CONT_TARGET_IO ccbs from userland.  They are
itentical to XPT_SCSI_IO ccbs as far as data mapping is concerned.
1999-05-22 21:58:47 +00:00
ken
fce282444d Add a facility in the CAM error handling code to retry selection timeouts.
If the client requests that the error recovery code retry a selection
timeout, it will be retried after half a second.  The delay is to give the
device time to recover.

For most of these drivers, I only added selection timeout retries where
they were also retrying unit attention type errors.  The sa(4) driver calls
saerror() in a number of places, but most of them don't request retrying
unit attentions.

Also, bump the default minimum CD changer timeout from 2 to 5 seconds and
the maximum timeout from 10 to 15 seconds.  Some Pioneer changers seem to
have trouble with the shorter timeout.

Reviewed by:	gibbs
1999-05-09 01:25:34 +00:00
gibbs
752ab9a858 cam_periph.c:
Move handling of CAM_AUTOSENSE_FAIL into block dealing with
	all other scsi status errors.

cam_queue.c:
cam_queue.h:
	Fix 'off by one' heap bug in a more efficient manner.  Since
	heap algorithms like to deal with indexes started from 1,
	offset our heap array pointer at allocation time to make this
	so for a C environment.  This makes the implementation of the
	algorithm a bit more efficient.

cam_xpt.c:
	Use macros for accessing the head of the heap so that code
	is isolated from implementation details of the heap.
1999-04-19 21:26:08 +00:00
peter
30a2f6d6a7 Use PHOLD/PRELE rather than P_PHYSIO. 1999-04-06 03:05:36 +00:00
dillon
df24433bbe This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
jdp
b5fcc979e2 Replace includes of <sys/kernel.h> with includes of
<sys/linker_set.h> in those files that use only the linker set
definitions.
1999-01-14 06:22:10 +00:00
ken
36eb372526 At Justin's request, limit the size of buffers that can be mapped into
and out of kernel address space (via the pass(4) and xpt(4) peripheral
drivers) to 64K (DFLTPHYS).  Some controllers, like the Adaptec 1542,
don't support more than 64K transactions.

We plan on eventually having the capability of limiting this size based
on min(MAXPHYS, controller max), but since that capability isn't here yet,
limit things to the lowest common denominator.
1998-12-16 21:00:06 +00:00
ken
60f9c7106b Probable fix for the "cdda2wav" panics that various people have been
reporting since this past summer.  (I think Daniel O'Conner was the first.)

The problem appears to have been something like this:

 - cdda2wav by default passes in a buffer that is close to the 128K MAXPHYS
   limit.
 - many times, the buffer is not page aligned
 - vmapbuf() truncates the address, so that it is page aligned
 - that causes the total size of the buffer to be greater than MAXPHYS,
   which of course is a bad thing.

Here's a quote from the PR (kern/9067):

==================
In particular, note bp->b_bufsize = 0x0001f950 and bp->b_data = 0xf2219960
(which does not start on a page boundary).  vunmapbuf() loops through all
the pages without any difficulty until addr reaches 0xf2239000, and then
the panic occurs.  This seems to indicate that we are exceeding MAXPHYS
since we actually started from the middle of a page (the data is being
transfered to a non page aligned location).

To complete the description, note that the system call originates from
ReadCddaMMC12() (in scsi_cmds.c of cdda2wav) with a request to read 55
audio sectors of 2352 bytes (which is calculated to fall under MAXPHYS).
This in turn ends up calling scsi_send() (in scsi-bsd.c) which calls
cam_fill_csio() and cam_send_ccb().  This results in a CAMIOCOMMAND ioctl
with a ccb function code of XPT_SCSI_IO.
==================

The fix is to change the size check in cam_periph_mapmem() so that it is
like the one in minphys().  In particular, it is something like:

if ((buffer_length + (buf_ptr & PAGE_MASK)) > MAXPHYS)
	buffer is too big

My fix is based on the one in the PR, but I cleaned up a fair number of
things in cam_periph_mapmem().  The checks for each buffer to be mapped
are now in a separate loop from the actual mapping operation.  With the new
arrangement, we don't have to bother with unmapping any previously mapped
buffers if one of the checks fails.

Many thanks to James Liu for tracking this down.  I'd appreciate it if some
vm-savvy folks would look this over.  I believe this fix is correct, but I
could be wrong.

PR:		kern/9067 (also, kern/8112)
Reviewed by:	gibbs
Submitted by:	"James T. Liu" <jtliu@phlebas.rockefeller.edu>
1998-12-16 18:00:39 +00:00
ken
123c4e5742 Fix a problem with the way we handled device invalidation when attaching
to a device failed.

In theory, the same steps that happen when we get an AC_LOST_DEVICE async
notification should have been taken when a driver fails to attach.  In
practice, that wasn't the case.

This only affected the da, cd and ch drivers, but the fix affects all
peripheral drivers.

There were several possible problems:
 - In the da driver, we didn't remove the peripheral's softc from the da
   driver's linked list of softcs.  Once the peripheral and softc got
   removed, we'd get a kernel panic the next time the timeout routine
   called dasendorderedtag().
 - In the da, cd and possibly ch drivers, we didn't remove the
   peripheral's devstat structure from the devstat queue.  Once the
   peripheral and softc were removed, this could cause a panic if anyone
   tried to access device statistics.  (one component of the linked list
   wouldn't exist anymore)
 - In the cd driver, we didn't take the peripheral off the changer run
   queue if it was scheduled to run.  In practice, it's highly unlikely,
   and maybe impossible that the peripheral would have been on the
   changer run queue at that stage of the probe process.

The fix is:
 - Add a new peripheral callback function (the "oninvalidate" function)
   that is called the first time cam_periph_invalidate() is called for a
   peripheral.

 - Create new foooninvalidate() routines for each peripheral driver.  This
   routine is always called at splsoftcam(), and contains all the stuff
   that used to be in the AC_LOST_DEVICE case of the async callback
   handler.

 - Move the devstat cleanup call to the destructor/cleanup routines, since
   some of the drivers do I/O in their close routines.

 - Make sure that when we're flushing the buffer queue, we traverse it at
   splbio().

 - Add a check for the invalid flag in the pt driver's open routine.

Reviewed by:	gibbs
1998-10-22 22:16:56 +00:00
ken
8c808a1f97 Clean up some unused variables.
Reviewed by:	ken
Submitted by:	phk
1998-10-15 17:46:26 +00:00
ken
3edc6b1249 Fix a bug in the error recovery code. It was possible to have more than
one error recovery action oustanding for a given peripheral.

This is bad for several reasons.  The first problem is that the error
recovery actions would likely be to fix the same problem.  (e.g., we
queue 5 CCBs to a disk, and the first one comes back with 0x04,0x02.  We
start error recovery, and the second one comes back with the same status.
Then the third one comes back, and so on.  Each one causes the drive to get
nailed with a start unit, when we really only need one.)

The other problem is that we only have space to store one CCB while we're
doing error recovery.  The subsequent error recovery actions that got
started were over-writing the CCBs from previous error recovery actions,
but we still tried to call the done routine N times for N error recovery
actions.  Each call to dadone() was done with the same CCB, though.  So on
the second one, we got a "biodone: buffer not busy" panic, since the buffer
in question had already been through biodone().

In any case, this fixes things so that any any given time, there's only one
error recovery action outstanding for any given peripheral driver.

Reviewed by:	gibbs
Reported by:	Philippe Regnauld <regnauld@deepo.prosa.dk>
[ Philippe wins the "bug finder of the week" award ]
1998-10-13 21:41:32 +00:00
bde
02489e6f7f Fixed printf format errors. u_long is not necessarily suitable for casting
pointers to, and %d is not suitable for printing uint32_t's.
1998-09-29 09:18:08 +00:00
gibbs
eccdd13267 cam_xpt.c:
Add quirk entry for a Samsung drive that doesn't like experiencing
	the queue full condition.

	Bump the timeouts for all probe activities to 60s.  We don't know
	what the seletion timeout (or equivelent on other mediums) is
	for controllers, which can make the transactions at the tail
	end of a parallel probe take a while to complete.  The DPT
	seems to be a card that takes a long time to see a selection timeout.

cam_periph.c:
	Don't call a device "gone" after a single selection timeout.  We
	need to come up with a better policy.  Until that time, you'll
	have to manually re-scan a bus via camcontrol for the system to
	decide that a device is really gone.  This should give devices
	experiencing temporary insanity to escape death.
1998-09-20 07:14:36 +00:00
gibbs
855593c295 CAM Transport Layer (XPT).
Submitted by:	The CAM Team
1998-09-15 06:33:23 +00:00