a race condition in how SDTR and WDTR negotiation are handled, fixes for multi-lun
non-tagged device recovery, and ensuring that the timed-out SCBs in the waiting queue
are cleaned up.
Fix a problem with SCB paging that caused bogus residuals to be reported.
will increase the overhead of queueing a command, but some recent bug reports
make me believe that AAP isn't really working and that we are losing some
SCBs from the input queue. Hopefully this will cure that problem.
Fix some bugs in the error recovery code. Mainly these could cause us to
inadvertently forget to untimeout an SCB that was recovered, causing later
confusion.
field in the hardware SCB not changing during the course of a transaction.
Since the sequencer now DMAs the hardware SCB back up to the host when it
detects a residual, this is no longer the case. I added a field to the
"software" scb to mirror this information and it is now used for doing the
residual calculation.
Fix a few panics during error recovery:
1) Stupid mistake in the "no SCB match handler" where I was using the wrong
variable (busy_scbid instead of scb_index).
2) Unbusy the target of an abort request if the command we are trying to
abort is an untagged transaction. If we don't, we get a fatal NO_MATCH_BUSY
condition which "should never happen".
3) When an abort completes, turn off ahc->in_timeout or else the next timeout
will hit the protective "scb timesout again" panic.
4) Fix a typo that caused the requeued "abort" SCB to have its TAG_ENB and
disconnect bits cleared (missing ~) so that devices would complain
about overlapped commands.
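
To illustrate item 4, here is a minimal sketch of how a missing ~ turns
"clear one bit" into "clear everything else". The bit values and the
ABORT_BIT name are assumptions made for the example; the log does not say
which bit the original statement meant to clear.

```c
#include <stdint.h>

/* Hypothetical bit assignments; the driver's real values are not given here. */
#define TAG_ENB   0x20u  /* tagged queueing enabled */
#define DISCENB   0x40u  /* disconnection enabled */
#define ABORT_BIT 0x01u  /* stand-in for the bit the code meant to clear */

/* Buggy form: without '~' the AND keeps only ABORT_BIT and clears the rest. */
static uint8_t clear_bit_buggy(uint8_t control)
{
    return (uint8_t)(control & ABORT_BIT);
}

/* Intended form: clear ABORT_BIT and preserve every other bit. */
static uint8_t clear_bit_fixed(uint8_t control)
{
    return (uint8_t)(control & ~ABORT_BIT);
}
```

With the buggy form, a requeued SCB loses TAG_ENB and DISCENB and goes out
untagged, which is consistent with the "overlapped commands" complaint.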
Be sure to turn off the unexpected busfree interrupt after we do a bus
reset since we are expecting the bus to go free in that case.
Return XS_TIMEOUT instead of XS_DRIVERSTUFFUP in certain scenarios. XS_TIMEOUT
allows for retries, XS_DRIVERSTUFFUP does not.
Allow commands with SDTR and WDTR negotiation to be tagged. The SCSI II spec
says that you probably should not do this for fear of hitting bogus devices.
The driver did this in the past for almost two years without any problem,
and not doing it causes problems during error recovery to a tag capable device
as the number of openings is higher than two and we'll start sending it
tagged commands causing "overlapped commands attempted" type errors. The
real fix needs to happen in the generic SCSI layer which can limit the
number and type of transactions to a device during error recovery efficiently.
Give ourselves at least 100ms to perform a request sense instead of relying
on the original timeout to be long enough to complete this new command as
well as the one that generated the condition.
Removed some redundant code.
If we can, use timeouts instead of DELAYs when dealing with a bus reset.
This prevents us from holding up the whole machine for a noticeable amount
of time (especially for a real time app).
Make a pass over the timeout/error handling code. Aborts are more
reliable. We actually handle parity errors correctly now instead of
locking up the bus. Added code to properly clean up disconnected SCBs
down on the card during error handling. Improved robustness in several
areas.
If we are using defaults, but are an Ultra card, negotiate at 20MHz instead
of 10.
Don't attempt to handle any commands for 100ms after a reset has occurred.
This is the minimum time before a target will respond to selection. Also
disable the busfree interrupt before doing a bus reset. This prevents the
driver from getting confused by an "unexpected busfree".
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
uses one or the other. This required some changes to the ahc_reset()
function, and how early the probes had to allocate their softc.
Turn the AHC_IN/OUT* macros into inline functions and lowercase their names
to indicate this change. Getting AHC_OUTSB to work as a macro doing
conditional memory mapped I/O would have been too gross.
Stop setting STPWEN in the main driver and let the PCI front end do it
instead. It knows better.
Add the clearing of the QOUTQCNT variable during command complete processing
in the SCB paging case.
Go back to doing unconditional retries for the QUEUE FULL status condition.
This is really a kludge, but the code to handle it properly is on the SCSI
branch and will not make it into 2.2.
it automatically. The AHC_FORCE_PIO option wasn't having any effect because
the PCI probe code didn't include this file.
Fix some problems with the new sync and wide negotiation code. First off,
go back to async transfers by using a message reject again. The SCSI II and
III specs indicate that if a target's response to an initiator does not suit
(i.e. it's too low), then performing a message reject is the appropriate
response. If, on the other hand, the initiator begins the negotiation and
we want to go async, we will send back an SDTR message with a 0 period and
offset.
Also fix a really bad negotiation problem caused by a missing "break". This
would usually hit people that had "smart" wide devices that immediately
attempt sync negotiation after a successful wide negotiation.
2.2 Candidate.
This involves expanding the support of the SEEPROM routines to deal with
the larger SEEPROMs on these cards and providing a mechanism to share
SCB arrays between multiple controllers.
Most of the 398X support came from Dan Eischen.
ahc_data -> ahc_softc
Clean up some more type bogons I missed from the last pass.
Be more clear when handling the NO_MATCH condition. NO_MATCH can also
happen when the sequencer encounters an SCB we've asked to be aborted.
- Add support for memory mapped I/O.
- Use DMA to get SCBs down to the adapters.
- Remove old paging code.
- Be much smarter about how we allocate SCB space. The old, simple method
wasted almost half a page per SCB. Ooops.
- Make command complete interrupt processing more efficient.
- Break the monolithic ahc_intr into sub-routines. The sub-routines handle
rare, special case events so the function call is not a penalty and the
removal of the code from the main routine most likely improves performance,
since instruction prefetch will work better and less code is pushed into the cache.
- Never, ever allow tagged queueing if a device has disconnection disabled.
- Clean up and simplify timeout code. Many of the changes are to handle the
new DMA scheme.
Add a panic for attempts to page in a non paged out SCB.
Re-order some of the interrupt routine for better performance.
NetBSD/OpenBSD support Submitted by: Noriyuki Soda <soda@sra.co.jp>,
Pete Bentley <pete@demon.net>,
Charles M. Hannum <mycroft@mit.edu>,
Theo de Raadt <deraadt@theos.com>
channel B first as appropriate.
Only reset the SCSI bus if the RESET_SCSI bit of SCSICONF is set. This
makes the aic7xxx driver honor all of the configuration settings available
in SCSI-Select or the ECU.
Fix a benign bug in the reset code that caused us to always wait a full
second after the chip reset. This should shave some time off the probe.
Bug found by pedrosal@nce.ufrj.br (Pedro Salenbauch)
It seems that only the top three sync rates are doubled when in ultra mode,
so update the syncrates table as appropriate.
Found by "Dan Willis" <dan@plutotech.com> and his SCSI bus analyzer
Ensure that queued commands are not touched by the abort code by setting
the SCB status to indicate what queue it is in.
Fix deadlocks when using SCB paging by using SCBs from the assigned_scbs
queue or an SCB that completed during the same interrupt if needed.
Don't ever use insl to pull SCBs from any of the controllers. You can
only do 8bit PIO reads. This only affected SCB paging.
With this checkin, SCB paging works quite a bit better, but I still have
some problems with it that may be caused by a firmware problem in my
PD1800s. It seems that using a tag number higher than the maximum number
of tags allowed by the device, confuses it. For example, if I queue
two commands, tagged 3 and 36, it never reconnects for tag 36.
(Rev E or greater), aic7850, aic7860, aic7870, and aic7880 controllers.
SCB paging is enabled with the option "AHC_SCBPAGING_ENABLE". Full
comments on the algorithm are at the top of i386/scsi/aic7xxx.c.
options "AHC_TAGENABLE" and "AHC_QUEUE_FULL" have been removed. The
default is 4 tags without SCB paging, 8 with.
Clear the SCSIRSTI bit after throwing a bus reset. Some cards seem to get
confused otherwise.
Handle SCSIRSTI interrupts before checking to see if there is a valid
SCB in use since this can happen. (Clears PR# i386/1123)
Clean up the way we determine the number of SCBs on the card
(courtesy of Dan Eischen).
Guard against attempts to negotiate wide to a narrow controller.
Fix some comments.
Update my copyrights.
QINCNT. The 7850 puts random garbage in the high bits and all my attempts
to determine the cause of this failed. This approach does seem to work
around the problem.
Go back to relying on the SCSIPERR interrupt instead of having the sequencer
interrupt at the beginning of ITloop after a parity error occurred.
Determine the number of SCBs on a card automatically and base the qcntmask
on the number of SCBs.
Add entries for 11.4MHz, 8.8MHz, 8.0MHz, and 7.2MHz to ULTRA portion of
the syncrate table. They seem to work fine on the 2940UW I have here and
will allow more non-ultra devices (like my tape drive) to run sync while
the adapter is in ULTRA mode.
Return XS_SELTIMEOUT instead of XS_TIMEOUT for selection timeouts. I was
getting sick of waiting for the SCSI code to retry each non-existent unit
multiple times during boot and XS_SELTIMEOUT bypasses all retries.
Use new SLIST queue macros. This was inspired by NetBSD using TAILQs in
their SCSI drivers. For optimum cache hits, the free scb list should
be LIFO which is what the old and new code does. NetBSD implemented a
FIFO queue for some reason.
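
As a rough sketch of the LIFO free list described above, using the SLIST
macros from <sys/queue.h>: the struct layout and the scb_get/scb_free names
are illustrative assumptions, not the driver's actual definitions.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Illustrative SCB; the real structure carries far more state. */
struct scb {
    int tag;
    SLIST_ENTRY(scb) links;
};

SLIST_HEAD(scb_list, scb);

/* Return an SCB to the free list. Inserting at the head makes the list LIFO. */
static void scb_free(struct scb_list *free_list, struct scb *scb)
{
    SLIST_INSERT_HEAD(free_list, scb, links);
}

/* Take the most recently freed SCB, or NULL if the list is empty. */
static struct scb *scb_get(struct scb_list *free_list)
{
    struct scb *scb = SLIST_FIRST(free_list);

    if (scb != NULL)
        SLIST_REMOVE_HEAD(free_list, links);
    return scb;
}
```

The most recently freed SCB is the one most likely to still be in cache,
which is the argument for LIFO over FIFO here.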
Spaces -> tabs.
Cleanse the SCSI subsystem of its internally defined types
u_int32, u_int16, u_int8, int32, int16, int8.
Use the system defined *_t types instead.
aic7xxx.c:
Fix the reset code.
Instead of queueing up all of the SCBs that timeout during timeout
processing, we take the first and have it champion the effort.
Any other scbs that timeout during timeout handling are given
another lifetime to complete in the hopes that once timeout
handing is finished, they will complete normally. If one of
these SCBs times out a second time, we panic and Justin tries
again.
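
The "champion" policy above might be modelled roughly as follows. This is
only a sketch: in_timeout echoes the ahc->in_timeout flag mentioned elsewhere
in this log, while the enum, the timed_out_again field, and on_timeout() are
hypothetical names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

enum timeout_action {
    TIMEOUT_RECOVER,    /* this SCB champions the recovery effort */
    TIMEOUT_RESCHEDULE, /* give the SCB another lifetime */
    TIMEOUT_PANIC       /* timed out twice while recovery was running */
};

struct scb { bool timed_out_again; };
struct ahc { bool in_timeout; };

/* Decide what to do when 'scb' times out. */
static enum timeout_action on_timeout(struct ahc *ahc, struct scb *scb)
{
    if (!ahc->in_timeout) {
        /* First timeout: this SCB drives the recovery. */
        ahc->in_timeout = true;
        return TIMEOUT_RECOVER;
    }
    if (scb->timed_out_again)
        return TIMEOUT_PANIC;
    /* Recovery already in progress: grant one more lifetime. */
    scb->timed_out_again = true;
    return TIMEOUT_RESCHEDULE;
}
```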
The other major change is to queue flag aborted SCBs during timeout
handling, and "ahc_done" them all at once as soon as we have the
controller back into a sane state. Calling ahc_done any earlier
will cause the SCSI subsystem to toss the command right back at
us and the attempt to queue the command will conflict with what
the timeout routine is trying to accomplish.
The aic7xxx driver will now respond to bus resets initiated by
other devices.
Simplify the initialization of adapters by pulling all card specific
initialization to the card specific modules.
Update comments and fix formatting.
Pass struct ahc_data*'s to functions instead of unit numbers.
Take advantage of the quad word alignment of SCB fields.
Adapt to new sequencer changes:
1) Waiting scb list no longer has a tail.
2) Fill the message buffer as appropriate during a parity error.
3) Count all of the SGs involved in a residual instead of just
the current one.
The reset/abort code still needs a lot of work.
Reviewed by: David Greenman <davidg@FreeBSD.org>
Start the revamp of the initialization process. New routines include
ahc_alloc, ahc_free, and ahc_reset. These help divide the work of starting
up a board more logically between probe and attach.
ahcintr now takes a (void *) and returns int. The pci code uses it directly.
Until the PCI code for shared edge triggered interrupts is removed, the
eisa code uses a stub (ahc_eisa_intr) that throws away the int returned
by ahcintr.
Use MHz instead of MB/s for printing out sync rates.
Print out "aic7880" instead of "aic7870" for the new aic7880 chips.
incompatible with the type of a PCI interrupt handler. A new entry
point `ahc_pci_intr()' is used for PCI. ISA and PCI interrupts are
penalized equally (:-) by calling a common handler `ahc_intr()'. This
should be reorganized. Some strings now name the wrong function...
the new seeprom format and negotiate up to 20MHz sync if set in SCSI-Select.
Reduce the complexity of the timeout code by running it at splhigh(). Fix
a bug that caused rescheduled timeouts at 0 clock ticks in the future causing
an infinite loop.
Obtained from: Timeout bug noticed by David Greenman and wcarchive.
1) Make the driver "quiet" by sticking most boot messages behind
bootverbose conditionals. This means that you won't see the
sync and wide negotiation, but you will find out if they fail.
2) Add support to the 93cx6 serial eeprom code to read at an arbitrary
offset. This is needed so that we can access the second half
of the eeprom on 3940 cards where the second channel's config
is stored.
3) Add flags argument to ahcprobe(). This is used by the pci probe code
to tell the generic driver that an adapter should be treated
as a channel B device as well as notify it of the presence of
external SCB SRAM. These are needed for some motherboard
implementations of the aic7870 and for the 3940 controllers.
4) Print "Channel A"/"Channel B" instead of "Single Channel" for the
two busses of the 3940. I received many reports of confusion
about how the 3940 was probed since most people believed that
only one ahc entry was needed. This will hopefully make it
clearer.
5) Walk the SCBs to determine just how many there are if external SCB
ram is detected.
6) Hard code that external SCB ram is present for the 3940 since it doesn't
use the documented reporting facility for reporting the SRAM. :(
255 commands per channel are supported on the 3940.
7) Read the seeprom starting at address 32 for the second channel of the
3940 so we get the right info for that channel.
8) Clean up printing of the "Disabling tagged queuing" message.
9) Queue timeouts if they occur while we are handling a timeout. The code
was totally unprotected in this scenario.
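
Items 2 and 7 above amount to reading the configuration from a
caller-supplied starting address. A simplified sketch follows, modelling the
SEEPROM as an in-memory array of 16-bit words rather than bit-banging the
93cx6. The total size, the function names, and the bounds check are
assumptions; only the channel B start address of 32 comes from the log.

```c
#include <stddef.h>
#include <stdint.h>

#define SEEPROM_WORDS 64u /* assumed total size, in 16-bit words */
#define CHAN_B_OFFSET 32u /* second channel's config starts here */

/* Copy 'count' words starting at word 'start'.
 * Returns 0 on success, -1 if the range falls outside the device. */
static int seeprom_read(const uint16_t *chip, unsigned start,
                        unsigned count, uint16_t *buf)
{
    if (start > SEEPROM_WORDS || count > SEEPROM_WORDS - start)
        return -1;
    for (unsigned i = 0; i < count; i++)
        buf[i] = chip[start + i];
    return 0;
}

/* Pick the starting word for a channel's configuration block. */
static unsigned channel_offset(int is_channel_b)
{
    return is_channel_b ? CHAN_B_OFFSET : 0u;
}
```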
Reviewed by: Timeout code reviewed by David Greenman <davidg>
optimizations I have been working on yet, but does bring in some bug fixes
and performance improvements that were easy to regression test:
Setup the data fifo threshold and bus off timing correctly for 27/284x cards.
Users of these adapters with fast peripherals (greater than 5MB/s) will notice
a big performance difference. (Sometimes as large as going from 3.7->8.3MB/s).
Fix handling of the active target flags. Some of the outbs were missing
the base offset in the abort code. The abort code still needs lots of work.
Support 3940 controllers, but only with 16 SCBs for now. Eventually I'll
add support for all 255, but I need to find a tester for the code first since
we have to enable the cards external SRAM to do this.
Add Dan Eischen's serial eeprom reading facilities. This allows the 2940
adapters to pull additional information left over from SCSI-Select right
out of the configuration seeprom.
If the BIOS is disabled on 274x controllers, reset all target parameters
to their defaults since you can't rely on what is stored in scratch ram.
Report motherboard controllers as such.
Stick the first SG address and count into the SCB data and count areas for
all transfers in preparation of a later sequencer optimization.
Keep track of which targets are allowed to have the disconnection
privilege since this will be handled by the kernel driver in the future.
If a target issues a message reject in response to a tagged message,
disable tagged queuing for that target. Some Seagates say they can do
tagged queuing, but lie, and it's a shame to have to disable tagged queuing
on all devices just because you have one that can't cope.
1) If a target initiated a sync negotiation with us and happened to choose a
value above 15, the old code inadvertently truncated it with an "& 0x0f".
If the peripheral picked something really bad like 0x32, you'd end up with
an offset of 2 which would hang the drive since it didn't expect to ever
get something so low. We now do a MIN(maxoffset, given_offset).
2) In the case of Wide cards, we were turning on sync transfers after a
successful wide negotiation. Now we leave the offset alone in the per
target scratch space (which implies asynchronous transfers since we initialize
it that way) until a synchronous negotiation occurs.
3) We were advertising a max offset of 15 instead of 8 for wide devices.
4) If the upper level SCSI code sent down a "SCSI_RESET", it would hang the
system because we would end up sending a null command to the sequencer. Now
we handle SCSI_RESET correctly by having the sequencer interrupt us when it
is about to fill the message buffer so that we can fill it in ourselves.
The sequencer will also "simulate" a command complete for these "message only"
SCBs so that the kernel driver can finish up properly. The cdplay utility
will send a "SCSI_RESET" to the cdplayer if you use the reset command.
5) The code that handles SCSIINTs was broken in that if more than one type
of error was true at once, we'd do outbs without the card being paused.
The else clause after the busfree case was also an accident waiting to
happen. I've now turned this into an if, else if, else type of thing, since
in most cases when we handle one type of error, it should be okay to ignore
the rest (i.e. if we have a SELTO, who cares if there was a parity error on
the transaction?), but the section should really be rewritten after 2.0.5.
This fix was the least obtrusive way to patch the problem.
6) Only tag either SDTR or WDTR negotiation on an SCB. The real problem is
that I don't account for the case when an SCB that is tagged to do a particular
type of negotiation completes or SELTOs (selection timeout) without the
negotiation taking place, so the accounting of sdtrpending and wdtrpending
gets screwed up. In the wide case, if we tag it to do both wdtr and sdtr,
it only performs wdtr (since wdtr must occur first and we spread out the
negotiation over two commands) so we always have sdtrpending set for that
target and we never do a real SDTR. I will properly fix the accounting
after 2.0.5 goes out the door, but this works (as confirmed by Dan) on
wide targets.
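
The arithmetic behind item 1 can be checked with a small sketch. The helper
names are made up for illustration; the masks and limits are the ones quoted
in items 1 and 3.

```c
#include <stdint.h>

/* Old behaviour: mask the offset down to four bits. */
static uint8_t offset_truncated(uint8_t given)
{
    return given & 0x0f;
}

/* New behaviour: MIN(maxoffset, given_offset). */
static uint8_t offset_clamped(uint8_t given, uint8_t maxoffset)
{
    return given < maxoffset ? given : maxoffset;
}
```

For a requested offset of 0x32, masking yields 2, while clamping yields the
controller's maximum (15 for narrow devices, 8 for wide).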
Other stuff that is also included:
1) Don't do a bzero when recycling SCBs. The only thing that must explicitly
be set to zero is the scb control byte which is done in ahc_get_scb. We also
need to set the SG_list_pointer and SG_list_count to 0 for commands that do
not transfer data.
2) Mask the interrupt type printout for the aic7870 case. The bit we were
using to determine interrupt type is only valid for the aic7770.
Submitted by: Justin Gibbs
It is the kernel driver's responsibility to do the list manipulation whenever
a selection timeout or a request sense occurs.
Print out the interrupt type that the device has been set to. It seems that
one of the Asus motherboards botches this and David thought a diagnostic would
be nice.
Fix a bug in my diagnostic code that David found.
Reviewed by: Wcarchive and David Greenman
higher level scsi code.
Spls should never be conditionalized, so don't do so here.
Restructure the get_scb routine so that we can't get into an infinite
loop if the ccbs are exhausted and we are called with SCSI_NOSLEEP set.
Other driver maintainers that based their scb allocation routines on Julian's
code should look at these changes and implement them for their driver.
The aic7xxx driver inspired these changes because early revs of the
aic7770 chips have so few SCBs that you can actually run out. If you
have a rev C or aic7770 (as is reported by the driver probe) and had more
than 2 drives, you could get into an infinite loop when using up all of
the SCBs. Since the driver will only allow two SCBs per device and I
only had two devices, I never saw this problem on my Rev C card.
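
A sketch of the allocation shape this describes: when every SCB is in use
and the caller cannot sleep, return NULL instead of looping. The pool
structure, field names, and the would_sleep out-parameter are assumptions
made so the example runs in userland. In the kernel, the sleeping path would
block until an SCB is freed and then retry.

```c
#include <stdbool.h>
#include <stddef.h>

struct scb { struct scb *next; };

struct scb_pool {
    struct scb *free_list; /* recycled SCBs */
    unsigned allocated;    /* SCBs handed out from 'storage' so far */
    unsigned max;          /* hardware limit on SCBs */
    struct scb *storage;   /* backing array with 'max' entries */
};

/* Get an SCB, or NULL if none is available right now.
 * '*would_sleep' reports whether a kernel caller would block and retry. */
static struct scb *get_scb(struct scb_pool *pool, bool nosleep,
                           bool *would_sleep)
{
    *would_sleep = false;

    if (pool->free_list != NULL) {
        struct scb *scb = pool->free_list;
        pool->free_list = scb->next;
        return scb;
    }
    if (pool->allocated < pool->max)
        return &pool->storage[pool->allocated++];

    /* Exhausted: fail immediately for no-sleep callers rather than spin. */
    if (!nosleep)
        *would_sleep = true;
    return NULL;
}
```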
Bzero only 19 bytes of the scb instead of 2k (ack!). This was a hold
over from when a struct SCB only contained the information downloaded
to the board, but we now store kernel driver data in there as well. This
greatly lowers the overhead for small transactions (I get ~1MB/sec for
dds with a 512 byte block size).
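
The idea can be sketched by keeping the board-visible portion in its own
sub-structure and clearing only that. The field names and the size of the
kernel-only area are illustrative; the 19-byte figure is the one quoted
above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portion downloaded to the board: 19 bytes in this illustration. */
struct hw_scb {
    uint8_t control;
    uint8_t target;
    uint8_t cmdlen;
    uint8_t rest[16];
};

struct scb {
    struct hw_scb hw;
    /* Kernel-only data that does not need clearing on every reuse. */
    uint8_t kernel_data[2048];
};

/* Clear only the hardware portion when recycling an SCB. */
static void scb_recycle(struct scb *scb)
{
    memset(&scb->hw, 0, sizeof(scb->hw));
}
```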
Submitted by: John Dyson with the aic7xxx specific optimization by me
- catch the interrupt type (EDGE/LEVEL) before chip reset instead
of guessing the right type.
- Add pause variable to the ahc struct to better handle the different
interrupt types and pausing the sequencer.
- CLRINTSTAT -> CLRSCSIINT: This is a documented bit in the CLRINT
register in newer Adaptec documentation, so use their name for it.
- Report valid residual byte counts.
- Don't mess with the target scratch areas > id 8 on single, narrow,
channel devices. The BIOS does a checksum of this area and can
flip out if we zero it out.
- Initialize the sequencer FLAGS scratch ram variable in the single
channel devices to 0. This was the cause of the annoying warning
where we would get a cmdcmplt the first time we did any type of
transfer negotiation with no valid scb. It also fixes the problem
that looked like the INTSTAT register wasn't clearing fast enough.
This only showed up on 294x cards, not motherboard aic7870s.
- Add the AHC_AIC7870 type and use it as the superset of aic7870
based controllers.
- clear the sync offset section of the targ scratch area so that
we default to asynchronous transfers. This was only a problem
for wide controllers because there was a scenario where the
offset wouldn't get updated before a data(out/in) phase would
occur. This required some change in the sequencer code since we
were depending on this field to hold the rate to negotiate.
- allow sync and wide negotiated commands to be tagged (the sequencer
now handles this properly).
commands per target. I could have followed the route of the ncr driver
and gone to great lengths to get the SCSI subsystem to support more, but
I think I'll use the time saved to help Julian and Peter make tagged
queuing a better handled generic feature. This also includes some comment
and enum clean up and a possible fix for the hanging PCI controllers.
message instead of relying on the fact that we are scheduled to send them.
The old method worked 99.9% of the time, but someone reported some peripherals
that did MSG_REJECT at odd times (sometimes before we could send an SDTR
or WDTR) that we would construe as the response to an SDTR or WDTR message.
This also removes a possible race condition where after a bus reset (the
result of a command time out not during initial probe time), we might queue
two commands both requesting SDTR, WDTR or both.
WDTR, and message reject handlers so they don't need to exist in the
sequencer. All three of these cases are not on the critical path, so it
makes little sense to use up precious sequencer ram for them.