numerous error recovery buglets.
Many thanks to Tor Egge for his assistance in diagnosing problems with
the error recovery code.
aic7xxx.c:
Report missed bus free events using their own sequencer interrupt
code to avoid confusion with other "bad phase" interrupts.
Remove a delay used in debugging. This delay could only be hit
in certain, very extreme, error recovery scenarios.
Handle transceiver state changes correctly. You can now
plug an SE device into a hot-plug LVD bus without hanging
the controller.
When stepping through a critical section, panic if we step
more than a reasonable number of times.
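The bounded stepping described above can be sketched as follows. This is only an illustration under stated assumptions: `struct crit_sect`, `MAX_CS_STEPS`, and `step_out_of_critical_section()` are hypothetical names, the sequencer is simulated by incrementing an address, and the sketch returns -1 at the point where the driver would panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit; the driver's actual bound is not stated in the log. */
#define MAX_CS_STEPS 8

/* Hypothetical critical section: addresses in [begin, end) are protected. */
struct crit_sect {
	uint16_t begin;
	uint16_t end;
};

static int
in_critical_section(uint16_t addr, const struct crit_sect *cs, size_t ncs)
{
	size_t i;

	for (i = 0; i < ncs; i++) {
		if (addr >= cs[i].begin && addr < cs[i].end)
			return (1);
	}
	return (0);
}

/*
 * Single-step a simulated program counter until it leaves every critical
 * section.  Returns the number of steps taken, or -1 if the step limit is
 * hit (the point at which the driver would panic).
 */
static int
step_out_of_critical_section(uint16_t *addr, const struct crit_sect *cs,
    size_t ncs)
{
	int steps = 0;

	assert(addr != NULL);
	while (in_critical_section(*addr, cs, ncs)) {
		if (steps >= MAX_CS_STEPS)
			return (-1);
		(*addr)++;	/* stand-in for one single-stepped instruction */
		steps++;
	}
	return (steps);
}
```

The limit turns a sequencer that never leaves its critical section into an immediate, visible failure instead of a silent hang.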
After a bus reset, disable bus reset interrupts until either
our first attempt to (re)select another device, or another device
attempts to select us. This removes the need to busy wait in
kernel for the scsi reset line to fall yet still ensures we
see any reset events that impact the state of either our initiator
or target roles. Before this change, we had the potential of
servicing a "storm" of reset interrupts if the reset line was
held for a significant amount of time.
Indicate the current sequencer address whenever we dump the
card's state.
aic7xxx.reg:
Transceiver state change register definitions.
Add the missed busfree sequencer interrupt code.
Re-enable the scsi reset interrupt if it has been
disabled before every attempt to (re)select a device
and when we have been selected as a target.
When being (re)selected, check to see if the selection
disappeared just after we enabled our bus free interrupt.
If the bus has gone free again, go back to the idle loop
and wait for another selection.
Note two locations where we should change our behavior
if ATN is still raised. If ATN is raised during the
presentation of a command complete or disconnect message,
we should ignore the message and expect the target to put
us in msgout phase. We don't currently do this as it
requires some code re-arrangement so that critical sections
can be properly placed around our handling of these two
events. Otherwise, we cannot guarantee that the check of
ATN is atomic relative to our acking of the message in
byte (the kernel could assert ATN).
Only set the IDENTIFY_SEEN flag after we have settled
on the SCB for this transaction. The kernel looks at
this flag before assuming that SCB_TAG is valid. This
avoids confusion during certain types of error recovery.
Add a critical section around findSCB. We cannot allow
the kernel to remove an entry from the disconnected
list while we are traversing it. Ditto for get_free_or_disc_scb.
aic7xxx_freebsd.c:
Only assume that SCB_TAG is accurate if IDENTIFY_SEEN is
set in SEQ_FLAGS.
Fix a typo that caused us to execute some code for the
non-SCB paging case when paging SCBs. This only occurred
during error recovery.
When restarting the sequencer, ensure that the CCSCBCNT register
is 0. A non-zero count will prevent the setting of the CCSCBDIR
bit in any future dma operations. The only time CCSCBCNT would
be non-zero is if we happened to halt the dma during a reset,
but even that should never happen. Better safe than sorry.
When a command completes before the target responds to an
ATN for a recovery command, we now notify the kernel so that
any recovery operation requeued in the qinfifo can be removed
safely. In the past, we did this in ahc_done(), but ahc_done()
may be called without the card paused. This also avoids a
recursive call to ahc_search_qinfifo() which could have occurred if
ahc_search_qinfifo() happened to be the routine to complete
a recovery action.
Fix 8bit math used for adjusting the qinfifo. The index must
be wrapped properly within the 256 entry array. We rely on the
fact that qinfifonext is a uint8_t in most cases to handle
this wrap, but we missed a few spots where the resultant
calculation was promoted to an int.
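The wrap problem above can be shown with a small standalone sketch. The names used here (`QINFIFO_SIZE`, `qinfifo_prev_promoted()`, `qinfifo_prev_wrapped()`, `qinfifo_store()`) are hypothetical and not taken from the driver; the point is only that `uint8_t` arithmetic is promoted to `int` unless the result is cast back.

```c
#include <assert.h>
#include <stdint.h>

#define QINFIFO_SIZE 256	/* 256 entry array, as described above */

/* Buggy pattern: uint8_t operands are promoted to int, so 0 - 1 is -1. */
static int
qinfifo_prev_promoted(uint8_t next)
{
	return (next - 1);
}

/* Fixed pattern: casting back to uint8_t wraps the result modulo 256. */
static uint8_t
qinfifo_prev_wrapped(uint8_t next)
{
	return ((uint8_t)(next - 1));
}

/* Store a tag, insisting that the index is inside the array. */
static void
qinfifo_store(uint8_t *qinfifo, int index, uint8_t tag)
{
	assert(index >= 0 && index < QINFIFO_SIZE);
	qinfifo[index] = tag;
}
```

Passing the promoted result to `qinfifo_store()` when `next` is 0 would trip the assertion, while the wrapped result lands on entry 255 as intended.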
Change the way that we deal with aborting the first or second
entry from the qinfifo. We now swap the first entry in the
qinfifo with the "next queued scb" to force the sequencer
to see an abort collision if we ever touch the qinfifo while
the sequencer is mid SCB dma.
aic7xxx.reg:
Add new MKMSG_FAILED sequencer interrupt. This displaced
the BOGUS_TAG interrupt used in some previous sequencer code
debugging.
aic7xxx.seq:
Increment our position in the qinfifo only once the dma
is complete and we have verified that the queue has not
been changed during our DMA. This simplifies code in the
kernel.
Protect against "instruction creep" when issuing a pausing
sequencer interrupt. On at least the 7890/91/96/97, the
sequencer will coast after issuing the interrupt for up
to two instructions. In the past we dealt with this by
using carefully placed nops. Now we call a routine to
issue the interrupt followed by a nop and a ret.
Tell the kernel should an SCB complete with the MK_MESSAGE
flag still set. This means the target ignored our ATN request.
Clear the channel twice as we exit the data phase. On the
aic7890/91, the S/G preload logic may require the second
clearing to get the last S/G out of the FIFO.
aic7xxx_freebsd.c:
Don't bother searching the qinfifo for a doubly queued
recovery scb in ahc_done. This case is handled by the
core driver now.
Free the path used to issue async callbacks after the callback
is complete.
aic7xxx_inline.h:
Split the SCB queue routine into a routine that swaps
the SCB with the "next queued SCB" and a routine that
calls the swapping routine and notifies the card of
the new SCB. The swapping routine is now also used by
ahc_search_qinfifo.
Filter incoming transfer negotiation requests to ensure they
never exceed the settings specified by the user.
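A minimal sketch of this kind of filtering, assuming a hypothetical `struct xfer_settings` and `filter_request()` that are not the driver's own API. It also assumes the usual SCSI convention that a larger period factor means a slower transfer, so the user's period acts as a lower bound while offset and width act as upper bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical settings record; field names are illustrative only. */
struct xfer_settings {
	uint8_t period;	/* sync period factor; larger means slower */
	uint8_t offset;	/* REQ/ACK offset; 0 means asynchronous */
	uint8_t width;	/* 0 = 8 bit, 1 = 16 bit */
};

/* Clamp an incoming request so it never exceeds the user's limits. */
static struct xfer_settings
filter_request(struct xfer_settings req, struct xfer_settings user)
{
	struct xfer_settings out = req;

	if (out.period < user.period)
		out.period = user.period;	/* never faster than allowed */
	if (out.offset > user.offset)
		out.offset = user.offset;
	if (out.width > user.width)
		out.width = user.width;

	/* Post-condition: the result is within the user's limits. */
	assert(out.period >= user.period);
	assert(out.offset <= user.offset && out.width <= user.width);
	return (out);
}
```

A request that is already slower or narrower than the user's limits passes through unchanged.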
When restarting the sequencer, attempt to deal with a bug in the aic7895.
If a third party reset occurs at just the right time, the
stack register can lock up. When restarting the sequencer
after handling the SCSI reset, poke SEQADDR1 before resetting
the sequencer's program counter.
When something strange happens, dump the card's transaction
state via ahc_dump_card_state(). This should aid in debugging.
Handle request sense transactions via the QINFIFO instead of
attaching them to the waiting queue directly. The waiting
queue consumes card SCB resources and, in the pathological case
of every target on the bus beating our selection attempts and
issuing a check condition, could have caused us to run out
of SCBs. I have never seen this happen, and only early
cards with 3 or 4 SCBs had any real chance of ever getting
into this state.
Add additional sequencer interrupt codes to support firmware
diagnostics. The diagnostic code is enabled with the
AHC_DEBUG_SEQUENCER kernel option.
Make it possible to switch into and out of target mode on
the fly. The card comes up by default as an initiator but
will switch into target mode as soon as an enable lun operation
is performed. As always, target mode behavior is gated
by the AHC_TMODE_ENABLE kernel option so most users will
not be affected by this change.
In ahc_update_target_msg_request(), also issue a new
request if the ppr_options have changed.
Never issue a PPR as a target. It is forbidden by the spec.
Correct a bug in ahc_parse_msg() that prevented us from
responding to PPR messages as a target.
Mark SCBs that are on the untagged queue with a flag instead
of checking several fields in the SCB to see if the SCB should
be on the queue. This makes it easier for things like automatic
request sense requests to be queued without touching the
untagged queues even though they are untagged requests.
When dealing with ignore wide residue messages that occur
in the middle of a transfer, reset HADDR, not SHADDR for
non-ultra2 chips. Although SHADDR is where the firmware
fetches the ending transfer address for a save data pointers
request, it is readonly. Setting HADDR has the side effect
of also updating SHADDR.
Cleanup the output of ahc_dump_card_state() by nulling out the
free scb list in the non-paging case. The free list is only
used if we must page SCBs.
Correct the transmission of cdbs > 12 bytes in length. When
swapping HSCBs prior to notifying the sequencer of the new
transaction, the bus address pointer for the cdb must also
be recalculated to reflect its new location. We now defer
the calculation of the cdb address until just before queuing
it to the card.
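Why the address must be recalculated after a swap can be seen in a short sketch. The layout of `struct hw_scb` and the helper `hscb_cdb_busaddr()` are assumptions made for illustration; the real hardware SCB has a different layout. The only point is that the cdb's bus address depends on which slot of the HSCB array the transaction ends up in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware SCB layout; only the cdb32 offset matters here. */
struct hw_scb {
	uint8_t other_fields[32];
	uint8_t cdb32[32];
};

/*
 * Bus address of the cdb stored in slot "index" of an HSCB array that
 * starts at bus address "array_busaddr".
 */
static uint32_t
hscb_cdb_busaddr(uint32_t array_busaddr, unsigned int index,
    unsigned int nscbs)
{
	assert(index < nscbs);
	return ((uint32_t)(array_busaddr + index * sizeof(struct hw_scb) +
	    offsetof(struct hw_scb, cdb32)));
}
```

If the pointer were computed before the swap, it would still refer to the old slot; computing it just before queuing always uses the final slot.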
When pulling transfer negotiation settings out of scratch
ram, convert 5MHz/clock doubled settings to 10MHz.
Add a new function ahc_qinfifo_requeue_tail() for use by
error recovery actions and auto-request sense operations.
These operations always occur when the sequencer is paused,
so we can avoid the extra expense incurred in the normal
SCB queue method.
Use the BMOV instruction for all single byte moves on
controllers that support it. The bmov instruction is
twice as fast as an AND with an immediate of 0xFF as
is used on older controllers.
Correct a few bugs in ahc_dump_card_state(). If we have
hardware assisted queue registers, use them to get the
sequencer's idea of the head of the queue. When enumerating
the untagged queue, it helps to use the correct index for
the queue.
aic7xxx.h:
Indicate via a feature flag, which controllers can take
on both the target and the initiator role at the same time.
Add the AHC_SEQUENCER_DEBUG flag.
Add the SCB_CDB32_PTR flag used for dealing with cdbs
with lengths between 13 and 32 bytes.
Add new prototypes.
aic7xxx.reg:
Allow the SCSIBUSL register to be written to. This is
required to fix a selection timeout problem on the 7892/99.
Cleanup the sequencer interrupt codes so that all debugging
codes are grouped at the end of the list.
Correct the definition of the ULTRA_ENB and DISC_DSB locations
in scratch ram. This prevented the driver from properly honoring
these settings when no serial eeprom was available.
Remove an unused sequencer flag.
aic7xxx.seq:
Just before a potential select-out, clear the SCSIBUSL
register. Occasionally, during a selection timeout, the
contents of the register may be presented on the bus,
causing much confusion.
Add sequencer diagnostic code to detect software and/or
hardware bugs. The code attempts to verify most list
operations so any corruption is caught before it occurs.
We also track information about why a particular reconnection
request was rejected.
Don't clobber the digital REQ/ACK filter setting in SXFRCTL0
when clearing the channel.
Fix a target mode bug that would cause us to return busy
status instead of queue full in response to a tagged transaction.
Cleanup the overrun case. It turns out that by simply
putting the chip in bitbucket mode, it will ack any
bytes until the phase changes. This drastically simplifies
things.
Prior to leaving the data phase, make sure that the S/G
preload queue is empty.
Remove code to place a request sense request on the waiting
queue. This is all handled by the kernel now.
Change the semantics of "findSCB". In the past, findSCB
ensured that a freshly paged in SCB appeared on the disconnected
list. The problem with this is that there is no guarantee that
the paged in SCB is for a disconnected transaction. We now
defer any list manipulation to the caller who usually discards
the SCB via the free list.
Inline some busy target table operations.
Add a critical section to protect adding an SCB to
the disconnected list.
aic7xxx_freebsd.c:
Handle changes in the transfer negotiation setting API
to filter incoming requests. No filtering is necessary
for "goal" requests from the XPT.
Set the SCB_CDB32_PTR flag when queuing a transaction with
a large cdb.
In ahc_timeout, only take action if the active SCB is
the timedout SCB. This deals with the case of two
transactions to the same device with different timeout
values.
Use ahc_qinfifo_requeue_tail() instead of a home grown
version.
aic7xxx_inline.h:
Honor SCB_CDB32_PTR when queuing a new request.
aic7xxx_pci.c:
Use the maximum data fifo threshold for all chips.
past we stored this data in the CCB and attained the CCB via a pointer
in the SCB. In ahc_timeout(), however, the timedout SCB may have already
been completed (inherent race), meaning that the CCB could have been recycled,
and the ahc pointer reset.
Clean up the logic in ahc_search_qinfifo that deals with the busy device
table. For some reason it assumed that the only valid time to search
to see if additional lun entries should be checked was if lun 0 matched.
Now we properly iterate through the necessary luns. The busy device
table is used to detect invalid reselections, so a device would have had
to perform an unexpected reselection for this to cause problems. Further,
all luns are collapsed to a single entry unless we have external ram
with large SCBs (3940AU models) so the chance of this happening was
rather remote.
Clean up the logic for dealing with the untagged queues. We now set a
flag in the SCB that indicates that it is on the untagged queue instead
of inferring this from the type and setup of the CCB passed into us by
CAM.
In ahc_timeout(), don't print the path of the SCB until the controller
is paused and we are sure that it has not completed yet. This, in
conjunction with referencing the ahc pointer in the SCB rather than
the CCB in the SCB avoids panics in the case of a timedout scb completing
just before the timeout handler runs. This turns out to be guaranteed
if interrupt delivery is failing, as we run our interrupt handler to
flush any "just missed events" when a timeout occurs. Mention the
likelihood of broken interrupts if a timedout SCB is completed by
our call to ahc_intr().
a resource shortage occurs, freeze our queue and then set the resource
shortage flag while the controller data structure is locked. The old
code did these in the wrong order potentially allowing our interrupt
handler to release the queue and clear the flag before the freeze
ever occurred.
aic7xxx.c:
In target mode, reset the TQINPOS on every restart of the sequencer.
In the past we did this only during a bus reset, but there are other
reasons the sequencer might be reset.
In ahc_clear_critical_section(), disable pausing chip interrupts while
we step the sequencer out of a critical section. This avoids the
possibility of getting a pausing interrupt (unexpected bus free,
bus reset, etc.) that would prevent the sequencer from stepping.
Send the correct async notifications in the case of a BDR or bus reset.
In ahc_loadseq(), correct the calculation of our critical sections.
In some cases, the sections would be larger than needed.
aic7xxx.h:
Remove an unused SCB flag.
aic7xxx.seq:
MK_MESSAGE is cleared by the kernel, so there is no need to waste
a sequencer instruction clearing it.
aic7xxx_freebsd.c:
Go through the host message loop instead of issuing a single
byte message directly in the ahc_timeout() case where we
are currently on the bus to the device. The effect is the same,
but this way we get a nice printf saying that an expected BDR
was delivered instead of an unexpected bus free.
If we are requeuing an SCB for an error recovery action, be sure
to set the DISCONNECTED flag in the in-core version of the SCB.
This ensures that, in the SCB-paging case, the sequencer will
still recognize the reselection as valid even if the version
of the SCB with this flag set was never previously paged out
to system memory. In the non-paging case, set the MK_MESSAGE
flag in SCB_CONTROL directly.
aic7xxx_pci.c:
Enable the Memory Write and Invalidate bug workaround for
all aic7880 chips with revs < 1. This bug is rarely triggered
in FreeBSD as most transfers end on cache-aligned boundaries,
but a recheck of my references indicates that these chips
are affected.
non-LVD controllers. We only need to take special action on the qinfifo
if we have detected the case of an SCB that has been removed from the
qinfifo but has not been fully DMAed to the controller. A missing
conditional caused this code to be executed every time an SCB was
aborted from the queue.
Don't attempt to print the path of an SCB that has been freed.
Clean up the traversal of the pending scb list in
ahc_update_pending_syncrates(). This has no functional change.
Correct ahc_timeout()'s requeuing of a timedout SCB to effect a
recovery action. We now use ahc_qinfifo_requeue() and a
new function ahc_qinfifo_count() instead of performing the
requeue inline. The old code did not conform to the new qinfifo
method.
Clear the timedout SCB from the disconnected list. This ensures
that the SCB_NEXT field is free to be used for queuing us to
the qinfifo.
In ahc_search_qinfifo, the SEARCH_REMOVE case must also handle
an SCB that has been removed from the QINFIFO but not yet been
fully dmaed to the card.
Correct locking for ahc_get_scb() calls.
Set SCB syncrate settings in ahc_execute_scb() to avoid a race
condition that could allow a newly queued SCB to be missed
by ahc_update_pending_syncrates().
When notifying the system of transfer negotiation updates, only
set the valid bits for tagged queuing and disconnection if the
path is fully qualified. Sync/Wide settings apply to all luns
of a target, but tagged queuing and disconnection may change
on a per-lun basis.
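The rule above can be sketched as follows. All names here (`LUN_WILDCARD`, the `VALID_*` bits, `negotiation_valid_bits()`) are hypothetical stand-ins rather than the CAM definitions, and the sketch assumes that "fully qualified" means the lun is not a wildcard.

```c
#include <assert.h>

/* Hypothetical stand-ins for the wildcard lun and the "valid" bits. */
#define LUN_WILDCARD		(-1)
#define VALID_SYNC_RATE		0x01
#define VALID_SYNC_OFFSET	0x02
#define VALID_BUS_WIDTH		0x04
#define VALID_DISC		0x08
#define VALID_TQ		0x10

/*
 * Sync/Wide settings apply to every lun of a target, so they are always
 * reported.  Disconnection and tagged queuing are per-lun, so they are
 * only reported when a specific lun is named.
 */
static unsigned int
negotiation_valid_bits(int lun)
{
	unsigned int valid;

	assert(lun >= LUN_WILDCARD);
	valid = VALID_SYNC_RATE | VALID_SYNC_OFFSET | VALID_BUS_WIDTH;
	if (lun != LUN_WILDCARD)
		valid |= VALID_DISC | VALID_TQ;
	return (valid);
}
```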
Add missing ahc_unlock() calls in ahc_timeout() for the target
mode case.
of two (one to access the circular input fifo, the other to get the SCB).
This costs us a command slot so the driver can now only queue 254
simultaneous commands.
Have the kernel driver honor critical sections in sequencer code.
When prefetching S/G segments only pull a cacheline's worth but
never less than two elements. This reduces the impact of the
prefetch on the main data transfer when compared to the 128
byte fetches the driver used to do.
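The prefetch sizing rule can be sketched like this. The 8 byte `struct sg_elem` layout and the names `SG_PREFETCH_MIN` and `sg_prefetch_count()` are assumptions for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical S/G element: 32-bit address plus 32-bit length. */
struct sg_elem {
	uint32_t addr;
	uint32_t len;
};

#define SG_PREFETCH_MIN 2	/* never fetch fewer than two elements */

/* Number of S/G elements to prefetch for a given cacheline size. */
static unsigned int
sg_prefetch_count(unsigned int cacheline_bytes)
{
	unsigned int count;

	count = (unsigned int)(cacheline_bytes / sizeof(struct sg_elem));
	if (count < SG_PREFETCH_MIN)
		count = SG_PREFETCH_MIN;
	assert(count >= SG_PREFETCH_MIN);
	return (count);
}
```

Under these assumptions a 32 byte cacheline yields four elements per prefetch, compared with sixteen for the old fixed 128 byte fetch.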
Add "bootverbose" logging for transfer negotiations.
Correct a bug in ahc_set_syncrate() that would prevent an update
of the sync parameters if only the ppr_options had changed.
Correct locking for calls to ahc_free_scb(). ahc_free_scb() is no
longer protected internally to simplify ports to other platforms.
Make sure we unfreeze our SIMQ if a resource shortage has occurred
and an SCB has been freed.
ahc_pci.c:
Turn on cacheline streaming for all controllers that support it.
Clarify diagnostic messages about PCI interrupts.
ahc_pci.c:
Bring back the AHC_ALLOW_MEMIO option at least until the
memory mapped I/O problem on the SuperMicro 370DR3 is
better understood.
aic7xxx.c:
If we see a spurious SCSI interrupt, attempt to clear it and
continue by unpausing the sequencer.
Change the interface to ahc_send_async(). Some async messages
need to be broadcast to all the luns of a target or all the
targets of a bus. This is easier to achieve by passing explicit
channel, target, and lun parameters instead of attempting to
construct a device info struct to match.
Filter the sync parameters for the PPR message in exactly the
same way we do for an old fashioned SDTR message.
Correct some typos and correct a panic message.
Handle rejected PPR messages.
In ahc_handle_msg_reject(), let ahc_build_transfer_msg() build
any additional transfer messages instead of doing this inline.
aic7xxx.h:
Increase the size of both msgout_buf and msgin_buf to
better accommodate PPR messages.
aic7xxx_freebsd.c:
Update for change in ahc_send_async() parameters.
aic7xxx_freebsd.h:
Update for change in ahc_send_async() parameters.
Honor AHC_ALLOW_MEMIO.
aic7xxx_pci.c:
Check the error register before going into full blown PCI
interrupt handling. This avoids a few costly PCI configuration
space reads when we run our PCI interrupt handler because another
device sharing our interrupt line is more active than we are.
Also unpause the sequencer after processing a PCI interrupt.
ahc->unit is deprecated and will be going away as soon as the Linux
driver catches up. In the FreeBSD case, it is always initialized to 0
and this caused some strangeness in registering multiple ahc controllers
with CAM.
Noticed by: Tor.Egge@fast.no
Separate our platform independent hooks from core driver functionality
shared between platforms (FreeBSD and Linux at this time).
Add sequencer workarounds for several chip->chipset interactions.
Correct external SCB corruption problem on aic7895 based cards (3940AUW).
Lots of cleanups resulting from the port to another OS.