freebsd-nq/sys/dev
Matt Jacob 28445eef28 Fix usage of DELAY (SYS_DELAY is the platform independent local
define).  Fix stupidity wrt checking whether we've gone to
LOOP_PDB_RCVD loopstate- it's okay to be greater than this state.
D'oh! Protect calls to isp_pdb_sync and isp_fclink_state with IS_FC
macros.

Completely redo mailbox command routine (in preparation to make this
possibly wait rather than poll for completion).

Make a major attempt to solve the 'lost interrupt' problem

1. Problem

The Qlogic cards would appear to 'lose' interrupts, i.e., a legitimate
regular SCSI command placed on the request queue would never complete
and the watchdog routine in the driver would eventually wake up and
catch it. This would typically only happen on Alphas, although a
couple folks with 700MHz Intel platforms have also seen this.

For a long time I thought it was a foulup with f/w negotiations of
SYNC and/or WIDE as it always seemed to happen right after the
platform it was running on had done a SET TARGET PARAMETERS mailbox
command to (re)enable sync && wide (after initially forcing
ASYNC/NARROW at startup). However, occasionally, the same thing
would also occur for the Fibre Channel cards as well (which, ahem,
have no SET TARGET PARAMETERS for transfer mode).

After finally putting in a better set of watchdog routines for the
platforms for this driver, it seemed to be the case that the command
in question (usually a READ CAPACITY) just had up and died- the
watchdog routine would catch it after ~10 seconds. For some platforms
(NetBSD/OpenBSD)- an ABORT COMMAND mailbox command was sent (which
would always fail- indicating that the f/w denied knowledge of this
command, i.e., the f/w thought it was a done command). In any case,
retrying the command worked. But this whole problem needed to be
really fixed.

2. A False Step That Went in The Right Direction

The mailbox code was completely rewritten to no longer try and grab
the mailbox semaphore register and to try and 'by hand' complete
async fast posting completions. It was also rewritten to now have
separate in && out bitpatterns for registers to load to start and
retrieve to complete. This means that isp_intr now handles mailbox
completions.
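
A minimal sketch of the separate in/out bitpattern idea, not the driver's
actual code. The struct, the register count and the fake register file are
all assumptions made for illustration: bit n of in_mask means "load mailbox
register n before starting the command", and bit n of out_mask means "read
mailbox register n back when the command completes".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NMBOX 8 /* hypothetical number of mailbox registers */

/* Hypothetical command descriptor: one bit per mailbox register. */
struct mbox_cmd {
    uint8_t  in_mask;      /* registers to load to start the command */
    uint8_t  out_mask;     /* registers to retrieve on completion */
    uint16_t param[NMBOX]; /* values going in, results coming out */
};

/* Stand-in for the card's mailbox register file (assumption). */
static uint16_t fake_mbox[NMBOX];

/* Load only the registers named by in_mask. */
static void mbox_load(const struct mbox_cmd *c)
{
    assert(c != NULL);
    for (int i = 0; i < NMBOX; i++)
        if (c->in_mask & (1u << i))
            fake_mbox[i] = c->param[i];
}

/* Copy back only the registers named by out_mask.
 * In the scheme described above this would run from the interrupt path. */
static void mbox_retrieve(struct mbox_cmd *c)
{
    assert(c != NULL);
    for (int i = 0; i < NMBOX; i++)
        if (c->out_mask & (1u << i))
            c->param[i] = fake_mbox[i];
}
```

Registers outside a mask are left untouched in both directions, which is
what lets each command state exactly what it needs.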

This substantially simplifies the mailbox handling code, and carries
things 90% toward getting this to be a non-polled routine for this
driver.

This did not solve the problem, though.

3. Register Debouncing

I saw some comments in some errata sheets and some notes in a Qlogic
produced Linux driver (for the Qlogic 2100) that seemed to indicate
that debouncing of reads of the mailbox registers might be needed,
so I added this.  This did not affect the problem. In fact, it made
the problem worse for non-2100 cards.
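
For reference, "debouncing" here means re-reading a register until two
consecutive reads agree. A minimal sketch under that assumption follows; the
function names and the scripted reader are hypothetical stand-ins, not the
driver's register accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register accessor: returns the current register value. */
typedef uint16_t (*reg_read_fn)(void *ctx);

/* Re-read until two consecutive reads agree, using at most max_tries reads.
 * Returns 0 and stores the stable value in *out, or -1 if it never settled. */
int debounced_read(reg_read_fn rd, void *ctx, int max_tries, uint16_t *out)
{
    assert(rd != NULL && out != NULL);
    uint16_t prev = rd(ctx);
    for (int i = 1; i < max_tries; i++) {
        uint16_t cur = rd(ctx);
        if (cur == prev) {
            *out = cur;
            return 0;
        }
        prev = cur;
    }
    return -1;
}

/* Test stand-in: replays a fixed sequence of values, repeating the last. */
struct scripted_reg {
    const uint16_t *vals;
    int n;
    int pos;
};

uint16_t scripted_read(void *ctx)
{
    struct scripted_reg *r = ctx;
    uint16_t v = r->vals[r->pos];
    if (r->pos < r->n - 1)
        r->pos++;
    return v;
}
```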

4. Interrupt masking/unmasking

The driver *used* to do a substantial amount of masking/unmasking
of the interrupt control register. This was done to make sure that
the core common code could just assume it would never get pre-empted.

This apparently substantially contributed to the lost interrupt
problem.  The rewrite of the ICR (Interrupt Control Register),
which is a separate register from the ISR (Interrupt Status Register)
should not have caused any change to interrupt assertions pending.
The manual does not state that it will, and the register layout
seems to imply that the ICR is just an active route gate. We only
enable PCI Interrupts and RISC Interrupts- this should mean that
when the f/w asserts a RISC interrupt (and the ICR allows RISC
Interrupts) and we have PCI Interrupts enabled, we should get a
PCI interrupt. Apparently this is a latch- not a signal route.

Removing this got rid of *most* but not all, lost interrupts.

5. Watchdog Smartening

I made sure that the watchdog routine would catch cases where the
Qlogic's ISR showed an interrupt assertion. The watchdog routine
now calls the interrupt service routine if it sees this. Some
additional internal state flags were added so that the watchdog
routine could then know whether the command it was in the middle
of burying (because we had timed it out) was in fact completed by
the interrupt service routine.
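
A minimal sketch of that watchdog logic, not the driver's actual code. The
flag names, the structs and the fake interrupt routine are all hypothetical;
the only point is the ordering: mark the command, run the interrupt service
routine if the ISR shows an assertion, then check whether the command was
completed underneath us.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-command state flags. */
enum { CMD_DONE = 1 << 0, CMD_WDOG = 1 << 1 };

struct cmd {
    unsigned flags;
};

struct card {
    bool isr_pending;     /* stand-in for "ISR shows an interrupt assertion" */
    struct cmd *inflight; /* single outstanding command, for simplicity */
};

/* Stand-in interrupt service routine: completes the in-flight command. */
static void fake_intr(struct card *hw)
{
    if (hw->isr_pending && hw->inflight != NULL) {
        hw->inflight->flags |= CMD_DONE;
        hw->inflight = NULL;
    }
    hw->isr_pending = false;
}

/* Returns true if the command really timed out and must be failed,
 * false if the interrupt service routine completed it. */
static bool watchdog(struct card *hw, struct cmd *c)
{
    assert(hw != NULL && c != NULL);
    c->flags |= CMD_WDOG;          /* we are in the middle of burying it */
    if (hw->isr_pending)
        fake_intr(hw);             /* service the interrupt we never got */
    if (c->flags & CMD_DONE) {
        c->flags &= ~CMD_WDOG;     /* rescued: nothing to bury */
        return false;
    }
    return true;
}
```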

6. Occasional Constipation Of Commands..

In running some very strenuous high IOPs tests (generating about
11000 interrupts/second across one Qlogic 1040, one Qlogic 1080
and one Qlogic 2200 on an Alpha PC164), I found that I would get
occasional but regular 'watchdog timeouts' on both the 1080 and
the 2100 cards. This is under FreeBSD, and the watchdog timeout
routine just marks the command in error and retries it.

Invariably, right after this 'watchdog timeout' error, I'd get a
command completion for the command that I had thought timed out.
That is, I'd get a command completion, but the handle returned by
the firmware mapped to no current command. The frequency of this
problem is low under such a load- it would usually take 30
minutes per 'lost' interrupt.

I doubled the timeout for commands to see if it just was an edge
case of waiting too short a period. This had no effect.

I gathered and printed out microtimes for the watchdog completed
command and the completion that couldn't find a command- it was
always the case that the order of occurrence was "timeout, completion"
separated by a time on the order of 100 to 150 ms.

This caused me to consider 'firmware constipation' as a
possible culprit. That is, resubmission of a command to the device
that had suffered a watchdog timeout seemed to cause the presumed
dead command to show back up.

I added code in the watchdog routine that, when first entered for
the command, marks the command with a flag, reissues a local timeout
call for one second later, but also then issues a MARKER Request
Queue entry to the Qlogic f/w. A MARKER entry is used typically
after a Bus Reset to cause the f/w to get synchronized with respect
to either a Bus, a Nexus or a Target.

Since I've added this code, I still see the occasional watchdog
timeout, but the command that was about to be terminated always
now seems to be completed after the MARKER entry is issued (and
before the timeout extension fires, which would come back and
*really* terminate the command).
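
A minimal sketch of the two-stage watchdog described above, not the driver's
actual code. The enum, the struct and the function name are hypothetical, and
the caller is assumed to be the one that queues the MARKER entry and rearms
the one-second timeout; the sketch only models the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WD_GRACE_SECONDS 1 /* the one-second timeout extension */

/* What the caller should do after a watchdog firing (hypothetical). */
enum wd_action {
    WD_SEND_MARKER_AND_REARM, /* first firing: queue MARKER, wait again */
    WD_COMPLETED,             /* command finished in the meantime */
    WD_TERMINATE              /* second firing: really kill the command */
};

struct wcmd {
    bool done;  /* set by the completion path */
    bool grace; /* set once the first watchdog firing has been handled */
};

/* Decide what to do when the watchdog fires for command c. */
enum wd_action watchdog_fire(struct wcmd *c)
{
    assert(c != NULL);
    if (c->done)
        return WD_COMPLETED;
    if (!c->grace) {
        c->grace = true; /* mark the command, extend its life once */
        return WD_SEND_MARKER_AND_REARM;
    }
    return WD_TERMINATE;
}
```

On the first firing the command is only marked and given WD_GRACE_SECONDS
more; it is terminated only if it is still outstanding when the extension
fires.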
2000-06-27 19:44:31 +00:00
..
advansys Fix typo (accessable --> accessible). 2000-06-14 17:53:40 +00:00
agp Release resources properly in detach. 2000-06-10 17:53:20 +00:00
aha Fix typo (accessable --> accessible). 2000-06-14 17:53:40 +00:00
ahb Fix typo (accessable --> accessible). 2000-06-14 17:53:40 +00:00
aic Terminate aic_ids[] 2000-06-19 22:16:14 +00:00
aic7xxx Fix typo (accessable --> accessible). 2000-06-14 17:53:40 +00:00
amd Back out the previous change to the queue(3) interface. 2000-05-26 02:09:24 +00:00
amr The AMI MegaRAID's internal memory map conflicts with scatter/gather 2000-06-10 19:22:39 +00:00
an - Add suser check before SIOCSAIRONET. 2000-06-18 23:40:09 +00:00
ar Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
ata Add disk_enumerate() for finding names of disks. Vinum and libh will 2000-06-15 20:30:53 +00:00
atkbdc Manipulate with AltGR Led (really CapsLock Led) only in K_XLATE mode, because 2000-05-28 12:43:24 +00:00
awi We always provide the bpf hooks. Remove #include "bpf.h"/NBPF. 2000-06-10 07:16:14 +00:00
bktr Update to driver 2.13. 2000-06-26 09:41:32 +00:00
buslogic Fix typo (accessable --> accessible). 2000-06-14 17:53:40 +00:00
cardbus Sync to latest cardbusdevs file 1999-11-18 07:22:59 +00:00
ccd Separate the struct bio related stuff out of <sys/buf.h> into 2000-05-05 09:59:14 +00:00
cs Move code to handle BPF and bridging for incoming Ethernet packets out 2000-05-14 02:18:43 +00:00
cy Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
dc Add support for the Accton EN1217. 2000-06-11 11:54:52 +00:00
de Use the correct name for the PCI command register (PCIR_COMMAND). Don't 2000-05-28 16:06:56 +00:00
dec Add missing $FreeBSD$ 2000-05-01 19:54:26 +00:00
dgb Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
dpt Use correct register values. This one was in aic7xxx and advansys too. 2000-05-28 15:50:40 +00:00
ed Allow newer Linksys 10/100 PCMCIA cards to work. 2000-06-18 05:50:16 +00:00
eisa Back out the previous change to the queue(3) interface. 2000-05-26 02:09:24 +00:00
en Ahhrggg. Put the test for the compat shims AFTER the file that includes 2000-03-27 20:24:02 +00:00
ep Move code to handle BPF and bridging for incoming Ethernet packets out 2000-05-14 02:18:43 +00:00
ex Unused include: #include "ex.h" 2000-06-10 11:09:03 +00:00
fb Unused include: #include "fb.h" 2000-06-10 06:41:11 +00:00
fdc Step down a level and issue format requests with a struct bio instead 2000-05-06 07:01:47 +00:00
fe Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
fxp Implemented some optimizations which result in 14 fewer instructions in the 2000-06-19 00:58:34 +00:00
hea Remove un-needed #include's. 2000-01-17 20:49:59 +00:00
hfa Ensure that DMA mappings are freed in error situations. 2000-01-15 21:01:04 +00:00
ic Add $FreeBSD$ 2000-05-01 20:32:07 +00:00
ida Back out the previous change to the queue(3) interface. 2000-05-26 02:09:24 +00:00
ie Move code to handle BPF and bridging for incoming Ethernet packets out 2000-05-14 02:18:43 +00:00
iicbus Allow these drivers to be detached. 2000-06-16 07:20:29 +00:00
isp Fix usage of DELAY (SYS_DELAY is the platform independent local 2000-06-27 19:44:31 +00:00
ispfw Add in (separate files for different board's firmware) new files for ispfw 2000-06-18 04:37:44 +00:00
joy Add ADS7182 as a known Joystick. 2000-01-18 08:38:35 +00:00
kbd Manipulate with AltGR Led (really CapsLock Led) only in K_XLATE mode, because 2000-05-28 12:43:24 +00:00
lmc Adjust to accomodate recent changes to the rcvdata and rcvmsg 2000-05-01 03:31:58 +00:00
lnc MF4: add support for the Am79C973. 2000-06-18 08:12:54 +00:00
mc146818 Add missing $FreeBSD$ 2000-05-01 19:54:26 +00:00
mca Set the RF_SHAREABLE flage when we allocate an IRQ. 2000-03-13 11:43:53 +00:00
mcd Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
md Separate the struct bio related stuff out of <sys/buf.h> into 2000-05-05 09:59:14 +00:00
mii Added Altima Communications OUI and their AC101 10/100 2000-06-21 19:26:01 +00:00
mlx Back out the previous change to the queue(3) interface. 2000-05-26 02:09:24 +00:00
mse - `Newbus'ified the driver. 2000-03-18 15:13:30 +00:00
musycc Checkpoint commit. I can actually receive HDLC frames now. 2000-06-21 14:47:18 +00:00
null New machine independant /dev/null and /dev/zero driver. This device is 2000-06-25 08:32:39 +00:00
nulldev New machine independant /dev/null and /dev/zero driver. This device is 2000-06-25 08:32:39 +00:00
pccard Matching commits to pccard for last pcic changes. We now at least to 2000-06-18 05:28:59 +00:00
pcf Remove ~25 unneeded #include <sys/conf.h> 2000-04-19 14:58:28 +00:00
pci Nuke the useless chip driver. It gets in the way when you want to load 2000-06-09 16:00:29 +00:00
pcic Almost make loading work. This is a checkpoint. With these change we 2000-06-18 05:25:30 +00:00
pdq Uh, ya, sure this almost compiled for __bsdi__. NOT! 2000-05-21 05:33:40 +00:00
ppbus Unused include: #include "pps.h" 2000-06-10 11:14:19 +00:00
ppc Only print the diagnostic about extended I/O ports if bootverbose is true. 2000-06-25 09:20:56 +00:00
random I am guilty of an act of ommission. There is no longer a /dev/urandom 2000-06-27 09:38:40 +00:00
randomdev I am guilty of an act of ommission. There is no longer a /dev/urandom 2000-06-27 09:38:40 +00:00
ray Subtle Tx bugs - I wonder why the cast wans't picked up... 2000-06-21 21:37:27 +00:00
rc Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
rp - Eliminate rpread(). Call generic ttyread(). (cf rev 1.33) 2000-06-12 15:21:59 +00:00
scd Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
sf Use the correct register name. s/PCI_COMMAND_STATUS_REG/PCIR_COMMAND/ 2000-05-28 16:13:43 +00:00
si Always leave SP_DCEN on (monitor DCD). Otherwise the firmware *really* 2000-01-25 16:45:54 +00:00
sio Add option ALT_BREAK_TO_DEBUGGER. 2000-06-14 06:41:33 +00:00
sk - Call mii_pollstat() after we bring up the link on a 1000baseTX card 2000-06-06 02:56:37 +00:00
smbus Remove unneeded #include <sys/kernel.h> 2000-04-29 15:36:14 +00:00
sn Move code to handle BPF and bridging for incoming Ethernet packets out 2000-05-14 02:18:43 +00:00
snp Unstaticize this driver. You can have as many snoop devices as you can 2000-04-02 00:35:37 +00:00
sound add record channel irq timeouts too 2000-06-20 23:42:08 +00:00
speaker Add PnP probe methods to some common AT hardware drivers. In each case, 2000-06-23 07:44:33 +00:00
sr Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
streams Back out the previous change to the queue(3) interface. 2000-05-26 02:09:24 +00:00
sym - Fix a harmless compilation warning on Alpha. 2000-06-26 21:09:45 +00:00
syscons Remove old entropy-harvesting hooks; this is going to be re-engineered 2000-06-25 09:55:12 +00:00
tdfx Stupid me, I put the opt_tdfx.h underneath a test for TDFX_LINUX, which 2000-06-24 06:20:55 +00:00
ti Use the correct register name. s/PCI_COMMAND_STATUS_REG/PCIR_COMMAND/ 2000-05-28 16:13:43 +00:00
twe Initial import of a driver for the 3ware Escalade family of ATA RAID 2000-05-24 23:35:23 +00:00
tx Added support for SMC9432BTX cards. 2000-06-21 19:19:49 +00:00
usb Inverted error messages. 2000-06-15 15:23:12 +00:00
vinum start_object: Set the revive length correctly. 2000-06-07 03:34:18 +00:00
vn Back out the previous change to the queue(3) interface. 2000-05-26 02:09:24 +00:00
vr Use the correct register name. s/PCI_COMMAND_STATUS_REG/PCIR_COMMAND/ 2000-05-28 16:13:43 +00:00
vx Warn that this as an oldpci device.. 2000-05-28 15:59:52 +00:00
wi Bring the an(4) fixes to wi(4): 2000-06-19 00:17:13 +00:00
wl Mass update of isa drivers using compatability shims to use 2000-05-28 13:40:48 +00:00
xe Add support for the modem side of the 56k combo card. 2000-05-30 05:42:57 +00:00