Commit Graph

48 Commits

Author SHA1 Message Date
Matt Jacob
c77d11d0cc Raise debug level for some messages. Fix botched inversion
about MBOX_COMMAND_ERROR vs. MBOX_COMMAND_PARAM_ERROR.
2000-07-18 06:46:48 +00:00
Matt Jacob
3e97a5b432 Clean up ISPCTL_ABORT_CMD function to not be too chatty if it succeeds,
or even if it fails with INVALID_PARM (which just means that the handle
doesn't refer to an active commane).
2000-07-05 06:41:36 +00:00
Matt Jacob
1d460ef8d5 Change delay loop in new isp_mboxcmd to the use of the new MBOX_WAIT_COMPLETE
macro. Change notification of completion of a mailbox command in isp_intr
to MBOX_NOTIFY_COMPLETE macro.
2000-07-04 01:02:38 +00:00
Matt Jacob
28445eef28 Fix usage of DELAY (SYS_DELAY is the platform independent local
define).  Fix stupidity wrt checking whether we've gone to
LOOP_PDB_RCVD loopstate- it's okay to be greater than this state.
D'oh! Protect calls to isp_pdb_sync and isp_fclink_state with IS_FC
macros.

Completely redo mailbox command routine (in preparation to make this
possibly wait rather than poll for completion).

Make a major attempt to solve the 'lost interrupt' problem

1. Problem

The Qlogic cards would appear to 'lose' interrupts, i.e., a legitimate
regular SCSI command placed on the request queue would never complete
and the watchdog routine in the driver would eventually wakeup and
catch it. This would typically only happen on Alphas, although a
couple folks with 700MHz Intel platforms have also seen this.

For a long time I thought it was a foulup with f/w negotiations of
SYNC and/or WIDE as it always seemed to happen right after the
platform it was running on had done a SET TARGET PARAMETERS mailbox
command to (re)enable sync && wide (after initially forcing
ASYNC/NARROW at startup). However, occasionally, the same thing
would also occur for the Fibre Channel cards as well (which, ahem,
have no SET TARGET PARAMETERS for transfer mode).

After finally putting in a better set of watchdog routines for the
platforms for this driver, it seemed to be the case that the command
in question (usually a READ CAPACITY) just had up and died- the
watchdog routine would catch it after ~10 seconds. For some platforms
(NetBSD/OpenBSD)- an ABORT COMMAND mailbox command was sent (which
would always fail- indicating that the f/w denied knowledge of this
command, i.e., the f/w thought it was a done command). In any case,
retrying the command worked. But this whole problem needed to be
really fixed.

2. A False Step That Went in The Right Direction

The mailbox code was completely rewritten to no longer try and grab
the mailbox semaphore register and to try and 'by hand' complete
async fast posting completions. It was also rewritten to now have
separate in && out bitpatterns for registers to load to start and
retrieve to complete. This means that isp_intr now handles mailbox
completions.

This substantially simplifies the mailbox handling code, and carries
things 90% toward getting this to be a non-polled routine for this
driver.

This did not solve the problem, though.

3. Register Debouncing

I saw some comments in some errata sheets and some notes in a Qlogic
produced Linux driver (for the Qlogic 2100) that seemed to indicate
that debouncing of reads of the mailbox registers might be needed,
so I added this.  This did not affect the problem. In fact, it made
the problem worse for non-2100 cards.

5. Interrupt masking/unmasking

The driver *used* to do a substantial amount of masking/unmasking
of the interrupt control register. This was done to make sure that
the core common code could just assume it would never get pre-empted.

This apparently substantially contributed to the lost interrupt
problem.  The rewrite of the ICR (Interrupt Control Register),
which is a separate register from the ISR (Interrupt Status Register)
should not have caused any change to interrupt assertions pending.
The manual does not state that it will, and the register layout
seems to imply that the ICR is just an active route gate. We only
enable PCI Interrupts and RISC Interrupts- this should mean that
when the f/w asserts a RISC interrupt and (and the ICR allows RISC
Interrupts) and we have PCI Interrupts enabled, we should get a
PCI interrupt. Apparently this is a latch- not a signal route.

Removing this got rid of *most* but not all, lost interrupts.

5. Watchdog Smartening

I made sure that the watchdog routine would catch cases where the
Qlogic's ISR showed an interrupt assertion. The watchdog routine
now calls the interrupt service routine if it sees this. Some
additional internal state flags were added so that the watchdog
routine could then know whether the command it was in the middle
of burying (because we had time it out) was in fact completed by
the interrupt service routine.

6. Occasional Constipation Of Commands..

In running some very strenous high IOPs tests (generating about
11000 interrupts/second across one Qlogic 1040, one Qlogic 1080
and one Qlogic 2200 on an Alpha PC164), I found that I would get
occasional but regular 'watchdog timeouts' on both the 1080 and
the 2100 cards. This is under FreeBSD, and the watchdog timeout
routine just marks the command in error and retries it.

Invariably, right after this 'watchdog timeout' error, I'd get a
command completion for the command that I had thought timed out.
That is, I'd get a command completion, but the handle returned by
the firmware mapped to no current command. The frequency of this
problem is low under such a load- it would usually take an 30
minutes per 'lost' interrupt.

I doubled the timeout for commands to see if it just was an edge
case of waiting too short a period. This has no effect.

I gathered and printed out microtimes for the watchdog completed
command and the completion that couldn't find a command- it was
always the case that the order of occurrence was "timeout, completion"
separated by a time on the order of 100 to 150 ms.

This caused me to consider 'firmware constipation' as to be a
possible culprit. That is, resubmission of a command to the device
that had suffered a watchdog timeout seemed to cause the presumed
dead command to show back up.

I added code in the watchdog routine that, when first entered for
the command, marks the command with a flag, reissues a local timeout
call for one second later, but also then issues a MARKER Request
Queue entry to the Qlogic f/w. A MARKER entry is used typically
after a Bus Reset to cause the f/w to get synchronized with respect
to either a Bus, a Nexus or a Target.

Since I've added this code, I always now see the occasional watchdog
timeout, but the command that was about to be terminated always
now seems to be completed after the MARKER entry is issued (and
before the timeout extension fires, which would come back and
*really* terminate the command).
2000-06-27 19:44:31 +00:00
Matt Jacob
fb1d37adcd Once we have firmware running (if isp_reset) and this is the first time
through, establish what our LUN width is. Unfortunately, we can't ask
the f/w. If we loaded the f/w, we'll now assume we have expanded LUNs
(SCCLUN for fibre channel, just plain 32 LUN for SCSI). If we didn't
load firmware, assume 8 LUNs for SCSI and 1 LUN for Fibre Channel. We
have to assume only one LUN for Fibre Channel because the LUN setting
in Request Queue entries is in different places whether we have SCCLUN
firmware or not, so the only LUN guaranteed to work for both is LUN 0.

Clean up the rest of isp.c so that ISP2100_SCCLUN defines aren't used-
instead use run time determinants based upon isp->isp_maxluns.

After starting firmware, delay 500us to give it a chance to get rolling.

Fix the interrupt service routine to check for both isr && sema being zero
before thinking this was a spurious interrupt.  Following the manuals,
allow for both Mailbox as well as Queue Reponse type interrupts for regular
SCSI.
2000-06-18 04:56:17 +00:00
Matt Jacob
6d1d7d4c87 Fix some breakage about how we build WWNs. Do some other fabric related
changes: consider a new PDB entry different if Class 3 service parameter
roles change (!!!). Do some checking as we're getting a port database
that traps whether things change while we're doing so. Handle N-port
and F-ports correctly. Fix the fabric login loop to retain a login/binding
if things haven't changed (I mean, why logout a device only to log it back
in). No longer accept, after fabric logins, garbage if we can't get a PDB
entry that matches the device we've just logged into- if it doesn't, log
it out as it is very unlikely to still be what we thought it was. Get rid
of some of the debounce loops because we could get stuck there.
2000-05-09 01:14:43 +00:00
Matt Jacob
c88f65e2c0 Pick up topology more sanely at f/w startup. Change the restrictions of
where we can have targets (based on topology).

Much more importantly, make sure all mods to isp_sendmarker or |= so
we don't lose the marking of a bus that needs to have a marker sent for it.
2000-04-21 02:04:34 +00:00
Matt Jacob
cf74f2682e Slightly cleaner fabric support (whiter whites! redder reds!).. No,
seriously- only attempt to logout a previously logged in fabric device.

Fix a longstanding bug for aborting overtime commands- handle halves
have always been reversed.

Clean up some error messages to indicate channel number.

Approved:jkh
2000-02-29 05:52:14 +00:00
Matt Jacob
fe4d046167 Clean out residual bogosity for fast posting stuff- ISP_NO_FASTPOST_SCSI
is gone as a define. We just don't support fast posting for anything less
than the 1240/1080/1280/12160 or Fibre Channel cards.

Put in support for CDB's larger than 12 bytes for parallel SCSI (up to 44
bytes are allowed).

Approved: jkh
2000-02-15 00:35:00 +00:00
Matt Jacob
0f38a25b52 Restructure nvram reading routine to split out to separate functions
for 1020/1X80/12160/2X00- for readability. Add in 12160 (Ultra3)
support- but not with PPR just yet.  Fix and clarify fetching of
return parameter for getting firmware rev which for the 2200 contains
the connection topology (Private Loop (NL-port), N-port, FL-port,
F-port). Synthesize the connection topology for the 2100 which can
only be Private Loop or FL-port. Handle a couple of new async
mailbox commands which signify connection in Point-to-Point mode
(N-port or F-port) or indicate various toe stubbing getting to same.

Approved: jkh@freebsd.org
2000-02-11 19:31:32 +00:00
Matt Jacob
0719e3345c clean up for SBus Ultra (yes, we do not do that here yet) 2000-01-15 01:52:01 +00:00
Matt Jacob
e85919b9f8 change debug printout lefvels for a couple of places 2000-01-09 21:47:39 +00:00
Matt Jacob
3da7ba4d41 Make Fibre Channel cards correctly note the presence/absence
of ARQ data and punt the dealing with its presence/absence
to the platform layers.
2000-01-04 03:44:21 +00:00
Matt Jacob
ac1fd1487e Raise default FCP logintime to 60 seconds. Move the position
of where we could have seen the loop up at least once so it
makes sense. Change some stuff in ispscsicmd so we don't get
stuck there if the loop has never come up yet. Add in some
target mode support code.
2000-01-03 23:52:41 +00:00
Matt Jacob
9ee303fb46 Clean up some f/w revision checking wrt enabling fast posting.
Make sure we set defaults sanely for dual-bus adapters.
1999-12-20 01:34:01 +00:00
Matt Jacob
22e1dc858b Add Dual LVD bus (1280) support 1999-12-16 05:42:02 +00:00
Matt Jacob
7457966f26 turn some messages into CFGPRINT messages 1999-12-03 06:55:39 +00:00
Matt Jacob
38dace9790 Clean up stupidity in the isp_handle_other_response function- indexes
of queue entries have to be at least 16 bits now! If we're running
a 2100 less than rev 5, turn off loop fairness (per Qlogic errata). Fix
typo in checking against 2200 F/W revision. Slightly fix/reorder fabric
login stuff. Change to usage of isp_getrqentry for code clarity. Add some
defensive dual bus assumptions. Various cleanups, etc...
1999-11-21 03:18:22 +00:00
Matt Jacob
fdc79fd3fc correct moronic typo 1999-11-01 04:39:52 +00:00
Matt Jacob
03322f8625 Use pointer to f/w in md structure as to whether f/w exists or not.
If firmware length isn't specified, extract from the 4th short into
the firmware.
1999-10-30 19:32:44 +00:00
Matt Jacob
2668d67e9c I was misinformed. I cannot get away from specifying tags for FC. Some devices
are happy w/o them- some are unhappy (IBM drives).
1999-10-28 02:48:42 +00:00
Matt Jacob
83d62096d6 nuke a debug printout I thought I had already nuked 1999-10-26 22:25:13 +00:00
Matt Jacob
5e73516b7b remember to initialize mailbox 2 for FC isp bus resets 1999-10-22 17:03:03 +00:00
Matt Jacob
fc0685ea06 Remove some target mode stuff. It will get re-introduced in a different
file later. Do some pencil-sharpening types of minor changes. Change
how active commands are remembered (using new inline functions to get
handles, etc..). Now do a GET FIRMWARE STATUS after firing up the f/w as
outgoing mailbox 2 will tell you the f/w's notion of the max commands
that can be supported. Attempt to retrieve loop topology. Add in the
appropriate SWIZZLE/UNSWIZZLE macros calls (this is a no-op on Little
Endian machines but is needed for sparc (on other platforms)). Move
the temp port database we use to find out where things have moved to
after a LIP to the softc and off the kernel stack. Follow Qlogic's
hint and don't bother setting a tag for commands that don't have
this enabled (presumably the f/w will do it's own selection then).
Use an INT_PENDING macro to check for an interrupt. The call to
ISP_DMAFREE now just takes the handle- not the 'handle-1' which was
a layering violation. Use CFGPRINTF in a couple of places to make
things less chatty if not booting verbose, or CAMDEBUG compiles, etc..
1999-10-17 18:58:22 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Matt Jacob
ce7f792d94 More code cleanup. Go back to using FULL_LOGIN Fibre Chan if f/w is less than
1.17.0 level. Change where we do the loop database init. Add in the CMD_RQLATER
return. Add some register debounce.
1999-08-16 19:59:55 +00:00
Matt Jacob
3692397b0d add 2200 f/w; fix botched define 1999-07-05 20:42:08 +00:00
Matt Jacob
83cdc1a2b0 Roll revision levels. Add support for the Qlogic 2200 (warn about
not having SCSI_ISP_SCCLUN config defined if we don't have f/w for
the 2200- it's resident firmware uses SCCLUN (65535 luns)). Change
the way the default LoopID is gathered (it's now a platform specific
define so that some attempt at a synthetic WWN can be made in case
NVRAM isn't readable).

Change initialization of options a bit- don't use ADISC. Set
FullDuplex mode if config options tells us to do so. Do not use
FULL_LOGIN after LIP- it's the right thing to do but it causes too
much loop disruption (Loop Resets). Sanity check some default
values. Redo construction of port and node WWNs based upon what we
have- if we have 2 in the top nibble, we can have distinct port
and node WWNs. Clean up some SCCLUN related code that we obviously
had never compiled (:-(). Audit commands coming int ispscsicmd and
don't throw commands at Fibre devices that do not have Class 3
service parameters TARGET ROLE defined.

Clean up f/w initialization a bit. Add Fabric support (or at least
the first blush of it). Whew - way too much to describe here.
Basically, after a LIP, hang out until we see a Loop Up or a Port
DataBase Change async event, then see if we're on a Fabric
(GET_PORT_NAME of FL_PORT_ID). If we are, try and scan the fabric
controller for fabric devices using the GetAllNext SNS subcommand.
As we find devices, announce them to the outer layer. Try and do
some guard code for broken (Brocade) SNS servers (that get stuck
in loops- gotta maybe do this a different way using the GP_ID3 cmd
instead).  Then do a scan of the lower (local loop) ids using a
GET_PORT_NAME to see if the f/w has logged into anything at that
loop id. If so, then do a GET_PORT_DATABASE command.  Do this scan
into a local database. At this point we can say the loop is 'Ready'.
After this, we merge our local loop port database with our stored
port database- in a as yet to be really fully exercised fashion we
try and follow the logic of something having moved around. The
first time we see something at a Loop ID, we fix it, for the purpose
of this system instance, at that Loop ID. If things shift around
so it ends up somewhere else, we still keep it at this Loop ID (our
'Target') but use the new (moved) Loop ID when we actually throw
commands at it. Check for insane cases of different Loop IDs both
claiming to have the same WWN- if that happens, invalidate both.
Notify the outer layer of devices that have arrived and devices
that have gone away. *Finally*, when this is done, search the
softc's database of Fabric devices and perform logout/login actions.
The Qlogic f/w maintains logout/login for all local loop devices.
We have to maintain logout/login for fabric devices- total PITA.
Expect to see this area undergo more change over time.
1999-07-02 23:06:38 +00:00
Matt Jacob
442257d9c5 be a bit more chatty about some speed negotiations 1999-05-12 18:56:55 +00:00
Matt Jacob
5a025c82c6 Some massive thwunking in initialization to handle dual bus adapters. More
massive thwunking to include an XS_CHANNEL value. Some changes of how
parameters are reported to outer layers (including bus, e.g.). Yet more
stirring around in isp_mboxcmd to try and get it right. Decode of 1080/1240
NVRAM.
1999-05-11 05:06:55 +00:00
Matt Jacob
17b1ea0341 temp fix for internal queue overflow problem 1999-04-14 17:37:36 +00:00
Matt Jacob
3c6e29e07a Make firmware revision a triple. Clean up some FC init stuff for
board versions with no BIOS. Separate mailbox interrupts from
IOCB interrupts. Read OUTMAILBOX5 while RISC_INT is active- not
after you clear it (potential race condition). Clear out older broken
BIG_ENDIAN goop. Don't negotiate narrow/async for LVD busses at startup
if already in LVD mode. Note usage of presumptive 1040C revision. For
all the LIP, PDB Changed, Loop UP/DOWN async events, mark fw state
as unknown as well as marking the need to do a getpdb on targets- after
a LIP for certain the f/w has to do PRLI/PLOGI for all targets again
and marking f/w state as unknown gives us a fighting chance to (start
to) hold up for that to complete.
1999-04-04 02:28:29 +00:00
Matt Jacob
3bd28825dd Annoying little nigglet- apparently *some* Qlogic temporarily ignore
settings you've just sent them and return random values if you follow
the set by a get. This causes problems when you latter run a Tag-enabled
command when you've command tagged mode off.
1999-03-26 00:33:13 +00:00
Matt Jacob
4394c92f52 Add in 1080 LVD support and some basis also for the 1240. The port database
printout is now enabled.
1999-03-25 22:52:45 +00:00
Matt Jacob
57c801f5cf A wad of changes- prepping for 1080/1240 support (which caused a massive
thwank in register layout goop). A different mboxcmd approach. Some PDB change
infrastructure. Some better management of loopdown/loopup events (keep them
distinct from resource starvation for simq freeze/unfreeze actions).
1999-03-17 05:04:39 +00:00
Matt Jacob
3c688670f5 Roll internal release tag. Print out if we're in a 64 bit PCI slot.
Use fast memory timing NVRAM parameter. Clean up and fix establishment
of default target parameters. Don't use NVRAM if are flagged as not to
do so (I had a busted NVRAM setup which I couldn't edit that enabled SYNC
mode but disabled disconnect/reconnect and wide!!). Fix delays after
resets. BUS resets not done in isp_init anymore- relegated to OS
specific outer layers. Fix a buglet where you can get in a loop for
a NULL xs in the completion list in isp_intr. Add in some defines that
can disable fast posting. Add in code for Loop Up/Loop Down events that
call into the outer layers as to what to do.
1999-02-09 01:07:06 +00:00
Matt Jacob
cbf57b472d Implement and use Fast Posting for both parallel && fibre. Redo a bit of
the startup code. Implement a call to outer framework function so that
asynchronous events can be handled (e.g., speed negotiation, target mode).

Roll internal release tags.
1999-01-30 07:29:00 +00:00
Matt Jacob
bff85e290f Suggested by bde@freebsd.org- memcpy not necessarily good to use. D'oh- not in
the BSD DKI. Stop being lazy and finish the defines so MEMCPY becomes bzero
for FreeBSD.
1999-01-10 11:15:23 +00:00
Matt Jacob
f7102a20c6 Add some prototype deadchip detection. Set FIFO bursting (1XX0 only-
it's already on for the 2XX0) and detect the broken 1040A FIFO. Change
bzero to MEMZERO (portability with **nux). Use memcpy for same reason.

Finally detect QUEUE FULL conditions and return this as an error that
will get cam_periph_error to do it's 'tagged openings now XXX' dance.
1999-01-10 02:55:10 +00:00
Matt Jacob
c305536317 clarify headers;move uninit to outer layer;remove watchdog 1998-12-28 19:22:27 +00:00
Matt Jacob
cc56928232 oops on last 1998-12-05 01:46:40 +00:00
Matt Jacob
e205b454ac Remove the Target mode functions until they're in better shape. Implement some
suggested compilation cleanups from Eklund. Wire down a hard loop id if we are
not on a platform that has the ability to get to a PCI BIOS (it still will
float to the ID it gets after a LIP but at least we can try). Clarify that the
expanded lun is based upon SCCLUN defines (in f/w).
1998-12-05 01:33:57 +00:00
Matt Jacob
a70ac9fd83 per bde (who is right about this) that an inlined fucntion with const
char * strings being returned defined in a header file included several
places but only used in one module, is, uh, silly.
1998-09-17 23:20:29 +00:00
Matt Jacob
7ac6b5a314 Cleanliness. Don't leave defined a const char array that's only used
if target mode is defined (which it isn't, yet).
1998-09-17 22:53:35 +00:00
Matt Jacob
42b7427595 ISP_DMASETUP now returns a value to be possibly punted to outer layers.
Turn request queue overflow messages into debug messages. Ensure on
isp_restarts that we nullify the xflist array.
1998-09-17 21:03:45 +00:00
Matt Jacob
050ea0d6f2 fix reported compile error flying blind- I do not have the new compiler yet 1998-09-15 22:44:51 +00:00
Justin T. Gibbs
478f8a9685 Update QLogic ISP support for CAM. Add preliminary target mode support.
Submitted by:	Matthew Jacob <mjacob@feral.com>
1998-09-15 08:42:56 +00:00
Matt Jacob
6054c3f6f2 Add support for the Qlogic ISP SCSI && FC/AL Adapters 1998-04-22 17:54:58 +00:00