Commit Graph

241 Commits

Author SHA1 Message Date
Matt Jacob
482cf5c2e7 Add in some new IN_XXX and CT_XXXX flags in preparation
for the rototilling that !*$)~@!$_@*_(~@$*_(~@$*~@$*
Qlogic F/W changes will need.
2000-07-18 07:06:47 +00:00
Matt Jacob
d37162ca7a If debugging set, zero out an incoming response entry
when we're done reading it (makes checking things easier).
Before calling isp_notify_ack make sure we're at RUNSTATE-
elsewise we can be responding to LIPs or SCSI bus resets
before we've finished some of the wiring.
2000-07-18 07:05:37 +00:00
Matt Jacob
910fb4f6ee The SERVICING_INTERRUPT isn't quite safe yet. 2000-07-18 07:04:07 +00:00
Matt Jacob
f48ce1882f Add a isp_target_putback_atio- we aren't using CCINCR at this time, so
we need a function that tells the Qlogic f/w that a target mode command
is done, so increase the resource count for that lun. Add in a timeout
function to kick the putback again if we fail to do it the first time (we
may not have the request queue space for ATIO push). Split the function
isp_handle_platform_ctio into two parts so that the timeout function for
the ATIO push or isp_handle_platform_ctio can inform CAM that the requested
CTIO(s) are now done.

Clean up (cough) residual handling. What we need for Fibre Channel
is to preserve the at_datalen field from the original incoming ATIO
so we can calculate a 'true' residual.  Unfortunately, we're not
guaranteed to get that back from CAM. We'll *try* to find it hiding
in the periph_priv field (layering violation)- but if an ATIO was
passed in from user land- forget it. This means that we'll probably
get residuals wrong for Fibre Channel commands we're completing
with an error. It's too late to 4.1 release to fix this- too bad.
Luckily the only device we'd really care about this occurring on
is a tape device and they're still so rare as FC attached devices
that this can be considered an untested combination anyway.

Remove all CCINCR usage (resource autoreplenish). When we've proved
to ourself that things are working properly, we can add it back
in.

Make sure we propage 'suggested' sense data from the incoming ATIO
into the created system ATIO- and set sense_len appropriately.
Correctly propagate tag values.

Fall back to the model of generating (well, the functions in isp_pci.c
do the work) multiple CTIOs based upon what we get from XPT. Instead
of being able to pair Qlogic generated ATIOs with CAM ATIOs, and then
to pair CAM CTIOs with Qlogic CTIOs, we have to take the CTIO passed
to us from XPT, and if it implies that we have to generate extra
Qlogic CTIOs, so be it. This means that we have to wait until the
last CTIO in a sequence we generated completes before calling xpt_done.

Executive summary- target mode actually now pretty much works well
enough to tell folks about.
2000-07-18 06:58:28 +00:00
Matt Jacob
c77d11d0cc Raise debug level for some messages. Fix botched inversion
about MBOX_COMMAND_ERROR vs. MBOX_COMMAND_PARAM_ERROR.
2000-07-18 06:46:48 +00:00
Matt Jacob
05fbcbb000 Keep interrupts blocked for all of isp_pci_attach. Redo DMA routines
for target mode for cleanliness and accuracy.
2000-07-18 06:40:22 +00:00
Matt Jacob
1fcf5deb4a Oops! If we're deciding a command is now really dead, make *darned*
sure that it really is by issuing a ISPCTL_ABORT_CMD just on the
off chance the f/w will start it up again and, ha ha, start using
the DMA resources we gave it but are now taking away.
2000-07-05 06:44:17 +00:00
Matt Jacob
3e97a5b432 Clean up ISPCTL_ABORT_CMD function to not be too chatty if it succeeds,
or even if it fails with INVALID_PARM (which just means that the handle
doesn't refer to an active commane).
2000-07-05 06:41:36 +00:00
Matt Jacob
c464389f4b Remove obsolete isp_dogactive tag. 2000-07-04 01:06:42 +00:00
Matt Jacob
8bdda719ae Fix completely stupid and idiotiuc sprintfs in isp_inline.h with
with the STRNCAT function.
2000-07-04 01:06:23 +00:00
Matt Jacob
f6e75de230 Add in config_hook for catching when interrupts are safe- this allows
us to not the ints are ok and also to (re)ENABLE isp interrupts. Remove
all splcam()/splx() invocates and replace them with ISP_LOCK/ISP_UNLOCK
macros.
2000-07-04 01:05:43 +00:00
Matt Jacob
df9d46b6d9 Add in isp_lock/isp_unlock inlines. Add in an islocked/intsok flag
to isp_osinfo substructure (all in prep for SMP). Define MBOX_WAIT_COMPLETE
and MBOX_NOTIFY_COMPLETE macros so that we can now (temp) use tsleep
to wait for mailbox completion. Requires us to guess whether we're
servicing an interrupt or not- will use intr_nesting_level.

Add local strncat function.
2000-07-04 01:04:35 +00:00
Matt Jacob
1d460ef8d5 Change delay loop in new isp_mboxcmd to the use of the new MBOX_WAIT_COMPLETE
macro. Change notification of completion of a mailbox command in isp_intr
to MBOX_NOTIFY_COMPLETE macro.
2000-07-04 01:02:38 +00:00
Matt Jacob
469b6b9efb Change startup locking. Use new isp_handle_index function
for indexing off of handles to get dma maps.
2000-07-04 01:01:15 +00:00
Matt Jacob
28445eef28 Fix usage of DELAY (SYS_DELAY is the platform independent local
define).  Fix stupidity wrt checking whether we've gone to
LOOP_PDB_RCVD loopstate- it's okay to be greater than this state.
D'oh! Protect calls to isp_pdb_sync and isp_fclink_state with IS_FC
macros.

Completely redo mailbox command routine (in preparation to make this
possibly wait rather than poll for completion).

Make a major attempt to solve the 'lost interrupt' problem

1. Problem

The Qlogic cards would appear to 'lose' interrupts, i.e., a legitimate
regular SCSI command placed on the request queue would never complete
and the watchdog routine in the driver would eventually wakeup and
catch it. This would typically only happen on Alphas, although a
couple folks with 700MHz Intel platforms have also seen this.

For a long time I thought it was a foulup with f/w negotiations of
SYNC and/or WIDE as it always seemed to happen right after the
platform it was running on had done a SET TARGET PARAMETERS mailbox
command to (re)enable sync && wide (after initially forcing
ASYNC/NARROW at startup). However, occasionally, the same thing
would also occur for the Fibre Channel cards as well (which, ahem,
have no SET TARGET PARAMETERS for transfer mode).

After finally putting in a better set of watchdog routines for the
platforms for this driver, it seemed to be the case that the command
in question (usually a READ CAPACITY) just had up and died- the
watchdog routine would catch it after ~10 seconds. For some platforms
(NetBSD/OpenBSD)- an ABORT COMMAND mailbox command was sent (which
would always fail- indicating that the f/w denied knowledge of this
command, i.e., the f/w thought it was a done command). In any case,
retrying the command worked. But this whole problem needed to be
really fixed.

2. A False Step That Went in The Right Direction

The mailbox code was completely rewritten to no longer try and grab
the mailbox semaphore register and to try and 'by hand' complete
async fast posting completions. It was also rewritten to now have
separate in && out bitpatterns for registers to load to start and
retrieve to complete. This means that isp_intr now handles mailbox
completions.

This substantially simplifies the mailbox handling code, and carries
things 90% toward getting this to be a non-polled routine for this
driver.

This did not solve the problem, though.

3. Register Debouncing

I saw some comments in some errata sheets and some notes in a Qlogic
produced Linux driver (for the Qlogic 2100) that seemed to indicate
that debouncing of reads of the mailbox registers might be needed,
so I added this.  This did not affect the problem. In fact, it made
the problem worse for non-2100 cards.

5. Interrupt masking/unmasking

The driver *used* to do a substantial amount of masking/unmasking
of the interrupt control register. This was done to make sure that
the core common code could just assume it would never get pre-empted.

This apparently substantially contributed to the lost interrupt
problem.  The rewrite of the ICR (Interrupt Control Register),
which is a separate register from the ISR (Interrupt Status Register)
should not have caused any change to interrupt assertions pending.
The manual does not state that it will, and the register layout
seems to imply that the ICR is just an active route gate. We only
enable PCI Interrupts and RISC Interrupts- this should mean that
when the f/w asserts a RISC interrupt and (and the ICR allows RISC
Interrupts) and we have PCI Interrupts enabled, we should get a
PCI interrupt. Apparently this is a latch- not a signal route.

Removing this got rid of *most* but not all, lost interrupts.

5. Watchdog Smartening

I made sure that the watchdog routine would catch cases where the
Qlogic's ISR showed an interrupt assertion. The watchdog routine
now calls the interrupt service routine if it sees this. Some
additional internal state flags were added so that the watchdog
routine could then know whether the command it was in the middle
of burying (because we had time it out) was in fact completed by
the interrupt service routine.

6. Occasional Constipation Of Commands..

In running some very strenous high IOPs tests (generating about
11000 interrupts/second across one Qlogic 1040, one Qlogic 1080
and one Qlogic 2200 on an Alpha PC164), I found that I would get
occasional but regular 'watchdog timeouts' on both the 1080 and
the 2100 cards. This is under FreeBSD, and the watchdog timeout
routine just marks the command in error and retries it.

Invariably, right after this 'watchdog timeout' error, I'd get a
command completion for the command that I had thought timed out.
That is, I'd get a command completion, but the handle returned by
the firmware mapped to no current command. The frequency of this
problem is low under such a load- it would usually take an 30
minutes per 'lost' interrupt.

I doubled the timeout for commands to see if it just was an edge
case of waiting too short a period. This has no effect.

I gathered and printed out microtimes for the watchdog completed
command and the completion that couldn't find a command- it was
always the case that the order of occurrence was "timeout, completion"
separated by a time on the order of 100 to 150 ms.

This caused me to consider 'firmware constipation' as to be a
possible culprit. That is, resubmission of a command to the device
that had suffered a watchdog timeout seemed to cause the presumed
dead command to show back up.

I added code in the watchdog routine that, when first entered for
the command, marks the command with a flag, reissues a local timeout
call for one second later, but also then issues a MARKER Request
Queue entry to the Qlogic f/w. A MARKER entry is used typically
after a Bus Reset to cause the f/w to get synchronized with respect
to either a Bus, a Nexus or a Target.

Since I've added this code, I always now see the occasional watchdog
timeout, but the command that was about to be terminated always
now seems to be completed after the MARKER entry is issued (and
before the timeout extension fires, which would come back and
*really* terminate the command).
2000-06-27 19:44:31 +00:00
Matt Jacob
b85389e117 Add in the enabling of interrupts (to isp_attach). Clean up a busted
comment. Check against firmware state- not loop state when enabling
target mode. Other changes have to do with no longer enabling/disabling
interrupts at will.

Rearchitect command watchdog timeouts-

First of all, set the timeout period for a command that has a
timeout (in isp_action) to the period of time requested *plus* two
seconds. We don't want the Qlogic firmware and the host system to
race each other to report a dead command (the watchdog is there to
catch dead and/or broken firmware).

Next, make sure that the command being watched isn't done yet. If
it's not done yet, check for INT_PENDING and call isp_intr- if that
said it serviced an interrupt, check to see whether the command is
now done (this is what the "IN WATCHDOG" private flag is for- if
isp_intr completes the command, it won't call xpt_done on it because
isp_watchdog is still looking at the command).

If no interrupt was pending, or the command wasn't completed, check
to see if we've set the private 'grace period' flag. If so, the
command really *is* dead, so report it as dead and complete it with
a CAM_CMD_TIMEOUT value.

If the grace period flag wasn't set, set it and issue a SYNCHRONIZE_ALL
Marker Request Queue entry and re-set the timeout for one second
from now (see Revision 1.45 isp.c notes for more on this) to give
the firmware a final chance to complete this command.
2000-06-27 19:31:02 +00:00
Matt Jacob
cc28790740 Clean up private storage so that we can use the spriv_field0 to
store a bitmask of whether we've set a value into ccb->ccb_h.status,
whether we're in the watchdog routine for this command now, whether
we've set a grace period for this command and whether this command is
actually done.

See comments of rev 1.45 of isp.c for more complete information.
2000-06-27 19:22:13 +00:00
Matt Jacob
e2adf86e4e Add 8 bits of volatile mailbox busy mask- this will be the bitmask of
output mailbox values we want to get back out of the chip once a mailbox
command is done. Add storage for the maximum number of output mailbox
registers to the softc.

Roll minor version number.
2000-06-27 19:17:39 +00:00
Matt Jacob
40e88de6c3 Add mailbox bitmask macros (numbers of available mailbox registers
based upon Qlogic chip type). Define maximum mailboxes. Add INT_PENDING_MASK
macro. Change mailbox offset macro name.
2000-06-27 19:15:43 +00:00
Matt Jacob
986973a448 Add an isp_handle_index function- this is prepatory to loading more into
the handle (i.e., generation number), so we will now need a function that
will take a handle and return a flat index [ 0 .. maxhandles-1 ] for
auxillary routines that need an index to get at buddy store values
(like dma maps or xflist pointers).
2000-06-27 19:14:14 +00:00
Matt Jacob
56aef50302 Clean up firmware load issues and remove darn near all config options.
Force alphas to prefer mem mapping as the default.

Basically, we have a pointer to a function which we can call which will
return us a pointer to firmware for the card we have. We call this function
(if it's non-NULL) with the address of our mdvec f/w pointer.

The way this works is that if ispfw (as a module or a static) is loaded,
it initializes the pointer in isp_pci, so we can call into to it to fetch
a pointer to a f/w set.

If ispfw is MOD_UNLOADed, it's retained a pointer to our mdvec f/w pointers,
which then get zeroed out so we don't have any references to data that's
now gone from kernel memory. Removing the f/w saves ~360KBytes.

Alas, there is no autounload mechanism that works for is here.
2000-06-18 05:18:55 +00:00
Matt Jacob
526539764e Removing this bulky one large f/w file. This f/w is now in dev/ispfw. 2000-06-18 04:59:47 +00:00
Matt Jacob
fb1d37adcd Once we have firmware running (if isp_reset) and this is the first time
through, establish what our LUN width is. Unfortunately, we can't ask
the f/w. If we loaded the f/w, we'll now assume we have expanded LUNs
(SCCLUN for fibre channel, just plain 32 LUN for SCSI). If we didn't
load firmware, assume 8 LUNs for SCSI and 1 LUN for Fibre Channel. We
have to assume only one LUN for Fibre Channel because the LUN setting
in Request Queue entries is in different places whether we have SCCLUN
firmware or not, so the only LUN guaranteed to work for both is LUN 0.

Clean up the rest of isp.c so that ISP2100_SCCLUN defines aren't used-
instead use run time determinants based upon isp->isp_maxluns.

After starting firmware, delay 500us to give it a chance to get rolling.

Fix the interrupt service routine to check for both isr && sema being zero
before thinking this was a spurious interrupt.  Following the manuals,
allow for both Mailbox as well as Queue Reponse type interrupts for regular
SCSI.
2000-06-18 04:56:17 +00:00
Matt Jacob
2ad50ca5f4 Remove all ISP2100_SCCLUN define protected code and replace it with
runtime checks.
2000-06-18 04:50:26 +00:00
Matt Jacob
2133e16f18 Remove all ISP2100_SCCLUN define based code and replace it with runtime
comparisons against the tag isp_maxluns- if > 16, we're SCCLUN based.

On initial regular SCSI startup, disable auto-disconnect.
2000-06-18 04:48:28 +00:00
Matt Jacob
be44b164d0 Roll platform minor number. Force definition of SCSI_ISP_FABRIC
(we always support fabric now). Remove SCCLUN definition (we always
support SCCLUN now, if we load the f/w). Add typedef definition of an
external firmware fetch function.
2000-06-18 04:47:12 +00:00
Matt Jacob
5e09512c51 Roll core minor version. Set ISP_MAX_LUNS to be off of new isp_maxluns
tag in softc.
2000-06-18 04:45:51 +00:00
Matt Jacob
d22fcb6b75 add "disable autodisconnect" flags 2000-06-18 04:44:41 +00:00
Matt Jacob
22e83dac1e cleanup i_int_X vs. uint_X definitions 2000-06-18 04:43:55 +00:00
Matt Jacob
ee9fc94ca5 add MBOX_GET_RESOURCE_COUNT command 2000-06-18 04:41:14 +00:00
Matt Jacob
52df5dfdab Fix breakage to target mode support.
What we'd like to know is whether or not we have a listener
upstream that really hasn't configured yet. If we do, then
we can give a more sensible reply here. If not, then we can
reject this out of hand.

Choices for what to send were
	Not Ready, Unit Not Self-Configured Yet
	(0x2,0x3e,0x00)
for the former and
	Illegal Request, Logical Unit Not Supported
	(0x5,0x25,0x00)
for the latter.

We used to decide whether there was at least one listener
based upon whether the black hole driver was configured.

However, recent config(8) changes have made this hard to do
at this time.

Actually, we didn't use the above quite yet, but were sure considering it.
2000-06-12 23:08:31 +00:00
Matt Jacob
6d1d7d4c87 Fix some breakage about how we build WWNs. Do some other fabric related
changes: consider a new PDB entry different if Class 3 service parameter
roles change (!!!). Do some checking as we're getting a port database
that traps whether things change while we're doing so. Handle N-port
and F-ports correctly. Fix the fabric login loop to retain a login/binding
if things haven't changed (I mean, why logout a device only to log it back
in). No longer accept, after fabric logins, garbage if we can't get a PDB
entry that matches the device we've just logged into- if it doesn't, log
it out as it is very unlikely to still be what we thought it was. Get rid
of some of the debounce loops because we could get stuck there.
2000-05-09 01:14:43 +00:00
Matt Jacob
a200278c54 roll platform minor 2000-05-09 01:09:46 +00:00
Matt Jacob
686f419d48 Roll core minor version. Change our 'fabdev' tag to 'loggedin'. 2000-05-09 01:09:23 +00:00
Matt Jacob
cc8df88b70 Add in a watchdog routine to catch cases where we've dropped the command.
Apparently the f/w has finished the command, but somehow an interrupt is
being lost. So, we just plain wedge when booting alphas.

This is a general routine we've needed for a while.
2000-05-09 01:08:21 +00:00
Matt Jacob
a96d513df9 The storage for WWN from NVRAM is actually the PORT WWN, not the NODE WWN. 2000-05-09 01:06:47 +00:00
Matt Jacob
c41767dc4e Conrrect a macro with parenthesis. 2000-05-09 01:06:18 +00:00
Matt Jacob
16c9a708a4 Now that we fixed the isp_sendmarker botch, we can now do initial bus
resets for ULTRA2/ULTRA3 cards again (which were turned off really because
of a botch for dual bus configurations).
2000-04-21 19:18:06 +00:00
Matt Jacob
d465588a93 Roll minor version. Increase size (and add defines for) topology storage. 2000-04-21 02:06:30 +00:00
Matt Jacob
9dae880716 Some minor tweaklets. 2000-04-21 02:05:54 +00:00
Matt Jacob
05914a3f9d Add in the now required malloc.h include. I guess somebody
was busy hackin' w/o checking kernel compiles.
2000-04-21 02:05:13 +00:00
Matt Jacob
c88f65e2c0 Pick up topology more sanely at f/w startup. Change the restrictions of
where we can have targets (based on topology).

Much more importantly, make sure all mods to isp_sendmarker or |= so
we don't lose the marking of a bus that needs to have a marker sent for it.
2000-04-21 02:04:34 +00:00
Matt Jacob
b581c6e08f Update (finally) 1.15.37 to 1.19.03 for the 2100. This allows us to not
require full logins after a LIP, which always led to loop resets, and
various other perturbations.

Update 2200 f/w from 2.01.00 release to 2.01.09 release.
2000-04-21 02:01:06 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Matt Jacob
0b69cead4d roll platform versions to 5.0 2000-03-15 18:49:44 +00:00
Matt Jacob
6ca3a52f59 Don't do bus resets for ULTRA2 or later cards because what seems to
happen currently is that several commands issued *after* the bus reset are
then reported destroyed.
2000-03-13 16:30:00 +00:00
Matt Jacob
eb76ec8076 Add in mailbox return codes for failed fabric logins (port_id_used,
loop_id_used, etc...)

Do a more precise structure for Get All Next name server responses.

Approved: jkh
2000-02-29 05:54:48 +00:00
Matt Jacob
adde48b322 Minor non-FreeBSD changes (keeping source sync'd).
Approved: jkh
2000-02-29 05:53:41 +00:00
Matt Jacob
40cfc8fe8e Prettier print of fabric devices being attached- say what kind of
port they are (e.g., F_Port vs. N_Port).

Approved: jkh
2000-02-29 05:53:10 +00:00
Matt Jacob
cf74f2682e Slightly cleaner fabric support (whiter whites! redder reds!).. No,
seriously- only attempt to logout a previously logged in fabric device.

Fix a longstanding bug for aborting overtime commands- handle halves
have always been reversed.

Clean up some error messages to indicate channel number.

Approved:jkh
2000-02-29 05:52:14 +00:00