Commit Graph

276 Commits

Author SHA1 Message Date
Matt Jacob
4081cc88c9 Only call ISP_UNLOCK/ISP_LOCK if isp->isp_osinfo.intsok in USEC_SLEEP.
Add a test against isp->isp_osinfo.islocked prior to trying to see
whether --isp->isp_osinfo.islocked is zero to cause us to unlock
(non-SMPLOCK case).
2000-12-05 07:41:53 +00:00
Matt Jacob
bfbab17021 Replace some more printfs with isp_prt's. Use isp_prt/ISP_LOGDEBUG0
for rate setting/getting printouts.
2000-12-05 07:39:54 +00:00
Matt Jacob
f7dddf8a54 Remove more printfs and use either isp_prt or device_printf. Remember
to set ISP_LOGINFO if bootverbose is set.
2000-12-05 07:38:41 +00:00
David Malone
ea8b5a9ae9 More M_ZERO patches.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
Approved by:	mjacob
2000-12-03 20:46:54 +00:00
Matt Jacob
e5f2f488c5 Add USEC_SLEEP macro support. Change the location at which we define
ISP_LOCK/ISP_UNLOCK macros.
2000-12-02 18:33:29 +00:00
Matt Jacob
81babfd043 Make the Not RESPONSE in RESPONSE QUEUE message have a bit more info
(specifically, how many entries we've looked at so far). Maintain
interrupt instrumentation. Use USEC_SLEEP instead of USEC_DELAY in
a number of places (this allows us to drop locks and sleep instead
of spin). Track changes to configuration options for topology preference.
Fix botched order of printout for Channel, Target, Lun.
2000-12-02 18:08:35 +00:00
Matt Jacob
67afe757a2 Add interrupt instrumentation. Change ISP_CFG_NPORT config option to
a set of options that allows specific loop, loop-only, nport, nport-only
topology settings. Define a required macro for all platforms (USEC_SLEEP).
2000-12-02 18:06:03 +00:00
Matt Jacob
aa0898c0e0 I'm dropping the MAINTAINER request and see what happens. If it becomes
too hard for me to keep in sync with other platforms, FreeBSD will go
it's own way.
2000-10-31 05:55:54 +00:00
Matt Jacob
650789cb1b Get rid of ridiculous ISP_PVS macro. Instead, just set an
ISP_SMPLOCK define based on the previous 5.4 major/minor release
define of PVS- because this allows us to turn it off easier.
2000-10-25 04:42:46 +00:00
Matt Jacob
3395b0568a Whoops! Forgot to commit this when I committed the other (turnin on locks)
change. Sorry about that.
2000-10-25 04:40:49 +00:00
John Baldwin
35e0e5b311 Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
Matt Jacob
e92fbe47e2 Roll minor revision- for once we'll use this because.... if revision >= 5.4,
compile time will build in mutex locks, otherwise the old locking (splcam/splx
with a recursion counter) will be compiled in.

We still depend on config_intr_hook to tell us when it's okay to call
msleep instead of polling. It'd be real nice if we could do this early
enough to not hang up a machine struggling with a bad Fibre Channel loop,
but that's still to come.
2000-10-17 18:18:14 +00:00
Matt Jacob
39f3fc6f69 remove "SERVICING_INTERRUPT" nonsense 2000-10-17 18:15:30 +00:00
Poul-Henning Kamp
db7e3af111 Remove unneeded #include <machine/clock.h> 2000-10-15 14:19:01 +00:00
Matt Jacob
e5d4e19714 Make changes required by change in how default and usable node and port
WWNS are made and used.
2000-10-12 23:59:40 +00:00
Matt Jacob
c914d4237d Redo how default Node and Port WWNs are determined (again!). This is so
we don't stomp on the differences between ports for a Qlogic 2202.
2000-10-12 23:49:09 +00:00
Matt Jacob
9cf43b9716 Change some default macro usages/definitions/requirements. 2000-10-12 23:47:03 +00:00
Matt Jacob
aa57fd6fa5 some copyright cleanups 2000-09-21 20:16:04 +00:00
Matt Jacob
c0cfc79790 Inintialize the queue index stuff from what the f/w sends back- just
in case it's insane enough to not do what you tell it to.

Print out (LOGINFO level) initiator ID.
2000-09-21 17:06:45 +00:00
Matt Jacob
e11a1ee870 Per msmith's request, don't attach to Qlogic 12160 id'd cards that have
a certain SubVendorID.
2000-09-07 20:27:40 +00:00
Doug Rabson
21c3015a24 * Completely rewrite the alpha busspace to hide the implementation from
the drivers.
* Remove legacy inx/outx support from chipset and replace with macros
  which call busspace.
* Rework pci config accesses to route through the pcib device instead of
  calling a MD function directly.

With these changes it is possible to cleanly support machines which have
more than one independantly numbered PCI busses. As a bonus, the new
busspace implementation should be measurably faster than the old one.
2000-08-28 21:48:13 +00:00
Matt Jacob
84267b9edd remove clause 3 licence 2000-08-27 23:39:23 +00:00
Matt Jacob
b6b6ad2f23 various fixes 2000-08-27 23:38:44 +00:00
Matt Jacob
3ea883b46d Add a comment as to where stdarg.h applies. 2000-08-03 03:05:50 +00:00
John Baldwin
7d615c1d8b Use <machine/stdarg.h> instead of <stdarg.h> so that this will compile.
While I'm at it, move the #include line up to the top of the file.
2000-08-03 02:47:06 +00:00
Matt Jacob
c7d5594134 Add in macros && masks so that mailbox command errors can be
selectively printed/supressed in isp_mboxcmd.
2000-08-01 06:55:08 +00:00
Matt Jacob
d0d5832ac7 Major whacking for core version 2.0. A major motivator for 2.0 and these
changes is that there's now a Solaris port of this driver, so some things
in the core version had to change (not much, but some).

In order, from the top.....:

A lot of error strings are gathered in one place at the head of the file.
This caused me to rewrite them to look consistent (with respect to
things like 'Port 0x%' and 'Target %d' and 'Loop ID 0x%x'.

The major mailbox function, isp_mboxcmd, now takes a third argument,
which is a mask that selectively says whether mailbox command failures
will be logged. This will substantially reduce a lot of spurious noise
from the driver.

At the first run through isp_reset we used to try and get the current
running firmware's revision by issuing a mailbox command. This would
invariably fail on alpha's with anything but a Qlogic 1040 since SRM
doesn't *start* the f/w on these cards. Instead, we now see whether we're
sitting ROM state before trying to get a running BIOS loaded f/w version.

All CFGPRINTF/PRINTF/IDPRINTF macros have been replaced with calls to
isp_prt. There are seperate print levels that can be independently
set (see ispvar.h), which include debugging, etc.

All SYS_DELAY macros are now USEC_DELAY macros. RQUEST_QUEUE_LEN and
RESULT_QUEUE_LEN now take ispsoftc as a parameter- the Fibre Channel
cards and the Ultra2/Ultra3 cards can have 16 bit request queue entry
indices, so we can make a 1024 entry index for them instead of the
256 entries we've had until now.

A major change it to fix isp_fclink_test to actually only wait the
delay of time specified in the microsecond argument being passed.
The problem has always been that a call to isp_mboxcmd to get he
current firmware state takes an unknown (sometimes long) amount of
time- this is if the firmware is busy doing PLOGIs while we ask
it what's up. So, up until now, the usdelay argument has been
a joke. The net effect has been that if you boot without being plugged
into a good loop or into a switch, you hang. Massively annonying, and
hard to fix because the actual time delta was impossible to know
from just guessing. Now, using the new GET_NANOTIME macros, a precise
and measured amount of USEC_DELAY calls are done so that only the
specified usecdelay is allowed to pass. This means that if the initial
startup of the firmware if followed by a call from isp_freebsd.c:isp_attach
to isp_control(isp, ISP_FCLINK_TEST, &tdelay) where tdelay is 2 * 1000000,
no more than two seconds will actually elapse before we leave concluding
that the cable is unhooked. Jeez. About time....

Change the ispscsicmd entry point to isp_start, and the XS_CMD_DONE
macro to a call to the platform supplied isp_done (sane naming).

Limit our size of request queue completions we'll look at at interrupt
time. Since we've increased the size of the Request Queue (and the
size of the Response Queue proportionally), let's not create an
interrupt stack overflow by having to keep a max completion list
(forw links are not an option because this is common code with
some platforms that don't have link space in their XS_T structures).
A limit of 32 is not unreasonable- I doubt there'd be even this many
request queue completions at a time- remember, most boards now use
fast posting for normal command completion instead of filling out
response queue entries.

In the isp_mboxcmd cleanup, also create an array of command
names so that "ABOUT FIRMWARE" can be printed instead of "CMD #8".

Remove the isp_lostcmd function- it's been deprecated for a while.
Remove isp_dumpregs- the ISP_DUMPREGS goes to the specific bus
register dump fucntion.

Various other cleanups.
2000-08-01 06:51:05 +00:00
Matt Jacob
b09b009594 Core version 2.0 rewrite. In this file we replace isp_tdebug with
isp_prt calls. We now use an argument to the  ISPCTL_FCLINK_TEST
call. We change all IDPRINTF macros to isp_prt calls. We add
the isp_prt function here.
2000-08-01 06:31:44 +00:00
Matt Jacob
18ccaecd45 Core version 2.0 cleanup/rewrite. Things get rearranged and changed
quite a bit so that all of the ports have a similar set of required
macros/definitions (and in similar places in the isp_<platform>.h
file).

Some new macros/functions added- Mailbox Acquire/Relase macros,
NANOTIME macros, SNPRINTf and STRNCAT. MemoryBarrier beomes
MEMORYBARRIER with much stronger types.
2000-08-01 06:29:55 +00:00
Matt Jacob
16dd34376c Remove isp_prtstst (now in case statement in isp.c). Remove
isp2100_fw_statename as an INLINE (now a function in isp.c). Remove
isp2100_pdb_statename (unused). Redo all ISP_SCSI_XFER_T as XS_T types.
Change all RQUEST_QUEUE_LEN/RESULT_QUEUE_LEN macros to take a parameter.
Add isp_print_bytes function.
2000-08-01 06:26:04 +00:00
Matt Jacob
10549c059a Remove isp_tdebug. Change all PRINTF macros to the now common
isp_prt logging function.
2000-08-01 06:24:01 +00:00
Matt Jacob
69fbe07a2e Fix typo. Remove isp_tdebug (we'll use ISP_LOGTDEBUG2 in isp->isp_dblev
as a selector now). Change DFLT_CMD_CNT to a fixed amount for now.
2000-08-01 06:23:24 +00:00
Matt Jacob
a6db0ba6d3 Add in lengths of SBus or PCI registers. 2000-08-01 06:21:21 +00:00
Matt Jacob
53cff3bb65 Rewrite for version 2.0. Some structural changes, but also
a substantial amount of commenting about what each platform
specific definitions are supposed to be.
2000-08-01 06:10:21 +00:00
Matt Jacob
d02373f1a0 Part of major rewrite for core version 2.0- clarification of
mdvec structure, removal of printf/CFGPRINTF in place of isp_prt
calls. Parameterization of RQUEST_QUEUE_LEN/RESULT_QUEUE_LEN.
2000-08-01 05:16:49 +00:00
Matt Jacob
482cf5c2e7 Add in some new IN_XXX and CT_XXXX flags in preparation
for the rototilling that !*$)~@!$_@*_(~@$*_(~@$*~@$*
Qlogic F/W changes will need.
2000-07-18 07:06:47 +00:00
Matt Jacob
d37162ca7a If debugging set, zero out an incoming response entry
when we're done reading it (makes checking things easier).
Before calling isp_notify_ack make sure we're at RUNSTATE-
elsewise we can be responding to LIPs or SCSI bus resets
before we've finished some of the wiring.
2000-07-18 07:05:37 +00:00
Matt Jacob
910fb4f6ee The SERVICING_INTERRUPT isn't quite safe yet. 2000-07-18 07:04:07 +00:00
Matt Jacob
f48ce1882f Add a isp_target_putback_atio- we aren't using CCINCR at this time, so
we need a function that tells the Qlogic f/w that a target mode command
is done, so increase the resource count for that lun. Add in a timeout
function to kick the putback again if we fail to do it the first time (we
may not have the request queue space for ATIO push). Split the function
isp_handle_platform_ctio into two parts so that the timeout function for
the ATIO push or isp_handle_platform_ctio can inform CAM that the requested
CTIO(s) are now done.

Clean up (cough) residual handling. What we need for Fibre Channel
is to preserve the at_datalen field from the original incoming ATIO
so we can calculate a 'true' residual.  Unfortunately, we're not
guaranteed to get that back from CAM. We'll *try* to find it hiding
in the periph_priv field (layering violation)- but if an ATIO was
passed in from user land- forget it. This means that we'll probably
get residuals wrong for Fibre Channel commands we're completing
with an error. It's too late to 4.1 release to fix this- too bad.
Luckily the only device we'd really care about this occurring on
is a tape device and they're still so rare as FC attached devices
that this can be considered an untested combination anyway.

Remove all CCINCR usage (resource autoreplenish). When we've proved
to ourself that things are working properly, we can add it back
in.

Make sure we propage 'suggested' sense data from the incoming ATIO
into the created system ATIO- and set sense_len appropriately.
Correctly propagate tag values.

Fall back to the model of generating (well, the functions in isp_pci.c
do the work) multiple CTIOs based upon what we get from XPT. Instead
of being able to pair Qlogic generated ATIOs with CAM ATIOs, and then
to pair CAM CTIOs with Qlogic CTIOs, we have to take the CTIO passed
to us from XPT, and if it implies that we have to generate extra
Qlogic CTIOs, so be it. This means that we have to wait until the
last CTIO in a sequence we generated completes before calling xpt_done.

Executive summary- target mode actually now pretty much works well
enough to tell folks about.
2000-07-18 06:58:28 +00:00
Matt Jacob
c77d11d0cc Raise debug level for some messages. Fix botched inversion
about MBOX_COMMAND_ERROR vs. MBOX_COMMAND_PARAM_ERROR.
2000-07-18 06:46:48 +00:00
Matt Jacob
05fbcbb000 Keep interrupts blocked for all of isp_pci_attach. Redo DMA routines
for target mode for cleanliness and accuracy.
2000-07-18 06:40:22 +00:00
Matt Jacob
1fcf5deb4a Oops! If we're deciding a command is now really dead, make *darned*
sure that it really is by issuing a ISPCTL_ABORT_CMD just on the
off chance the f/w will start it up again and, ha ha, start using
the DMA resources we gave it but are now taking away.
2000-07-05 06:44:17 +00:00
Matt Jacob
3e97a5b432 Clean up ISPCTL_ABORT_CMD function to not be too chatty if it succeeds,
or even if it fails with INVALID_PARM (which just means that the handle
doesn't refer to an active commane).
2000-07-05 06:41:36 +00:00
Matt Jacob
c464389f4b Remove obsolete isp_dogactive tag. 2000-07-04 01:06:42 +00:00
Matt Jacob
8bdda719ae Fix completely stupid and idiotiuc sprintfs in isp_inline.h with
with the STRNCAT function.
2000-07-04 01:06:23 +00:00
Matt Jacob
f6e75de230 Add in config_hook for catching when interrupts are safe- this allows
us to not the ints are ok and also to (re)ENABLE isp interrupts. Remove
all splcam()/splx() invocates and replace them with ISP_LOCK/ISP_UNLOCK
macros.
2000-07-04 01:05:43 +00:00
Matt Jacob
df9d46b6d9 Add in isp_lock/isp_unlock inlines. Add in an islocked/intsok flag
to isp_osinfo substructure (all in prep for SMP). Define MBOX_WAIT_COMPLETE
and MBOX_NOTIFY_COMPLETE macros so that we can now (temp) use tsleep
to wait for mailbox completion. Requires us to guess whether we're
servicing an interrupt or not- will use intr_nesting_level.

Add local strncat function.
2000-07-04 01:04:35 +00:00
Matt Jacob
1d460ef8d5 Change delay loop in new isp_mboxcmd to the use of the new MBOX_WAIT_COMPLETE
macro. Change notification of completion of a mailbox command in isp_intr
to MBOX_NOTIFY_COMPLETE macro.
2000-07-04 01:02:38 +00:00
Matt Jacob
469b6b9efb Change startup locking. Use new isp_handle_index function
for indexing off of handles to get dma maps.
2000-07-04 01:01:15 +00:00
Matt Jacob
28445eef28 Fix usage of DELAY (SYS_DELAY is the platform independent local
define).  Fix stupidity wrt checking whether we've gone to
LOOP_PDB_RCVD loopstate- it's okay to be greater than this state.
D'oh! Protect calls to isp_pdb_sync and isp_fclink_state with IS_FC
macros.

Completely redo mailbox command routine (in preparation to make this
possibly wait rather than poll for completion).

Make a major attempt to solve the 'lost interrupt' problem

1. Problem

The Qlogic cards would appear to 'lose' interrupts, i.e., a legitimate
regular SCSI command placed on the request queue would never complete
and the watchdog routine in the driver would eventually wakeup and
catch it. This would typically only happen on Alphas, although a
couple folks with 700MHz Intel platforms have also seen this.

For a long time I thought it was a foulup with f/w negotiations of
SYNC and/or WIDE as it always seemed to happen right after the
platform it was running on had done a SET TARGET PARAMETERS mailbox
command to (re)enable sync && wide (after initially forcing
ASYNC/NARROW at startup). However, occasionally, the same thing
would also occur for the Fibre Channel cards as well (which, ahem,
have no SET TARGET PARAMETERS for transfer mode).

After finally putting in a better set of watchdog routines for the
platforms for this driver, it seemed to be the case that the command
in question (usually a READ CAPACITY) just had up and died- the
watchdog routine would catch it after ~10 seconds. For some platforms
(NetBSD/OpenBSD)- an ABORT COMMAND mailbox command was sent (which
would always fail- indicating that the f/w denied knowledge of this
command, i.e., the f/w thought it was a done command). In any case,
retrying the command worked. But this whole problem needed to be
really fixed.

2. A False Step That Went in The Right Direction

The mailbox code was completely rewritten to no longer try and grab
the mailbox semaphore register and to try and 'by hand' complete
async fast posting completions. It was also rewritten to now have
separate in && out bitpatterns for registers to load to start and
retrieve to complete. This means that isp_intr now handles mailbox
completions.

This substantially simplifies the mailbox handling code, and carries
things 90% toward getting this to be a non-polled routine for this
driver.

This did not solve the problem, though.

3. Register Debouncing

I saw some comments in some errata sheets and some notes in a Qlogic
produced Linux driver (for the Qlogic 2100) that seemed to indicate
that debouncing of reads of the mailbox registers might be needed,
so I added this.  This did not affect the problem. In fact, it made
the problem worse for non-2100 cards.

5. Interrupt masking/unmasking

The driver *used* to do a substantial amount of masking/unmasking
of the interrupt control register. This was done to make sure that
the core common code could just assume it would never get pre-empted.

This apparently substantially contributed to the lost interrupt
problem.  The rewrite of the ICR (Interrupt Control Register),
which is a separate register from the ISR (Interrupt Status Register)
should not have caused any change to interrupt assertions pending.
The manual does not state that it will, and the register layout
seems to imply that the ICR is just an active route gate. We only
enable PCI Interrupts and RISC Interrupts- this should mean that
when the f/w asserts a RISC interrupt and (and the ICR allows RISC
Interrupts) and we have PCI Interrupts enabled, we should get a
PCI interrupt. Apparently this is a latch- not a signal route.

Removing this got rid of *most* but not all, lost interrupts.

5. Watchdog Smartening

I made sure that the watchdog routine would catch cases where the
Qlogic's ISR showed an interrupt assertion. The watchdog routine
now calls the interrupt service routine if it sees this. Some
additional internal state flags were added so that the watchdog
routine could then know whether the command it was in the middle
of burying (because we had time it out) was in fact completed by
the interrupt service routine.

6. Occasional Constipation Of Commands..

In running some very strenous high IOPs tests (generating about
11000 interrupts/second across one Qlogic 1040, one Qlogic 1080
and one Qlogic 2200 on an Alpha PC164), I found that I would get
occasional but regular 'watchdog timeouts' on both the 1080 and
the 2100 cards. This is under FreeBSD, and the watchdog timeout
routine just marks the command in error and retries it.

Invariably, right after this 'watchdog timeout' error, I'd get a
command completion for the command that I had thought timed out.
That is, I'd get a command completion, but the handle returned by
the firmware mapped to no current command. The frequency of this
problem is low under such a load- it would usually take an 30
minutes per 'lost' interrupt.

I doubled the timeout for commands to see if it just was an edge
case of waiting too short a period. This has no effect.

I gathered and printed out microtimes for the watchdog completed
command and the completion that couldn't find a command- it was
always the case that the order of occurrence was "timeout, completion"
separated by a time on the order of 100 to 150 ms.

This caused me to consider 'firmware constipation' as to be a
possible culprit. That is, resubmission of a command to the device
that had suffered a watchdog timeout seemed to cause the presumed
dead command to show back up.

I added code in the watchdog routine that, when first entered for
the command, marks the command with a flag, reissues a local timeout
call for one second later, but also then issues a MARKER Request
Queue entry to the Qlogic f/w. A MARKER entry is used typically
after a Bus Reset to cause the f/w to get synchronized with respect
to either a Bus, a Nexus or a Target.

Since I've added this code, I always now see the occasional watchdog
timeout, but the command that was about to be terminated always
now seems to be completed after the MARKER entry is issued (and
before the timeout extension fires, which would come back and
*really* terminate the command).
2000-06-27 19:44:31 +00:00