not have a user-supplied signal handler, when a signal is delivered, one
thread will receive the signal, and then the code reverts to having no
signal handler for the signal. This can leave the other sigwait()ing
threads stranded permanently if the signal is later ignored, or can result
in process termination when the process should have delivered the signal to
one of the threads in sigwait().
To fix this problem, maintain a count of sigwait()ers for each signal that
has no default signal handler. Use the count to correctly install/uninstall
dummy signal handlers.
Reviewed by: deischen
use the BIOS Equipment List to determine how many hard drives are
installed and if the drive number we received in %dl is valid.
- Don't bother to disable interrupts when setting up the stack. The 8086
and beyond implicitly disable interrupts after an instruction that sets
%ss (for example, a pop or a mov) so that you can safely set %ss and %sp
in two consecutive instructions. An exception to this is the lss
instruction, which can set both registers simultaneously and thus doesn't
need this hack.
- Add support for EDD BIOS extensions to support booting off of hard drives
of nearly arbitrary length.
define). Fix stupidity wrt checking whether we've gone to
LOOP_PDB_RCVD loopstate- it's okay to be greater than this state.
D'oh! Protect calls to isp_pdb_sync and isp_fclink_state with IS_FC
macros.
Completely redo mailbox command routine (in preparation to make this
possibly wait rather than poll for completion).
Make a major attempt to solve the 'lost interrupt' problem
1. Problem
The Qlogic cards would appear to 'lose' interrupts, i.e., a legitimate
regular SCSI command placed on the request queue would never complete
and the watchdog routine in the driver would eventually wakeup and
catch it. This would typically only happen on Alphas, although a
couple folks with 700MHz Intel platforms have also seen this.
For a long time I thought it was a foulup with f/w negotiations of
SYNC and/or WIDE as it always seemed to happen right after the
platform it was running on had done a SET TARGET PARAMETERS mailbox
command to (re)enable sync && wide (after initially forcing
ASYNC/NARROW at startup). However, occasionally, the same thing
would also occur for the Fibre Channel cards as well (which, ahem,
have no SET TARGET PARAMETERS for transfer mode).
After finally putting in a better set of watchdog routines for the
platforms for this driver, it seemed to be the case that the command
in question (usually a READ CAPACITY) just had up and died- the
watchdog routine would catch it after ~10 seconds. For some platforms
(NetBSD/OpenBSD)- an ABORT COMMAND mailbox command was sent (which
would always fail- indicating that the f/w denied knowledge of this
command, i.e., the f/w thought it was a done command). In any case,
retrying the command worked. But this whole problem needed to be
really fixed.
2. A False Step That Went in The Right Direction
The mailbox code was completely rewritten to no longer try and grab
the mailbox semaphore register and to try and 'by hand' complete
async fast posting completions. It was also rewritten to now have
separate in && out bitpatterns for registers to load to start and
retrieve to complete. This means that isp_intr now handles mailbox
completions.
This substantially simplifies the mailbox handling code, and carries
things 90% toward getting this to be a non-polled routine for this
driver.
This did not solve the problem, though.
3. Register Debouncing
I saw some comments in some errata sheets and some notes in a Qlogic
produced Linux driver (for the Qlogic 2100) that seemed to indicate
that debouncing of reads of the mailbox registers might be needed,
so I added this. This did not affect the problem. In fact, it made
the problem worse for non-2100 cards.
5. Interrupt masking/unmasking
The driver *used* to do a substantial amount of masking/unmasking
of the interrupt control register. This was done to make sure that
the core common code could just assume it would never get pre-empted.
This apparently substantially contributed to the lost interrupt
problem. The rewrite of the ICR (Interrupt Control Register),
which is a separate register from the ISR (Interrupt Status Register)
should not have caused any change to interrupt assertions pending.
The manual does not state that it will, and the register layout
seems to imply that the ICR is just an active route gate. We only
enable PCI Interrupts and RISC Interrupts- this should mean that
when the f/w asserts a RISC interrupt and (and the ICR allows RISC
Interrupts) and we have PCI Interrupts enabled, we should get a
PCI interrupt. Apparently this is a latch- not a signal route.
Removing this got rid of *most* but not all, lost interrupts.
5. Watchdog Smartening
I made sure that the watchdog routine would catch cases where the
Qlogic's ISR showed an interrupt assertion. The watchdog routine
now calls the interrupt service routine if it sees this. Some
additional internal state flags were added so that the watchdog
routine could then know whether the command it was in the middle
of burying (because we had time it out) was in fact completed by
the interrupt service routine.
6. Occasional Constipation Of Commands..
In running some very strenous high IOPs tests (generating about
11000 interrupts/second across one Qlogic 1040, one Qlogic 1080
and one Qlogic 2200 on an Alpha PC164), I found that I would get
occasional but regular 'watchdog timeouts' on both the 1080 and
the 2100 cards. This is under FreeBSD, and the watchdog timeout
routine just marks the command in error and retries it.
Invariably, right after this 'watchdog timeout' error, I'd get a
command completion for the command that I had thought timed out.
That is, I'd get a command completion, but the handle returned by
the firmware mapped to no current command. The frequency of this
problem is low under such a load- it would usually take an 30
minutes per 'lost' interrupt.
I doubled the timeout for commands to see if it just was an edge
case of waiting too short a period. This has no effect.
I gathered and printed out microtimes for the watchdog completed
command and the completion that couldn't find a command- it was
always the case that the order of occurrence was "timeout, completion"
separated by a time on the order of 100 to 150 ms.
This caused me to consider 'firmware constipation' as to be a
possible culprit. That is, resubmission of a command to the device
that had suffered a watchdog timeout seemed to cause the presumed
dead command to show back up.
I added code in the watchdog routine that, when first entered for
the command, marks the command with a flag, reissues a local timeout
call for one second later, but also then issues a MARKER Request
Queue entry to the Qlogic f/w. A MARKER entry is used typically
after a Bus Reset to cause the f/w to get synchronized with respect
to either a Bus, a Nexus or a Target.
Since I've added this code, I always now see the occasional watchdog
timeout, but the command that was about to be terminated always
now seems to be completed after the MARKER entry is issued (and
before the timeout extension fires, which would come back and
*really* terminate the command).
comment. Check against firmware state- not loop state when enabling
target mode. Other changes have to do with no longer enabling/disabling
interrupts at will.
Rearchitect command watchdog timeouts-
First of all, set the timeout period for a command that has a
timeout (in isp_action) to the period of time requested *plus* two
seconds. We don't want the Qlogic firmware and the host system to
race each other to report a dead command (the watchdog is there to
catch dead and/or broken firmware).
Next, make sure that the command being watched isn't done yet. If
it's not done yet, check for INT_PENDING and call isp_intr- if that
said it serviced an interrupt, check to see whether the command is
now done (this is what the "IN WATCHDOG" private flag is for- if
isp_intr completes the command, it won't call xpt_done on it because
isp_watchdog is still looking at the command).
If no interrupt was pending, or the command wasn't completed, check
to see if we've set the private 'grace period' flag. If so, the
command really *is* dead, so report it as dead and complete it with
a CAM_CMD_TIMEOUT value.
If the grace period flag wasn't set, set it and issue a SYNCHRONIZE_ALL
Marker Request Queue entry and re-set the timeout for one second
from now (see Revision 1.45 isp.c notes for more on this) to give
the firmware a final chance to complete this command.
store a bitmask of whether we've set a value into ccb->ccb_h.status,
whether we're in the watchdog routine for this command now, whether
we've set a grace period for this command and whether this command is
actually done.
See comments of rev 1.45 of isp.c for more complete information.
output mailbox values we want to get back out of the chip once a mailbox
command is done. Add storage for the maximum number of output mailbox
registers to the softc.
Roll minor version number.
the handle (i.e., generation number), so we will now need a function that
will take a handle and return a flat index [ 0 .. maxhandles-1 ] for
auxillary routines that need an index to get at buddy store values
(like dma maps or xflist pointers).
mark up a sample invocation, since it is not a command internal to the
described utility.
Do not use Ar (argument) to mark up something which is not an argument
to the utility or one of its internal commands.
device with Yarrow, and although I coded for that in dev/MAKEDEV, I forgot
to _tell_ folks.
This commit adds back the /dev/urandom device (as a duplicate) of /dev/random,
until such time as it can be properly announced.
This will help the openssl users quite a lot.
This means 'options NETGRAPH' is no longer necessary in order to get
netgraph-enabled Ethernet interfaces. This supports loading/unloading
the ng_ether.ko and attaching/detaching the Ethernet interface in any
order.
Add two new hooks 'upper' and 'lower' to allow access to the protocol
demux engine and the raw device, respectively. This enables bridging
to be defined as a netgraph node, if so desired.
Reviewed by: freebsd-net@freebsd.org
(Reported by Matthew Jacob)
- Fix a couple of __inline__ (changed to __inline).
- Check also against DT_DATA_IN phase on parity/crc error.
(Merged from Pamela Delaney's changes in the Linux driver)
- Fix support for phase mismatch handling from the C code for
the C1010 (only useful for testing issue).
- Add an asynchonous notification handler for `lost device'
(AC_LOST).
world seems to interpret the spec this way
- Initialize transmit window to two instead of one; helps get things
going initially when the first packet may get dropped
- Really fix the shutdown + timeout race condition this time
The first one got screwed up by me because of rev 1.33, which was
incorrectly merged into my patches by myself, and so Ruslan (maintainer)
asked me to back them out.
Ruslan was ok with the second one, but since it needs rework, it'll be
readded later, when it doesn't conflict with the backout of the first one.
Pointy hat: alex
Beer on next meeting: ru