---
From: sos@login.dkuug.dk (S|ren Schmidt)
Subject: IDE-disk hangs - solution/patches NetBSD/FreeBSD
Summary: fixes for lost interrupts with IDE disks
Keywords: hanging-disk, IDE-disk, lost-interrupt
Due to "popular" demand I'm posting these patches to NetBSD/FreeBSD
instead of mailing them around the world :-)
As many have found out there is a problem when using IDE disks on
FreeBSD. Following is a patch that fixes the problem with lost intterrupts.
Both fixes is based on a patch posted here some month ago by
Stefan Behrens?? (sorry I've lost the original article). But anyway it
works (for me :-).
Basically it does a timeout on lost interrupts, starting the operation
again and logging and error message on the console.
It additionally makes the allready present while loop timeouts
independent of CPU speed, and adds minor numbers for easy access to
dos partitions.
* change all splnet's to splimp's
*
* Revision 2.13 1993/11/22 10:53:52 davidg
* patch to add support for SMC8216 (Elite-Ultra) boards
* from Glen H. Lowe
*
* Revision 2.12 1993/11/07 18:04:13 davidg
* fix from Garrett Wollman:
* add a return(0) at the end of ed_probe so that if the various device
* specific probes fail that we just don't fall of the end of the function.
when the machine panics.
i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a seperate file 'support.s'
15) split swtch and related routines into a seperate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a seperate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.
i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr
i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.
i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait for
ever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.
kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c
i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lot's of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.
i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s
...and various changes to various header files to make all of the
above happen.
set when extended translation is turned on, thus we need to do the mailbox
unlock command no matter what value is in the extended bios flag byte as
the other extensions (ie, > 2 drive support) cause the same problems.
The code has been changed to ALWAYS unlock the mailbox interface on ALL
1542C class boards.
called once when card is attached. Solved problem with driver
getting hosed when a reset takes place.
Removed init_block array -- now part of malloced memory. No more
static declarations left.
Added code so that debug ioctl actually does something.
ifconfig is0 debug will now switch on debugging code.
Other general cleanups.
* Novell probe changed to be invasive because of too many complaints
* about some clone boards not being reset properly and thus not
* found on a warmboot. Yuck.
*
* Revision 2.10 1993/10/23 04:07:12 davidg
* increment output errors if the device times out (done via watchdog)
*
* Revision 2.9 1993/10/23 04:01:45 davidg
* increment input error counter if a packet with a bad length is
* detected.
(rick@snowhite.cis.uoguelph.ca). I am currently using it with a Microsoft
InPort busmouse, under FreeBSD Epsilon. I hadn't planned on supporting it,
but I have patched it a few times, and I guess this is now the de facto
reference version, so send me any problems or improvements.
- Gene Stark
stark@cs.sunysb.edu
October 9, 1993
Date: Tue, 19 Oct 1993 02:22:41 -40962758 (WST)
As the subject line says:
I can;t believe this typo is still here.
Has NOBODY used the isa_dmastart() routine for 16bit DMA?
I know I just hit the dma regs directly for the AHA1542,
and it appears that either everybody else does as well, or
they only use 8bit DMA (e.g. floppy)
Editors Note:
The definition of DMA2_CHN was incorrectly using IO_DMA1!
* increase maximum time to wait for transmit DMA to complete to 120us.
* call ed_reset() if the time limit is reached instead of trying
* to abort the remote DMA.
*
* Revision 2.7 1993/10/15 10:49:10 davidg
* minor change to way the mbuf pointer temp variable is assigned in
* ed_start (slightly improves code readability)
*
* Revision 2.6 93/10/02 01:12:20 davidg
* use ETHER_ADDR_LEN in NE probe rather than '6'.
then use that information to fix the enhancemode features of the
1542C/CF boards by turning them off.
When doing this I found that the Buslogic 545S does NOT properly
mimic the 1542 families AHA_INQUIRE command. It only returns 1
byte of information, when the adaptec manual clearly states that 4
bytes are to be returned. I added a printf that explains the error
when we see a 545S for now, I tried to come up with a better solution,
but it involved to much work for now.
Removed patch kit headers and rcsid strings, add $Id$.
isa.c:
Removed old #ifdef notyet isa_configure code, since it will never be
used, and I have done 90% of what it attempted to.
Add conflict checking code that searchs back through the devtab's looking
for any device that has already been found that may conflict with what
we are about to probe. Checks are mode for I/O address, memory address,
IRQ, and DRQ. This should stop the screwing up of any device that has
alread been found by other device probes.
Print out messages when we are not going to probe a device due to
a conflict so the user knows WHY something was not found. For example:
aha0 not probed due to irq conflict with ahb0 at 11
Now print out a message when a device is not found so the user knows
that it was probed for, but could not be found. For example:
ed1 not found at 0x320
For devices that have I/O address < 0x100 say that they are on the
motherboard, not on isa! The 0x100 magic number is per ISA spec. It
may seem funny that pc0 and sc0 report as being on the motherboard, but
this is due to the fact that the I/O address used is that of the keyboard
controller which IS on the motherboard. We really need to split the
keyboard probe from the display probe. It is completly legal to build
a pc with out one or the other, or even with out both!
npx.c:
Return -1 from the probe routine if we are using the Emulator so
that the i/o addresses are not printed, this is the same trick used
for 486's.
Do not print the ``Errors reported via Exception 16'', and
``Errors reported via IRQ 13'' messages any more, since these just lead
to more user confusion that anything. It still prints the message
``Error reporting broken, using 387 emulator'' so that the person is
aware that there mother board is ill.