This code was eliminated when the PEND_INTS algorithm was added. But it was
discovered that PEND_INTS only worsen latency for FAST_INTR() routines,
which can't be marked pending.
Noticed & debugged by: dave adkins <adkin003@gold.tc.umn.edu>
enough and can cause some strange performance problems. Specifically, at
or near startup time is when the problem is worst. To reproduce
the problem, run "lat_syscall stat" from the alpha lmbench code right
after bootup. A positive side effect of this mod is that the name
cache can be set to grow again by sysctl. A noticable positive
performance impact is realized due to a larger namecache being available
as needed (or tuned.)
code. According to the crash dump, bpref is set to 445
and cgp->cg_nclusterblks is 444. Hence in the for loop,
the test fails immediately but the following failure check
(got == cgp->cg_nclusterblks) doesn't trigger because got >
cgp->cg_nclusterblks. This wreaks havoc in the code after that.
Fix: Move one source bit to the left :-)
Noticed by: Mike Hibler <mike@fast.cs.utah.edu>
Submitted by: Kirk McKusick <mckusick@McKusick.COM>
when execing a setuid/setgid binary. Code submitted by Sean Eric Fagan
(sef@freebsd.org).
Also consolidated the setuid/setgid checks into one place.
Reviewed by: dyson,sef
keep ahold of buffers, and therefore leave filesystems dirty. I haven't
been able to test, but the code compiles. Those who run -current, please
test and report back!!! (Sorry :-)).
PR: kern/3571
Submitted by: Dirk Keunecke <dk@panda.rhein-main.de>
Initially functionality is confined to 32-bit BIOS functions, however
it is envisioned that BIOS support may be enlisted for other
activities in the future.
defined, your really getting 32. Also warn about how you can't have
more than 256 pty's when your using DEVFS (non DEVFS can use more, just
the makedev script doesn't know how to make >256). it also doesn't
allocate more memory than needed in this case.
Make sure that the signal passed in TIOCSIG isn't 0 as it might cause
a panic. I personally haven't seen this happen, but after a similar
bug in syscons crashed my machine, I'm acutely aware of this one. :)
I changed a few bits here and there, mainly renaming wd82371.c
to ide_pci.c now that it's supposed to handle different chipsets.
It runs on my P6 natoma board with two Maxtor drives, and also
on a Fujitsu machine I have at work with an Opti chipset and
a Quantum drive.
Submitted by:cgull@smoke.marlboro.vt.us <John Hood>
Original readme:
*** WARNING ***
This code has so far been tested on exactly one motherboard with two
identical drives known for their good DMA support.
This code, in the right circumstances, could corrupt data subtly,
silently, and invisibly, in much the same way that older PCI IDE
controllers do. It's ALPHA-quality code; there's one or two major
gaps in my understanding of PCI IDE still. Don't use this code on any
system with data that you care about; it's only good for hack boxes.
Expect that any data may be silently and randomly corrupted at any
moment. It's a disk driver. It has bugs. Disk drivers with bugs
munch data. It's a fact of life.
I also *STRONGLY* recommend getting a copy of your chipset's manual
and the ATA-2 or ATA-3 spec and making sure that timing modes on your
disk drives and IDE controller are being setup correctly by the BIOS--
because the driver makes only the lamest of attempts to do this just
now.
*** END WARNING ***
that said, i happen to think the code is working pretty well...
WHAT IT DOES:
this code adds support to the wd driver for bus mastering PCI IDE
controllers that follow the SFF-8038 standard. (all the bus mastering
PCI IDE controllers i've seen so far do follow this standard.) it
should provide busmastering on nearly any current P5 or P6 chipset,
specifically including any Intel chipset using one of the PIIX south
bridges-- this includes the '430FX, '430VX, '430HX, '430TX, '440LX,
and (i think) the Orion '450GX chipsets. specific support is also
included for the VIA Apollo VP-1 chipset, as it appears in the
relabeled "HXPro" incarnation seen on cheap US$70 taiwanese
motherboards (that's what's in my development machine). it works out
of the box on controllers that do DMA mode2; if my understanding is
correct, it'll probably work on Ultra-DMA33 controllers as well.
it'll probably work on busmastering IDE controllers in PCI slots, too,
but this is an area i am less sure about.
it cuts CPU usage considerably and improves drive performance
slightly. usable numbers are difficult to come by with existing
benchmark tools, but experimentation on my K5-P90 system, with VIA
VP-1 chipset and Quantum Fireball 1080 drives, shows that disk i/o on
raw partitions imposes perhaps 5% cpu load. cpu load during
filesystem i/o drops a lot, from near 100% to anywhere between 30% and
70%. (the improvement may not be as large on an Intel chipset; from
what i can tell, the VIA VP-1 may not be very efficient with PCI I/O.)
disk performance improves by 5% or 10% with these drives.
real, visible, end-user performance improvement on a single user
machine is about nil. :) a kernel compile was sped up by a whole three
seconds. it *does* feel a bit better-behaved when the system is
swapping heavily, but a better disk driver is not the fix for *that*
problem.
THE CODE:
this code is a patch to wd.c and wd82371.c, and associated header
files. it should be considered alpha code; more work needs to be
done.
wd.c has fairly clean patches to add calls to busmaster code, as
implemented in wd82371.c and potentially elsewhere (one could imagine,
say, a Mac having a different DMA controller).
wd82371.c has been considerably reworked: the wddma interface that it
presents has been changed (expect more changes), many bugs have been
fixed, a new internal interface has been added for supporting
different chipsets, and the PCI probe has been considerably extended.
the interface between wd82371.c and wd.c is still fairly clean, but
i'm not sure it's in the right place. there's a mess of issues around
ATA/ATAPI that need to be sorted out, including ATAPI support, CD-ROM
support, tape support, LS-120/Zip support, SFF-8038i DMA, UltraDMA,
PCI IDE controllers, bus probes, buggy controllers, controller timing
setup, drive timing setup, world peace and kitchen sinks. whatever
happens with all this and however it gets partitioned, it is fairly
clear that wd.c needs some significant rework-- probably a complete
rewrite.
timing setup on disk controllers is something i've entirely punted on.
on my development machine, it appears that the BIOS does at least some
of the necessary timing setup. i chose to restrict operation to
drives that are already configured for Mode4 PIO and Mode2 multiword
DMA, since the timing is essentially the same and many if not most
chipsets use the same control registers for DMA and PIO timing.
does anybody *know* whether BIOSes are required to do timing setup for
DMA modes on drives under their care?
error recovery is probably weak. early on in development, i was
getting drive errors induced by bugs in the driver; i used these to
flush out the worst of the bugs in the driver's error handling, but
problems may remain. i haven't got a drive with bad sectors i can
watch the driver flail on.
complaints about how wd82371.c has been reindented will be ignored
until the FreeBSD project has a real style policy, there is a
mechanism for individual authors to match it (indent flags or an emacs
c-mode or whatever), and it is enforced. if i'm going to use a source
style i don't like, it would help if i could figure out what it *is*
(style(9) is about half of a policy), and a way to reasonably
duplicate it. i ended up wasting a while trying to figure out what
the right thing to do was before deciding reformatting the whole thing
was the worst possible thing to do, except for all the other
possibilities.
i have maintained wd.c's indentation; that was not too hard,
fortunately.
TO INSTALL:
my dev box is freebsd 2.2.2 release. fortunately, wd.c is a living
fossil, and has diverged very little recently. included in this
tarball is a patch file, 'otherdiffs', for all files except wd82371.c,
my edited wd82371.c, a patch file, 'wd82371.c-diff-exact', against the
2.2.2 dist of 82371.c, and another patch file,
'wd82371.c-diff-whitespace', generated with diff -b (ignore
whitespace). most of you not using 2.2.2 will probably have to use
this last patchfile with 'patch --ignore-whitespace'. apply from the
kernel source tree root. as far as i can tell, this should apply
cleanly on anything from -current back to 2.2.2 and probably back to
2.2.0. you, the kernel hacker, can figure out what to do from here.
if you need more specific directions, you probably should not be
experimenting with this code yet.
to enable DMA support, set flag 0x2000 for that drive in your config
file or in userconfig, as you would the 32-bit-PIO flag. the driver
will then turn on DMA support if your drive and controller pass its
tests. it's a bit picky, probably. on discovering DMA mode failures
or disk errors or transfers that the DMA controller can't deal with,
the driver will fall back to PIO, so it is wise to setup the flags as
if PIO were still important.
'controller wdc0 at isa? port "IO_WD1" bio irq 14 flags 0xa0ffa0ff
vector wdintr' should work with nearly any PCI IDE controller.
i would *strongly* suggest booting single-user at first, and thrashing
the drive a bit while it's still mounted read-only. this should be
fairly safe, even if the driver goes completely out to lunch. it
might save you a reinstall.
one way to tell whether the driver is really using DMA is to check the
interrupt count during disk i/o with vmstat; DMA mode will add an
extremely low number of interrupts, as compared to even multi-sector
PIO.
boot -v will give you a copious register dump of timing-related info
on Intel and VIAtech chipsets, as well as PIO/DMA mode information on
all hard drives. refer to your ATA and chipset documentation to
interpret these.
WHAT I'D LIKE FROM YOU and THINGS TO TEST:
reports. success reports, failure reports, any kind of reports. :)
send them to cgull+ide@smoke.marlboro.vt.us.
i'd also like to see the kernel messages from various BIOSes (boot -v;
dmesg), along with info on the motherboard and BIOS on that machine.
i'm especially interested in reports on how this code works on the
various Intel chipsets, and whether the register dump works
correctly. i'm also interested in hearing about other chipsets.
i'm especially interested in hearing success/failure reports for PCI
IDE controllers on cards, such as CMD's or Promise's new busmastering
IDE controllers.
UltraDMA-33 reports.
interoperation with ATAPI peripherals-- FreeBSD doesn't work with my
old Hitachi IDE CDROM, so i can't tell if I've broken anything. :)
i'd especially like to hear how the drive copes in DMA operation on
drives with bad sectors. i haven't been able to find any such yet.
success/failure reports on older IDE drives with early support for DMA
modes-- those introduced between 1.5 and 3 years ago, typically
ranging from perhaps 400MB to 1.6GB.
failure reports on operation with more than one drive would be
appreciated. the driver was developed with two drives on one
controller, the worst-case situation, and has been tested with one
drive on each controller, but you never know...
any reports of messages from the driver during normal operation,
especially "reverting to PIO mode", or "dmaverify odd vaddr or length"
(the DMA controller is strongly halfword oriented, and i'm curious to
know if any FreeBSD usage actually needs misaligned transfers).
performance reports. beware that bonnie's CPU usage reporting is
useless for IDE drives; the best test i've found has been to run a
program that runs a spin loop at an idle priority and reports how many
iterations it manages, and even that sometimes produces numbers i
don't believe. performance reports of multi-drive operation are
especially interesting; my system cannot sustain full throughput on
two drives on separate controllers, but that may just be a lame
motherboard.
THINGS I'M STILL MISSING CLUE ON:
* who's responsible for configuring DMA timing modes on IDE drives?
the BIOS or the driver?
* is there a spec for dealing with Ultra-DMA extensions?
* are there any chipsets or with bugs relating to DMA transfer that
should be blacklisted?
* are there any ATA interfaces that use some other kind of DMA
controller in conjunction with standard ATA protocol?
FINAL NOTE:
after having looked at the ATA-3 spec, all i can say is, "it's ugly".
*especially* electrically. the IDE bus is best modeled as an
unterminated transmission line, these days.
for maximum reliability, keep your IDE cables as short as possible and
as few as possible. from what i can tell, most current chipsets have
both IDE ports wired into a single buss, to a greater or lesser
degree. using two cables means you double the length of this bus.
SCSI may have its warts, but at least the basic analog design of the
bus is still somewhat reasonable. IDE passed beyond the veil two
years ago.
--John Hood, cgull@smoke.marlboro.vt.us
Mask the read value from the count register in order to return zero correctly
after TC, as per intel datasheet : "If it is not autoinitialised, this
register will have a count of FFFFH after TC"
comments. Remove reduntant extra addition that was unncessary, and
unneeded mask (asuming inb works correctly).
Submitted by: Stephen McKay <syssgm@dtir.qld.gov.au>
handlers don't skew the results of isa_dmastatus. The function can be
safely called with interrupts disabled.
Submitted by: Stephen McKay <syssgm@dtir.qld.gov.au>
the system is out of memory. The daemon does a minimal amount of work that
increases as the system becomes more likely to run out of memory and page in/out.
The default tuning is fairly low in background CPU usage, and sysctl variables
have been added to enable flexable operation. This is an experimental feature
that will likely be changed and improved over time.
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.
mplock.s:
- minor update of try_mplock for new algorithm where a CPU uses try_mplock
instead of get_mplock in the ISRs.
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.
apic_vector.s:
- new algorithm where a CPU uses try_mplock instead of get_mplock:
if successful continue as before.
if fail set ipending bit, mask INT (to avoid recursion), cleanup & iret.
This allows the CPU to return to successful work, while the ISR will be run
by the CPU holding the lock as part of the doreti dance.
Macros to convert the Lite2 lock manager primitives to the names used
in the kernel proper. This allows us to hide them from the lock
manager till they can be turned on.
smp.h:
declarations for the new simplelock functions.
we use lazy masking INTREN()/INTRDIS() might be called with INTs enabled.
This means another higher prio INT to the same cpu could attempt to
re-enter the critical region, but would spin waiting for the lock. Since
it is the owner, it would deadlock.
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()
Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock
Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.
It seems to work!!!
and he says he's happy to see forward movement in aligning our defaults
with a 16 bit world, the 8 bit folk already being veterans by this
point who know how to use userconfig.
In any case, perhaps Warner will soon come to save us all with his Dynamic
Probing(tm) feature and this will all become totally moot in any case,
so it's probably not worth arguing about either way.
included twice as a side effect of including unrelated headers, but I
removed the #include of one of the headers and will soon fix the nested
#include in the other.
the file so that this compiles without forward declarations of that
data. (It is impossible to forward-declare static data in Gnu C.
Declaring it as static is correct, but causes bogus warnings from
gcc -Wredundant-decls. Declaring it as extern works, but causes
correct warnings from gcc -pedantic and is undefined in ANSI C.
We usually declare it as extern. Here it was once really extern,
but botched staticization left it as static here and apparently-
extern in a header file.)
----------------------------------------------------------------------
another system, such as NetBSD, CVS: then name the system in this
line, otherwise delete it. CVS: Reviewed by: CVS: Before
committing changes please have someone check your work and CVS:
include their name here. If the change is trivial and you have not
else; i.e., CVS: they sent us a patch or a new module, then
include their name/email CVS: address here. If this is your work
then delete this line. CVS:
----------------------------------------------------------------------
----------------------------------------------------------------------
Moved description of sio 16650A flag to the sio section and rewrote the
description. It was in the generic console flags section.
Added undocumented options CPU_UPGRADE_HW_CACHE and WLDEBUG.
This define enables the code to ALWAYS run the 8254 timer thru the 8259 ICU.
It is ON by default.
It depricates the usage of "options SMP_TIMER_NC" in the config file.
1/ is compatible with the old route(1) in case needed.
2/ actually fixes the problem while vetting bad user input.
note: I have already fixed route(1) so the problem shouldn't occur.
if it does. use 0.0.0.0/0 instead of the word 'default' :)
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.
We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...
We need to accept at least one sockaddr with zero length, in order
to be able to set the default route.
Suggested by: Phone conversation with Julian (sleep well!)
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)
The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.
Route(1) has a bug that sends a bad message to the kernel. The kernel
trusts it and crashes. Add some sanity checks so that
we don't trust the user quite as much any more.
(also add a comment in if_ethersubr.c)
facilitate the new saver loading/unloading notification interface
in syscons.
daemon_saver:
- M_NOWAIT was wrong, since NULL returns are not handled. Just use
M_WAITOK.
- use `ostype' instead of hard-coded "FreeBSD". Now there is no more
hard-coded string! (But, who will run this screen saver on other
OS?!)
- put macros and data declarations in a consistent order.
- -DDEAMON_ONLY and -DSHOW_HOSTNAME options added in the previous commit
are removed. Options of this kind can go stale and no one notices
because no one uses them. DEAMON_ONLY is just removed. SHOW_HOSTNAME
is made default.
snake_saver:
- use `ostype' and `osrelease' as in the daemon saver. The string changes
slightly - there was a hyphen after "FreeBSD"; now there is a space.
(It is consistent with uname -a, like the daemon server already is.)
all screen savers:
- Use the new add_scrn_saver()/remove_scrn_saver() in syscons.c
to declare loading/unloading of a screen saver. Removed reference
to `current_saver' and the variable `old_saver' as they are not
necessary anymore.
- The blank, fade and green screen savers manipulate VGA registers.
Module loading should fail for non-VGA cards.
- `scrn_blanked' is consistently treated as a number/counter rather
than boolean.
- Some savers touch `scp->start' and `scp->end' to force entire screen
update when stopping themselves. This is unnecessary now because
syscons.c takes care of that.
- cleared up many unused or unnecessary #include statements.
- Removed -DLKM from Makefiles.
YOU NEED TO RECOMPILE BOTH SCREEN SAVERS AND KERNEL AS OF THIS CHANGE.
1. Add new interface, add_scrn_saver()/remove_scrn_saver(), to declare
loading/unloading of a screen saver. The screen saver calls these
functions to notify syscons of loading/unloading events.
It was possible to load multiple savers each of which will try to
remember the previous saver in a local variable (`old_saver'). The
scheme breaks easily if the user load two savers and unload them in a
wrong order; if the first saver is unloaded first, `old_saver' in the
second saver points to nowhere.
Now only one screen saver is allowed in memory at a time.
Soeren will be looking into this issue again later. syscons is
becoming too heavy. It's time to cut things down, rather than adding
more...
2. Make scrn_timer() to be the primary caller of the screen saver
(*current_saver)(). scintr(), scioctl() and ansi_put() update
`scrn_time_stamp' to indicate that they want to stop the screen saver.
There are three exceptions, however.
One is remove_scrn_saver() which need to stop the current screen saver
if it is running. To guard against scrn_timer() calling the saver during
this operation, `current_saver' is set to `none_saver' early.
The others are sccngetc() and sccncheckc(); they will unblank the
screen too. When the kernel enters DDB (via the hot key or a
break point), the screen saver will be stopped by sccngetc().
However, we have a reentrancy problem here. If the system has been in
the middle of the screen saver...
(The screen saver reentrancy problem has always been with sccnputc()
and sccngetc() in the -current source. So, the new code is doing no
worse, I reckon.)
3. Use `mono_time' rather than `time'.
4. Make set_border() work for EGA and CGA in addition to VGA. Do
nothing for MDA.
Changes to the LKM screen saver modules will follow shortly. YOU NEED
TO RECOMPILE BOTH SCREEN SAVERS AND KERNEL AS OF THESE CHANGES.
Reviewed by: sos and bde
to this when raised, and most were in favor of at least this option
(some also asked for semaphores and messages, but I'll leave that argument
for another time :).
and released. It should use `spcl' consistently in both cases,
otherwise shift/control/alt state may not be correctly set/reset.
(Even with this fix, you can still make syscons confused and fail to
change internal state if you really want to, by installing a really
arcane and artificial keymap.)
PR: i386/4030
Reviewed by: sos
chown(). Previously, it wasn't marked for null chown()'s. We
permit null chown()s as a special case of "appropriate privilege"
- everyone has enough priviilege to not change ids (this is a better
argument than the one I gave for rev.1.13, that null changes aren't
really changes). However, POSIX.1 requires the update independently
of whether anything has changed.
Clear both the setuid and the setgid bits upon successful completion
of non-null chown()s by non-root. Previously, the setuid bit was
only changed for non-null changes of the uid, etc. POSIX.1 requires
clearing both unless the call was made by a process with "appropriate
privilege", in which case altering the bits is implementation-defined.
We define appropriate privilege as `process is root, or the change
is null', and the implementation-defined behaviour as not altering
the bits. There is no interpretation that permits clearing only
one of the bits.
Reviewed by: jdp