bp->b_flags has been broken for many years:
a) they didn't set B_BUSY for doing i/o. This has been fatal since
1995/07/25 when biodone() started checking that B_BUSY is set.
b) they didn't set B_INVAL for releasing the buffer. This at best
just put a useless buffer in the LRU queue for a little while.
Fix a couple of spelling errors and complete a couple of function
pointer declarations.
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
They are:
- new low-level init code that supports loadable modules better
- some cleanups in the namei code to help terry with 16-bit character support
- some changes to the mount-root code to make it a little more modular
NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases.
Mounting root off disk certainly still works just fine.
mfs should work but is untested (tomorrow's task).
The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. Thus a new module can
be added to the kernel without editing any other files other than the
'files' file.
calls with a byte size of 1. This special case was not
correctly emulated. Now programs such as a simple 'ls' to a commercial
Macintosh emulator called 'executor' will work correctly.
instead of with none. The first (struct proc *) arg is used if lkmnosys()
is actually called.
Implement lkmnosys() with the correct number and type of args so that
the first of them can be used and the others won't need to be fixed
later.
nfsm_rpchead() has been called with the wrong number of args and misplaced
args since someone added new args in the middle for nfsv3.
Here's another one that would be important on 64-bit systems. VOP_READDIR
takes a `u_int **cookies' arg.
Submitted by: Bruce Evans <bde@zeta.org.au>
This is still very green, but I have managed to get my modem working.
Lots of work still to do, but now at least we can commit it. /phk
Reviewed by: phk
Submitted by: Andrew McRae <andrew@mega.com.au>
This change forces the controller drivers to allocate a scsibus_data struct
via a call to scsi_alloc_bus(), fill in the adapter_link field, and optionally
modify any other fields of the struct. Scsi_alloc_bus() initializes all fields
to the default, so the changes in most drivers are very minimal. For drivers
that support Wide controllers, the maxtarg field will have to be updated to
allow probing of all targets (for an example, look at the aic7xxx driver).
Scsi_attachdevs() now takes a scsibus_data* as its argument instead of an
sc_link*. This allows us to expand the role of the scsibus_data struct for
other bus level configuration settings (max number of transactions, current
transaction openings, etc. for better tagged queuing support).
Reviewed by: Rodney Grimes <rgrimes>, Peter Dufault <dufault>, Julian Elischer <julian>
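A controller driver's attach path under the new interface might look roughly
like the sketch below. Only scsi_alloc_bus(), scsi_attachdevs(), adapter_link
and maxtarg come from the message above; the struct layouts, the return types,
the default of 7, and the mock function bodies are stand-ins invented so the
example compiles in user space.

```c
#include <stdlib.h>

/* Mock types: stand-ins for the real kernel structures. */
struct scsi_link { int adapter_unit; };      /* hypothetical field */

struct scsibus_data {
    struct scsi_link *adapter_link;          /* named in the commit message */
    int maxtarg;                             /* named in the commit message */
};

/* Mock of scsi_alloc_bus(): hands back a struct with defaults filled in. */
static struct scsibus_data *scsi_alloc_bus(void)
{
    struct scsibus_data *bus = calloc(1, sizeof(*bus));

    if (bus != NULL)
        bus->maxtarg = 7;    /* assumed default: narrow bus, targets 0-7 */
    return bus;
}

/* Mock of scsi_attachdevs(): just reports how many targets it would probe. */
static int scsi_attachdevs(struct scsibus_data *bus)
{
    return bus->maxtarg + 1;
}

/* What a driver attach routine would do: allocate, fill in, hand over. */
static int example_attach(struct scsi_link *link, int is_wide)
{
    struct scsibus_data *bus = scsi_alloc_bus();
    int probed;

    if (bus == NULL)
        return -1;
    bus->adapter_link = link;    /* the one mandatory field */
    if (is_wide)
        bus->maxtarg = 15;       /* Wide controllers: probe all 16 targets */
    probed = scsi_attachdevs(bus);
    free(bus);                   /* only because this is a mock */
    return probed;
}
```

In this mock, a narrow controller probes 8 targets and a Wide one probes 16.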
Change some leading spaces to tabs.
Garrett,
Here are some patches for the rate limiting code. It should be faster,
and in particular it doesn't leak malloc'd memory any more when rate_limit'ing
a phyint.
It now uses an mbuf chain at each vif, instead of the static queue array.
This means that the MAXQSIZE is now variable per vif (although there is no
interface to change it other than a debugger); this is an area for more
experimentation.
Bill
Submitted by: Bill Fenner <fenner@parc.xerox.com>
Implement the slip/ppp "hotchar" detection to improve latency
Debug the L_RINT bypass code..
Fix an interesting feature that caused 8-bit chars to lose their top bit
in some circumstances..
This finishes the remaining outstanding problems that I'm aware of, with
the exception of efficiency... Optimizing can come later after it's fully
debugged.
Claim the major numbers (before somebody else jumps in again and
claims the slots for his foocd driver :-), install all the hooks that
are required.
While i've been at this, i've cleaned up some of the routines at the
end of i386/conf.c; all the importers of the latest CDROM drivers
forgot to fill in the appropriate information. The `ata' driver
(vapourware?) does only occupy a slot in the bdevsw[] array, btw.
The actual import of the code does require a minor change in the SCSI
subsystem, and i want to have this reviewed by Peter first, so it will
be deferred for some days. The driver is already working for me
though.
Submitted by: akiyama@kme.mei.co.jp (Shunsuke Akiyama)
calls.
Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+
missing prototypes. Now I have 9000+ lines of warnings and errors
about bogus conversions of function pointers.
Note, I tested this on a NEC Versa, IBM 750C, and an IBM 755CX w/out
problems. The card still works fine in TP mode.
Submitted by: schwarz@alpharel.com (Steve Schwarz)
Reviewed by: jleppek@suw2k.ess.harris.com (James Leppek)
actually a timeout only. The existing behaviour caused a
mcd0: timeout getreply
at halt/reboot time.
Submitted by: graichen@sirius.physik.fu-berlin.de (Thomas Graichen)
traps occurred. This also helps ddb backtrace through trap frames.
Backtracing through syscall and interrupt frames still doesn't work
but it is relatively unimportant and more expensive to fix.
ISA GAT mode and hidden refresh seem to cause reliability problems
on Saturn based systems and are now reported when booting with '-v'.
Submitted by: Danny J. Zerkel <dzerkel@feephi.phofarm.com>
moved to the driver proper, so that <machine/si.h> can be #included by user
programs without needing to include stuff from /sys/i386/isa..
Various (now) redundant features removed, eg: the locks on IXANY and HWFLOW
as these are now done with the "initial" and "lock" termios devices.
Note that it still (for reasons unknown) appears to be masking data to
7-bit with ppp - hence the cleanup to support the debugging via 'sicontrol'
This was originally ported to BSDI by Andy Rutter <andy@acronym.co.uk>.
At the end of the day, this code has very little in common with Andy's
version, or the Specialix SYSV version. Essentially it has been gradually
and almost completely rewritten, with LOTS of advice and inspiration from
Bruce Evans. There are a couple of missing bits still, but they are minor.
The user-mode "sicontrol" program is in sad shape and will come in soon.
Transparent printing died a timely death.. Maybe later..
Jeremy Rolls @ Specialix (Development directory) has confirmed this is OK
to distribute, and Andy personally sent me his version that I started from.
Although this driver stood up to a nasty stress-test in this form, I am not
confident that there are no nasty bugs lurking.
People are welcome to try it, but don't go out and buy one just yet.. :-)
And *DON'T* use it on a mission-critical machine... This is ALPHA QUALITY!
of "__volatile". Note also that the original mods that were submitted
by me were as a result of a discussion between various FreeBSD contributors.
Submitted by: peter@haywire.dialix.com (Peter Wemm)
for return values. It just so happens that in the cases where it is likely
to fail, it is okay to change the M_NOWAIT to M_WAITOK -- and all will
be well. This problem was manifest as a panic very regularly on a 4MB
system right after bootup.
disksort is called at non-interrupt time and can be actively traversing
the list when that happens, there is a very small window of vulnerability.
Close it by protecting disksort with splbio().
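The pattern is the usual spl bracket around the queue manipulation. The
sketch below is a user-space mock: splbio() and splx() are replaced by stubs
that only record a level, and sorted_insert() is a hypothetical stand-in for
disksort() (a plain ascending insert, not the real elevator sort).

```c
#include <stddef.h>

struct req {
    long blkno;
    struct req *next;
};

/* Mock interrupt priority level; the kernel versions mask real interrupts. */
static int ipl;

static int splbio(void) { int old = ipl; ipl = 1; return old; }
static void splx(int s) { ipl = s; }

/* Stand-in for disksort(): insert r keeping the list in ascending order. */
static void sorted_insert(struct req **head, struct req *r)
{
    while (*head != NULL && (*head)->blkno <= r->blkno)
        head = &(*head)->next;
    r->next = *head;
    *head = r;
}

/* Callers at non-interrupt time raise the level around the list walk. */
static void enqueue_request(struct req **head, struct req *r)
{
    int s = splbio();      /* block disk interrupts while traversing */

    sorted_insert(head, r);
    splx(s);               /* restore the previous level */
}
```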
SunOS and SCO. You can then even use the pipe as a cheap fifo stack
(yuck!). A semantic change also important (but not limited) to iBCS2
compatibility.
Submitted by: swallace
to replace the very poor, original implementation of Scatter/Gather operations.
Use a bit (that was freed up with the rewrite above) in the SCB control byte
to designate commands that should allow disconnection. The kernel driver
makes this decision now instead of the sequencer since the sequencer can't
do the indexing very efficiently.
This commit drops the sequencer from 426 instructions to 390 most likely
freeing enough space to do a target mode implementation.
The first could occur because the original code would continue to reset
the SCSIID register while waiting for a selection. This could potentially
conflict with a reconnect since a successful reconnect will also set the
SCSIID register. The fix is to use a separate wait loop after starting
a selection (as was done a few revisions ago).
The second probably never happens, but it was possible for a target to
reconnect while there was a pending SCB on the waiting list and not get
noticed. The fix was to remove a superfluous check of the scb waiting
list.
Detect in nfsrv_readdirplus when a filesystem does not support VFS_VGET and
return NFSERR_NOTSUPP so that the client will use ordinary readdir instead.
The old variant returned 38400 for them; now it returns the nearest matching
speed, rounded down, except that speeds in the range 0 < speed < 50 are
rounded up so as not to produce a hangup.
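The rounding rule can be sketched as below. The speed table is a typical
termios set and round_speed() is a hypothetical helper, not the function this
commit touches; non-positive values are passed through untouched because the
< 0 check lives elsewhere.

```c
#include <stddef.h>

/* Standard speeds in ascending order (assumed table for illustration). */
static const int std_speeds[] = {
    50, 75, 110, 134, 150, 200, 300, 600, 1200, 1800,
    2400, 4800, 9600, 19200, 38400
};

/* Map an arbitrary speed onto the nearest standard speed. */
static int round_speed(int speed)
{
    size_t n = sizeof(std_speeds) / sizeof(std_speeds[0]);
    size_t i;
    int best;

    if (speed <= 0)
        return speed;           /* 0 is a real hangup; < 0 is the caller's problem */
    if (speed < std_speeds[0])
        return std_speeds[0];   /* round up: rounding down would yield 0 (hangup) */

    best = std_speeds[0];
    for (i = 1; i < n && std_speeds[i] <= speed; i++)
        best = std_speeds[i];   /* largest standard speed not above the request */
    return best;
}
```

With this table, a request for 14400 maps to 9600 and a request for 10 maps
to 50.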
with interaction pty <-> serial driver with non-standard speed.
So, nothing protects us from garbage in the speed field, except
the check for < 0 left in tty.c :-(
too much for non-open ptys, but there is normally no problem because the
l_modem(, 0) is a no-op for closed ptys provided the line discipline is
standard and MDMBUF isn't set.
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.
Call output process whether or not there is any output. The output
process may be overloaded to handle hardware flow control and
hardware output completions.
hardware. Set the sleep-on flag for the address so there is more
than a small chance that the sleep address is actually used (this
used to work by timing out). Don't bother clearing the sleep-on
flag after a timeout here or elsewhere since leaving it set just
generates a few null calls to wakeup().
Introduce TS_CONNECTED and TS_ZOMBIE states. TS_CONNECTED is set
while a connection is established. It is set while (TS_CARR_ON or
CLOCAL is set) and TS_ZOMBIE is clear. TS_ZOMBIE is set for on to
off transitions of TS_CARR_ON that occur when CLOCAL is clear and
is cleared for off to on transitions of CLOCAL. I/o can only occur
while TS_CONNECTED is set. TS_ZOMBIE prevents further i/o.
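The state rules above can be modelled in a few lines. This is a user-space
sketch only: the flag values are arbitrary, and struct tty_sketch,
set_carrier() and set_clocal() are hypothetical names, not the kernel's.

```c
#include <stdbool.h>

/* Flag values are arbitrary for this sketch, not the kernel's. */
#define TS_CARR_ON   0x01
#define TS_CONNECTED 0x02
#define TS_ZOMBIE    0x04

struct tty_sketch {
    int  state;
    bool clocal;
};

/* TS_CONNECTED == (TS_CARR_ON || CLOCAL) && !TS_ZOMBIE */
static void update_connected(struct tty_sketch *tp)
{
    if (((tp->state & TS_CARR_ON) != 0 || tp->clocal) &&
        (tp->state & TS_ZOMBIE) == 0)
        tp->state |= TS_CONNECTED;
    else
        tp->state &= ~TS_CONNECTED;
}

/* Carrier transition: on -> off with CLOCAL clear makes a zombie. */
static void set_carrier(struct tty_sketch *tp, bool on)
{
    bool was_on = (tp->state & TS_CARR_ON) != 0;

    if (on) {
        tp->state |= TS_CARR_ON;
    } else {
        tp->state &= ~TS_CARR_ON;
        if (was_on && !tp->clocal)
            tp->state |= TS_ZOMBIE;
    }
    update_connected(tp);
}

/* CLOCAL transition: off -> on clears the zombie state. */
static void set_clocal(struct tty_sketch *tp, bool on)
{
    if (on && !tp->clocal)
        tp->state &= ~TS_ZOMBIE;
    tp->clocal = on;
    update_connected(tp);
}
```

Note that in this model carrier coming back does not revive a zombie; only
an off to on transition of CLOCAL does.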
Split the input-event sleep address TSA_CARR_ON(tp) into TSA_CARR_ON(tp)
and TSA_HUP_OR_INPUT(tp). The former address is now used only for
off to on carrier transitions and equivalent CLOCAL transitions.
The latter is used for all input events, all carrier transitions
and certain CLOCAL transitions. There are some harmless extra
wakeups for rare connection-related events. Previously there were
too many extra wakeups for non-rare input events.
Drivers now call l_modem() instead of setting TS_CARR_ON directly
to handle even the initial off to on transition of carrier. They
should always have done this. l_modem() now handles TS_CONNECTED
and TS_ZOMBIE as well as TS_CARR_ON.
gnu/isdn/iitty.c:
Set TS_CONNECTED for first open ourselves to go with bogusly setting
CLOCAL.
i386/isa/syscons.c, i386/isa/pcvt/pcvt_drv.c:
We fake carrier, so don't also fake CLOCAL.
kern/tty.c:
Testing TS_CONNECTED instead of TS_CARR_ON fixes TIOCCONS forgetting to
test CLOCAL. TS_ISOPEN was tested instead, but that broke when we disabled
the clearing of TS_ISOPEN for certain transitions of CLOCAL.
Testing TS_CONNECTED fixes ttyselect() returning false success for output
to devices in state !TS_CARR_ON && !CLOCAL.
Optimize the other selwakeup() call (this is not related to the other
changes).
kern/tty_pty.c:
ptcopen() can be declared in traditional C now that dev_t isn't short.
Make more functions static.
tty.c:
Use tcflag_t (u_long) and cc_t instead of u_char and int/long.
Don't record values that are only evaluated once.
Compare ints using imin(), not min(). min() is for comparing u_ints.
Old versions of tty.c used the type-safe but multiple-evaluation-unsafe
macro MIN(). The args are apparently never negative; otherwise this
change would be non-cosmetic.
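Why the choice matters once an arg can go negative: passing an int to a
function taking unsigned args converts it first. The two helpers below are
user-space stand-ins written for this illustration (named sketch_min() and
sketch_imin() to avoid claiming they are the kernel's definitions).

```c
/* Stand-in for min(): compares unsigned ints. */
static unsigned int sketch_min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Stand-in for imin(): compares signed ints. */
static int sketch_imin(int a, int b)
{
    return a < b ? a : b;
}

/*
 * sketch_imin(-1, 5) returns -1 as expected.
 * sketch_min(-1, 5) converts -1 to UINT_MAX before comparing,
 * so it returns 5: the "smaller" value is silently lost.
 */
```

As long as both args are non-negative the two give the same answer, which is
why the change is described as cosmetic.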
Don't repeat the loop test in ttywait().
tty.h:
Improve English in and formatting of comments.
Use input buffer watermarks of TTYHOG-512 (high) and (high)*7/8
(low) instead of TTYHOG/2 (high) and TTYHOG/5 (low) to agree with
some drivers. 512 is magic and some things depended on TTYHOG/2
>= TTYHOG-512 to work; now they depend on the 512 magic not changing
and TTYHOG-512 being significantly larger than 0. This should be
handled in ttsetwater().
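The arithmetic, as a small sketch. The helper names are hypothetical and the
queue size is passed in as a parameter; the figures in the note below assume
a TTYHOG of 1024, which is an assumption for illustration.

```c
/* New input watermarks: high = TTYHOG - 512, low = high * 7 / 8. */
static int hiwat_new(int ttyhog) { return ttyhog - 512; }
static int lowat_new(int ttyhog) { return hiwat_new(ttyhog) * 7 / 8; }

/* Old input watermarks: high = TTYHOG / 2, low = TTYHOG / 5. */
static int hiwat_old(int ttyhog) { return ttyhog / 2; }
static int lowat_old(int ttyhog) { return ttyhog / 5; }
```

For a TTYHOG of 1024 the high mark is 512 either way, while the low mark
moves from 204 to 448. The old dependency TTYHOG/2 >= TTYHOG-512 only holds
for TTYHOG <= 1024; with the new scheme a TTYHOG of 2048 gives 1536 (high)
and 1344 (low).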
Separate the decision about whether to do input flow control from
doing it. ttyblock() now just starts input flow control (hardware
and/or software) and there is a new function ttyunblock() to stop
it. The decisions are the same except for the watermark changes
and allowing for input expansion for PARMRK.
When flushing input, try harder at first to send a start character
if required, but give up if the first attempt fails.
cy.c, rc.c, sio.c:
Simplify: let ttyinput() handle input flow control if it is not
being bypassed. Use ttyblock() to start flow control otherwise.
rc.c:
Use same input flow control test as elsewhere: test in a more
efficient order and start flow control at >= highwater instead of
at > highwater.
interface is no longer IFF_UP.
The test for IFF_UP in ifpromisc is only useful while enabling IFF_PROMISC
and the higher levels of the bpf code do not allow for the possibility of
failure while shutting down. This is a trivial change.
Also, fixes PR#522.
string as possible and return ENOMEM if the entire string cannot
be returned. This brings the routines in line with how the man
page says they work, and how the calling routines are expecting
them to work. This allows the dummy uname() routine in libc to
obtain the version string, since the kernel version string is
longer than that normally returned by the uname() routine.
This is 3/4 of the fix for PR# 462.
Reviewed by: Bruce Evans
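The "return what fits, then report ENOMEM" behaviour can be sketched as
follows. copy_string_out() is a hypothetical user-space helper, not the
kernel routine; in particular, how the real code reports the copied length
is an assumption here.

```c
#include <errno.h>
#include <string.h>

/*
 * Copy as much of str (including its NUL) as fits into buf.
 * On entry *lenp is the buffer size; on return it is the number
 * of bytes actually copied. Returns 0 if everything fit, ENOMEM
 * if the string had to be truncated.
 */
static int copy_string_out(const char *str, char *buf, size_t *lenp)
{
    size_t need = strlen(str) + 1;
    size_t n = need < *lenp ? need : *lenp;

    memcpy(buf, str, n);
    *lenp = n;
    return need > n ? ENOMEM : 0;
}
```

A caller with a short buffer still gets the leading part of the string and
can tell from ENOMEM that it was cut short (and that the result may not be
NUL-terminated).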
unmountable file systems, hung processes, or system panics:
- Some operations could return without decrementing the vnode
reference count.
- Some operations could leave the vnode locked.
- Generalize the /kern/rootdev & rrootdev files so that they
are no longer special cased in kernfs_lookup().
Note: procfs, fdescfs, and most of the other miscfs/* file systems
also suffer from the same type of problems and I will work on
fixing them one at a time.
BUS DEVICE RESET followed by BUS RESET failure recovery strategy including
the necessary renegotiation of sync/wide transfers after recovery completes.
Clean up debugging code to make it more finely selectable. Reset code
debugging is enabled for now so I can get more feedback on how this
code behaves in real life.
as is required to be POSIXLY_CORRECT and "right". I interpret
"referring to a directory" as being a directory or becoming a
directory. E.g., the trailing slashes in mkdir("/nonesuch/"),
rename("/tmp", "/nonesuch/") and link("/tmp", "/root_can_like_dirs/")
are ignored because the target will become a directory if the
syscall succeeds. A trailing slash on a symlink causes the symlink
to be followed (this is a bug if the symlink doesn't point to a
directory; fix later).
queues for TIOCSETA[W]. Swapping an even number of times broke
the queue resource limits. This would have broken CRTSCTS flow
control if the clist slush list was used up.
Don't concatenate the queues for TIOCSETA[W] if one of the queues
has a resource limit of 0. Concatenation would cause a panic if
one of the queues is nonempty and the other is limited to length
0. This may have caused panics in PPPDISC.
Wake up readers after all transitions of ICANON. When ICANON is
turned off it is quite likely that characters will become available
to be read.
Reduce indentation near these changes.
on output below low water) and TS_SO_OCOMPLETE (sleep on output complete).
Most of the support for this has already been committed. Drivers should
call ttwwakeup() to handle wakeups whenever output is below low water
(and some output event causes this condition to be checked) or TS_BUSY is
cleared.
tty.c:
Fix the livelock in ttywait() properly by sleeping on output complete, not
on output below low water.
Use ttwwakeup() instead of separate select and output wakeups for all
wakeups of writers.
Add wakeups of writers for output flushes and carrier/clocal transitions.
Don't go to sleep in ttycheckoutq() if ttstart() reduces the queue to below
low water.
Use the timeout built into tsleep() in ttycheckoutq().
Optimize the select wakeup in ttwwakeup(). It seems reasonable to know
too much about the internals of tp->t_wsel now that the knowledge is
localised in tty.c.
Remove nullmodem().
It may be useful to have a null modem routine, but nullmodem()
wasn't one. nullmodem() was identical to ttymodem() except it
didn't implement MDMBUF (carrier) flow control, didn't do any
wakeups for off to on carrier transitions, and didn't flush the
i/o queues for on to off carrier transitions (flushing has the side
effect of waking up readers and writers) although it did generate
SIGHUPs. The wakeups must normally be done even if nullmodem() is
null in case something is sleeping waiting for a carrier transition.
In any case, the wakeups should be harmless. They may cause bogus
results for select(), but select() is already bogus for nonstandard
line disciplines.
essential when I fix excessive wakeups for output-below-low-water.
In cy.c and sio.c, wake up via the driver start routine to also
eliminate duplicated code involving the clearing of TS_TTSTOP.
Always (except in code to be replaced soon) call driver start
routine directly instead of going through ttstart().
Amancio. There is some SoundSource support here that is primitive and
probably doesn't work, but I'll let the two submitters let me know
how my integration of that was since I don't have this card to test.
I've only tested this on my GUS MAX since it's all I have.
This all probably needs to be re-done anyway since we're widely variant
from the original VOXWARE source in the current layout.
Submitted by: Amancio Hasty and Jim Lowe
Obtained from: Hannu Savolainen
floppy DMA buffers...use avail_start not "first". Removed duplicate
(and wrong) declaration of phys_avail[].
Submitted by: Bruce Evans, but fixed differently by me.
case, multicast options are not passed to ip_mforward().) The previous
version had a wrong test, thus causing RSVP mrouters to forward RSVP messages
in violation of the spec.
This finishes making the kernel compile without -O.
The "optimized" asm version of the function being inlined
(translate_bytes()) uses slow instructions. On a 486, assuming
everything is in the cache (unlikely), it is 21/15 times slower
than the dumb C version and 21/3 times slower than the best
possible bytewise method.
for the kernel, but gcc provides an inline version of it if the
kernel is compiled with -O.
The inline memcmp() is OK for small compares and is better than
the dumb kernel bcmp() in all cases, but it has been hiding the
library memcmp() which is 4 times faster for large compares.
don't go away when the kernel is compiled with -O.
The functions are backed up by extern versions in cd9660_util.c,
but these versions are disabled by `#ifdef __notanymore__'. They
could have been enabled by using `#if defined(__notanymore__) ||
!defined(__OPTIMIZE__)' but then I would have had to check that
they still work. The correct way to handle all this is to replace
`extern inline' by `EXTERN_INLINE' and define `EXTERN_INLINE' as
`extern inline' in most modules and as empty in one module.
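A sketch of that EXTERN_INLINE arrangement, shown from the point of view of
the one module that emits the real definitions. DEFINE_EXTERN_INLINES and
sketch_get_byte() are hypothetical names. The scheme relies on the GNU C89
meaning of `extern inline' (inline only, no out-of-line copy); C99 gives the
keyword pair a different meaning, so treat this as an illustration of the
idea rather than portable code.

```c
/*
 * Exactly one module defines DEFINE_EXTERN_INLINES before pulling in
 * the shared definitions; there EXTERN_INLINE expands to nothing and
 * ordinary external functions are generated. Every other module gets
 * `extern inline' and falls back to those external copies whenever
 * the compiler declines to inline (e.g. when built without -O).
 */
#define DEFINE_EXTERN_INLINES

#ifdef DEFINE_EXTERN_INLINES
#define EXTERN_INLINE
#else
#define EXTERN_INLINE extern inline
#endif

/* Would normally live in a header shared by all modules. */
EXTERN_INLINE int sketch_get_byte(const unsigned char *p)
{
    return p[0];
}
```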
didn't work are somewhat bogusly optimized away before the constraint
is checked. We still expect constants passed to inline functions to
remain constant, but if the compiler ever decides that they aren't
constant then it will just generate slightly slower code instead of
an error.
Declare `cheat' as static. It was bogusly shared between the aha1742 and
ultrastor drivers.
Even static variables should have unique names so that they can be
debugged, but fixing them can wait.
different types of panics/inconsistencies with NFS clients.
Cleared PG_WANTED where appropriate.
Added checks for buffer busy in allocbuf and biodone.
Reviewed by: John Dyson
when it is moved to an NFS filesystem from another filesystem and /bin/mv
failed to set the file ownership during the move.
I believe that this bug is present in STABLE but I have not tested it. The fix
would be the same in STABLE even though the code has changed quite considerably
in CURRENT.
ended that fork() uses to determine the time that the process
started when calculating the elapsed time. This prevents the
ac_etime field in the accounting record from getting set to -1
if the process exists for a VERY short period of time.
ttwwakeup(). The conditions for doing the wakeup will soon become
more complicated and I don't want them duplicated in all drivers.
It's probably not worth making ttwwakeup() a macro or an inline
function. The cost of the function call is relatively small when
there is a process to wake up. There is usually a process to wake
up for large writes and the system call overhead dwarfs the function
call overhead for small writes.
Temporarily nuke TS_WOPEN. It was only used for the obscure MDMBUF
flow control option in the kernel and for informational purposes
in `pstat -t'. The latter worked properly only for ptys. In
general there may be multiple processes sleeping in open() and
multiple processes that successfully opened the tty by opening it
in O_NONBLOCK mode or during a window when CLOCAL was set. tty.c
doesn't have enough information to maintain the flag but always
cleared it in ttyopen().
TS_WOPEN should be restored someday just so that `pstat -t' can
display it (MDMBUF is already fixed). Fixing it requires counting
of processes sleeping in open() in too many serial drivers.
Don't put partial PARMRK escape sequences in the input queue. Use
MAX_INPUT = TTYHOG instead of TTYHOG directly for the maximum input
queue size. Don't use the bogus MAX_INPUT advertised in
<sys/syslimits.h>.
First of many changes required to restore lost stability to the tty
driver.
ECHONL is supposed to enable echoing of NL when ECHO is off, but it
enabled echoing of everything except NL.
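The intended decision, reduced to a predicate. should_echo() is a
hypothetical helper and ignores the other conditions the real echo path
checks; it only shows which way round the ECHONL test belongs.

```c
#include <stdbool.h>
#include <termios.h>

/* Echo c if ECHO is on, or if ECHONL is on and c is a newline. */
static bool should_echo(tcflag_t lflag, int c)
{
    return (lflag & ECHO) != 0 ||
        ((lflag & ECHONL) != 0 && c == '\n');
}
```

The bug described above corresponds to testing c != '\n' in the ECHONL arm.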
VBLK vnodes isn't adequate since all NFS nodes aren't locked, either. The
result is a race condition that would lead to duplicate buffers at the
same block offset.
Submitted by: John Dyson
This is performed by using a line similar to:
controller scbus0 at ahc0 bus 1
to wire scbus0 to the second bus on an adaptec 2742T controller.
Reviewed by: Peter Dufault(dufault@hda.com), Rod Grimes(rgrimes@FreeBSD.org)
buses on multi-bus controllers. Currently only affects the 274xT controllers.
Reviewed by: Peter Dufault(dufault@hda.com), Rod Grimes(rgrimes@FreeBSD.org)
emul code when compiling with "options KTRACE".
ktrsyscall() was expecting an array of integers, this was passing the
address of a structure containing an array of integers..
The cosmetic problem was that it was calling the "enter syscall"
trace hook twice - this looks like a cut/paste error/typo.
Submitted by: Andrew McRae <andrew@mega.com.au>
Some initial commits from the pcmcia stuff, to make life easier for the
testers.
We will use the name "pccard" since that is really the buzzword at present.
notebooks where a powerfail condition (external power drop; battery
state low) is signalled by an NMI. Makes it beep instead of panicking.
Reviewed by: davidg
associated files.
Submitted by: leo@dachau.marco.de (Matthias Pfaller)
Not-obtained from: NetBSD. Instead sent directly to me by Matthias.
(Sorry, this is to prevent people from claiming i might have gotten
this from NetBSD. :)
probes). Apart from there being no reason to set SCSI_NOSLEEP on every
tape command, this prevents controller drivers from sleeping when resources
are fully utilized causing unnecessary "Oops not queued" errors. This is
only noticed for controllers that can run out of resources like the
27/2842 adaptec controllers. Before this fix, it is almost impossible to
perform extended tape operations if more than one scsi disk is on the
bus with the tape drive with these controllers. This does not address a
similar problem that could occur if devices are probed while other targets
are active since SCSI_NOSLEEP will still be set in that case.
made a change to NFS that caused buffers at EOF to be variable size. This
had the undesired side-effect of breaking delayed writes on NFS. This
fixes it.
Submitted by: John Dyson
the problem "when a file is truncated on the server after being written on
a client under NFSv3, the client doesn't see the size drop to zero".
(As you noted, the problem is that NMODIFIED wasn't being cleared by nfs_close
when it flushed the buffers. After checking through the code, the only place
where NMODIFIED was used to test for the possibility of dirty blocks was in
nfs_setattr(). The two cases are safe to do when there aren't dirty blocks,
so I just took out the tests. Unfortunately, testing for
v_dirtyblkhd.lh_first being non-null is not sufficient, since there are
times when the code moves blocks to the clean list and then back to the
dirty list.)
Submitted by: rick@snowhite.cis.uoguelph.ca
Future Domain TMC-885 controllers. These beasts were just different enough in
a number of perverse ways to be recognised but not work with the seagate
stuff. I also whacked in blind transfers for DATAIN and DATAOUT phases - this
more than doubles my throughput. If you're dubious about that, comment out the
definition of SEA_BLINDTRANSFER. Anyway if you're running an ST01 or TMC-950
controller, please give this a go, I'd like to see if anything's broken for
those beasts.
Submitted by: Stephen Hocking <sysseh@devetir.qld.gov.au>
what CSRG had, plus make things like, TYPE, REVISION, and BRANCH
easy to set, and derive RELEASE and VERSION from them.
Kill the JUST_TELL_ME hack, it is no longer needed.
Kill DISTNAME, I could find no reference to it any place in the
source tree.
Now I just need to rework a few bits in release/Makefile, but want
to wait and talk to jkh about that.
Oh, and you're now all running:
TYPE="FreeBSD"
REVISION="2.2"
BRANCH="CURRENT"
and the -BUILD-yymmdd is dead and gone. The date was already in the
version[] string, no need for it to be there in 2 formats!
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #13 and #14
were done to simplify the code and improve readability and
maintainability. (As were most all of these changes.)
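To illustrate item 2 ("objects now have types, and this type is used to index
the appropriate pager"), here is a minimal user-space sketch. All names in it
(object_sketch, pagerops_sketch, pager_haspage_sketch, the enum values) are
hypothetical and the haspage logic is invented; only the dispatch-by-type
idea is taken from the description above.

```c
#include <stddef.h>

enum object_type { OBJ_DEFAULT, OBJ_SWAP, OBJ_NTYPES };

struct object_sketch {
    enum object_type type;   /* selects the pager; no separate pager struct */
    int npages;              /* invented field for the demo */
};

/* One table of routines per pager type. */
struct pagerops_sketch {
    int (*haspage)(const struct object_sketch *obj, int pindex);
};

static int default_haspage(const struct object_sketch *obj, int pindex)
{
    (void)obj;
    (void)pindex;
    return 0;                /* nothing is backed yet */
}

static int swap_haspage(const struct object_sketch *obj, int pindex)
{
    return pindex >= 0 && pindex < obj->npages;
}

static const struct pagerops_sketch default_ops = { default_haspage };
static const struct pagerops_sketch swap_ops = { swap_haspage };

/* The object type indexes straight into the pager table. */
static const struct pagerops_sketch *const pagertab[OBJ_NTYPES] = {
    &default_ops,            /* OBJ_DEFAULT */
    &swap_ops                /* OBJ_SWAP */
};

/* Pager routines take the object itself, not a "pager". */
static int pager_haspage_sketch(const struct object_sketch *obj, int pindex)
{
    return pagertab[obj->type]->haspage(obj, pindex);
}
```

Changing an object from the default type to the swap type is then just a
matter of changing its type field, which is roughly the transition item 12
describes.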
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there was constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and
maintainability (as were most of these changes).
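Items 1 and 2 describe objects that carry a type which indexes the
appropriate pager, with pager-specific leftovers kept in a un_pager union.
A minimal sketch of that shape follows. Every sk_* name, the field names
inside the union, and the behaviour of the vnode entry are assumptions made
for illustration, not the kernel's declarations. The swap entry reports 0
pages ahead/behind, as the TODO list says the current one does.

```c
#include <stddef.h>

/* Hypothetical object types; each one selects a row in the pager table. */
enum sk_objtype { SK_OBJT_SWAP, SK_OBJT_VNODE, SK_OBJT_COUNT };

/* The object carries its type plus a union of pager-specific items. */
struct sk_object {
	enum sk_objtype type;
	union {
		struct { size_t vnp_npages; } vnp;	/* illustrative field */
		struct { size_t swp_nblocks; } swp;	/* illustrative field */
	} un_pager;
};

/* A pager is only a collection of routines, not a data structure. */
struct sk_pagerops {
	/* Returns 1 if page `pindex` is backed; fills clustering hints. */
	int (*haspage)(struct sk_object *obj, size_t pindex,
	    int *before, int *after);
};

static int
sk_swap_haspage(struct sk_object *obj, size_t pindex, int *before, int *after)
{
	(void)pindex;
	*before = 0;		/* no clustering information yet */
	*after = 0;
	return obj->un_pager.swp.swp_nblocks > 0;
}

static int
sk_vnode_haspage(struct sk_object *obj, size_t pindex, int *before, int *after)
{
	size_t n = obj->un_pager.vnp.vnp_npages;

	if (pindex >= n)
		return 0;
	/* Assumption: the whole file is contiguous on disk. */
	*before = (int)pindex;
	*after = (int)(n - 1 - pindex);
	return 1;
}

static const struct sk_pagerops sk_swap_ops = { sk_swap_haspage };
static const struct sk_pagerops sk_vnode_ops = { sk_vnode_haspage };

/* The object type indexes the table of pager routines. */
static const struct sk_pagerops *const sk_pagertab[SK_OBJT_COUNT] = {
	[SK_OBJT_SWAP] = &sk_swap_ops,
	[SK_OBJT_VNODE] = &sk_vnode_ops,
};

static int
sk_pager_haspage(struct sk_object *obj, size_t pindex, int *before, int *after)
{
	return sk_pagertab[obj->type]->haspage(obj, pindex, before, after);
}
```

Callers pass the object itself to sk_pager_haspage(), so no separate
"pager" handle has to be allocated or looked up.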
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
(on an i486, 10 cycles (+ cache misses) instead of 15). The
change should be a no-op if the compiler is any good. The best
possible i*86 code for the same algorithm is only 1 more cycle
faster on i486's so I don't want to bother implementing an
assembler version.
scanc() is a bottleneck for OPOST processing. It is naturally
about 4 times as slow as bcopy() on 32-bit systems.
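For readers unfamiliar with the routine, a portable sketch of the kind of
loop scanc() performs is shown below: skip bytes until one is found whose
table entry has a bit from the mask set, and return how many bytes were left
unscanned. scanc_sketch() is a name made up here, and the exact kernel
signature is not reproduced. Looking at one byte per iteration, where a copy
can presumably move a 32-bit word at a time, would be consistent with the
factor of about 4 mentioned above.

```c
#include <stddef.h>

/*
 * Scan `size` bytes starting at `cp`.  Stop at the first byte c for which
 * (table[c] & mask) is non-zero.  Return the number of bytes remaining,
 * counting the byte that stopped the scan, or 0 if nothing matched.
 */
static size_t
scanc_sketch(size_t size, const unsigned char *cp,
    const unsigned char table[256], unsigned char mask)
{
	const unsigned char *end = cp + size;

	while (cp < end && (table[*cp] & mask) == 0)
		cp++;
	return (size_t)(end - cp);
}
```

With a table that flags only newline, scanning "abc\ndef" (7 bytes) stops at
the newline and returns 4; scanning a buffer with no flagged byte returns 0.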
were two races:
- q_to_b() might unexpectedly return 0 (e.g., after a keyboard signal
flushes the output queue and isn't echoed). ansi_put() interprets
0 bytes as 4GB...
- more output (e.g. for echoes) might arrive after q_to_b() returns 0.
Then scstart() presumably returns and the new output might not be
handled for a long time.
Remove unused function scxint().
Fix prototypes (foo() isn't a prototype).