1) Make cluster buffer list be a non-malloced chain. This eliminates
yet another 'evil' M_WAITOK and generally cleans up the code.
2) Fix write clustering for ext2fs. It was just broken. Also, ffs
clustering had an efficiency problem that more bawrites were happening
than should have been.
3) Make changes to buf.h to support the above, plus remove b_pfcent
at the request of David Greenman.
Reviewed by: davidg (partially)
- don't allow invalid timevals.
- normalize timevals as they are built - don't call timevaladd() with
a possibly invalid timeval and normalize the result.
Fixed a warning.
much as I'd like to, but the malloc stunt I tried for an interim for
sure does worse.
Now we can read and write from any kind of address-space, not only
user and kernel, using callbacks.
This may be over-generalization for now, but it's actually simpler.
structs and prototypes for syscalls.
Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.
NetBSD interface.
Increased the bogusness of the args list for mmap(). The args lists for
most of the memory mapping functions are bogus. The args lists in
syscalls.master are a little better than the ones in the args structs
currently being used, but the improvement for mmap() changed the object
code and I don't want to worry about that now.
Increased the bogusness of the args list for fcntl. BSD4.4lite2/NetBSD
uses `void *' instead of int for the third arg. This has the advantage
of working when `void *'s are longer than ints, but requires extra bogus
casts that I hope to avoid.
Fixed the args list for uname. `struct outsname' seems to be a typo,
not an old interface.
Added comments about bogus args lists for open, mount, msync, munmap,
mprotect, madvise, mincore, fcntl, semsys, msgsys and shmsys.
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.
The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.
This is here now. We can now access (the new) sysctl variables from the
kernel too and using functions to handle access is more sane now.
I will now attack sysctl variables in the rest of the kernel and get them
all converted to newspeak.
Changed vnodep -> vp for consistency with the rest of the kernel, and
changed iparams -> imgp for brevity.
kern_exec.c:
Explicitly initialized some additional parts of the image_params struct
to avoid bzeroing it. Rewrote the set-id code to reduce the number of
logical tests. The rewrite exposed a mostly benign bug in the algorithm:
traced set-id images would get ktracing disabled even if the set-id didn't
happen for other reasons.
These functions went away:
enosys (hasn't been used for some time)
enxio
enodev
enoioctl (was used only once, actually for a vop)
if_tun.c:
Continued cleaning up...
conf.h:
Probably fixed the type of d_reset_t. It is hard to tell the correct
type because there are no non-dummy device reset functions.
Removed last vestige of ambiguous sleep message strings.
dangerous than the original MNT_ASYNC. There might be some minor
security considerations due to data writes not being posted as promptly
as before. Meta-data operations are still not quite as fast as Linux,
but streaming I/O is still higher.
by functions.
tty_conf.c:
Cleaned up formatting of tables.
Removed another ARGSUSED for consistency.
conf.h:
Introduced typedefs for line discipline functions.
Backed out most of previous revision (it is done elsewhere).
to <machine/conf.h>. conf.h was mechanically generated by
`grep ^d_ conf.c >conf.h'. This accounts for part of its ugliness. The
prototypes should be moved back to the driver sources when the functions
are staticalized.
(maximum size of a socket buffer) tunable.
Permit callers of listen(2) to specify a negative backlog, which
is translated into somaxconn. Previously, a negative backlog was
silently translated into 0.
civilised manner than panicing. This only happens as a result of another
state botch somewhere else, eg: from a tty driver calling putc or b_to_q
on a closed device. Apparently, it's also been implicated in a panic
with a status (^T) event on ptys.
This change should pretty well be in it's final form now.
set in open() when CLOCAL is set unless carrier is present.
Fixed initialization of line discipline. It lived across opens.
Lines that started with the wrong discipline probably didn't work
at all, because TS_ISOPEN is only set by TTYDISC.
non-fatal. I've make it return an appropriate error to the caller instead
of panic()ing.
Handling an error condition is inherently more friendly than exploding
the kernel.. :-) The new behavior is a little closer to traditional
clists, potentially making porting a little simpler.
Suggested by: bde (many months ago, I've been using this for a while..)
The goal is to make them "user-friendly" :-)
In the end this will allow a SNMP style "getnext" function, sysctl editing
in the boot-editor and/or debugger, LKMs can define sysctl vars when
they get loaded, and remove them when unloaded and other interesting
uses for dynamic sysctl variables.
PR 795.
Set the size before one error return from sysctl_vnode() the same as before
the other. The caller might want to know about the amount successfully
read although the current caller doesn't.
at the end of each write for writes of more than 1K.
Fixed handling of residual count for early returns in writes to pty masters.
It was only adjusted in 2 out of 6 cases.
Added prototypes.
TTYHOG = 1024 bytes, 10 cblocks were reserved. This was thought to
provide 10 * CBSIZE = 1080 bytes of buffering, but if the head pointer
is at the end of a cblock, then it only provides 1 + 9 * CBSIZE = 973
bytes of buffering. This caused serious data loss for ptys because the
flow control is deterministic and requires at least TTYHOG bytes of
buffering. For ttys, if input flow control is used then there is
usually enough slop in the high watermark to avoid problems, and if
input flow control isn't used then a limit of 973 is not much different
from a limit of 1024.
Add prototypes.
Continue cleaning up new init stuff.
the !COMPAT_43 case - use a common function even when there is no
`old' function. The diffs for this are large because of code motion
to restore the function order to what it was before the pseudo-argument
changes.
Include <sys/sysproto.h> to get correct args structs and prototypes.
The diffs for this are large because the declarations of the args structs
were moved to become comments in the function headers. The comments may
actually match the automatically generated declarations right now.
Add prototypes.
filesystem layer, as was done in lite-2. Merged in some other cosmetic
changes while I was at it. Rewrote most of msdosfs_access() to be more
like ufs_access() and to include the FS read-only check.
Obtained from: partially from 4.4BSD-lite2
prototypes for all syscalls. The args structs are still declared in
comments as in VOP implementation functions. I don't like the
duplication for this, but several more layers of changes are required
to get it right. First we need to catch up with 4.4lite2, which uses
macros to handle struct padding. Then we need to catch up with NetBSD,
which passes the args correctly (as void *). Then we need to handle
varargs functions and struct padding better. I think all the details
can be hidden in machine-generated functions so that the args structs
and verbose macros to reference them don't have to appear in the core
sources.
Add prototypes.
Add bogus casts to hide the evil type puns exposed by the previous
steps. &uap[1] was used to get at the args after the first. This
worked because only the first arg in *uap was declared. This broke
when the machine- genenerated args struct declared all the args
(actually it declares extra args in some cases and depends on the
user stack having some accessible junk after the last arg, not to
mention the user args being on the stack. It isn't possible to
declare a correct args struct for a varargs syscall). The msgsys(),
semsys() and shmsys() syscall interfaces are BAD because they
multiplex several syscalls that have different types of args.
There was no reason to duplicate this sysv braindamage but now
we're stuck with it. NetBSD has reimplemented the syscalls properly
as separate syscalls #220-231.
Declare static functions as static in both their prototype and their
implementation (the latter is optional, and this misfeature was used).
Remove gratuitous #includes.
Continue cleaning up new init stuff.
valid bytes, we must also clear the B_DONE flag. Some filesystems
depend on this (incl NFS) and is probably the cause of the biodone
error and subsequent crash. Anyway this change needs to be made.
Add SA_NODEFER define to signal.h
Add ps_nodefer field to struct sigacts in signalvar.h.
Add code to kern_sig.c to handle SA_NODEFER.
If flag is set, when the signal is delivered, it is not masked automatically
from receiving the same signal again.
Reviewed by: wollman, bde
option DDB_NO_LCALLS to stop ddb getting control and broke all ddb
tracing. Now there is no option and no way for ddb to trace at
address _Xsyscall or to _Xsyscall, but tracing everywhere else
works. The previous fix did unnecessary things for Linux syscalls.
Don't bother checking that syscall frames are for user mode.
Make debugger traps inside the kernel (except at addresses _Xsyscall
and _Xsyscall+1) fatal if ddb is not configured. They "can't happen".
Add prototypes.
Remove stupid comments, e.g., /*ARGSUSED*/ for args that are used.
Prototypes are located in <sys/sysproto.h>.
Add appropriate #include <sys/sysproto.h> to files that needed
protos from systm.h.
Add structure definitions to appropriate files that relied on sys/systm.h,
right before system call definition, as in the rest of the kernel source.
In kern_prot.c, instead of using the dummy structure "args", create
individual dummy structures named <syscall>_args. This makes
life easier for prototype generation.
Add CPT_NOA type which is COMPAT with NOARGS -- do not produce argument
struct in sysproto.
Change accept, recvfrom, getsockname to CPT_NOA type.
Fix getrlimit, setrlimit argument #2 name to struct rlimit.
Instead of using a fake "compat" argument, pass a real compat int to function
if COMPAT_43 is defined. Functions involved: wait4, accept, recvfrom,
getsockname.
With the compat psuedo-argument, this introduces an argument structure
that can have two possible sizes depending on compat options.
This makes life difficult for lkm modules like ibcs2, which would
have to guess what size used in kernel when compiled. Also,
the prototype generator for these structures cannot generate proper sizes.
Now there is only one fixed structure and makes everybody happy.
I recommend these changes be introduced to 2.1 so that ibcs2, linux
lkm's generated for 2.2 can still run on a 2.1 kernel.
o optional config-file to set vars: sysnames, sysproto, sysproto_h,
syshdr, syssw, syshide, syscallprefix, switchname, namesname, sysvec.
o change syntax of syscalls.master entry:
remove argument count.
add pseudo-prototype field defining function name and arguments.
o generates correct structure definitions for all system calls
in sys/sysproto.h
o add type NOARGS: same as STD except do not create structure in
sys/sysproto.h
o add type NOPROTO: same as STD except do not create structure or function
prototype in sys/sysproto.h
New functionality provides complete prototype definitions.
Usefull for generating files for emulated systems like my new ibcs2 code.
Update syscalls.master to reflect new changes. For example, read()
entry now looks like:
3 STD POSIX { int ibcs2_read(int fd, char *buf, u_int nbytes); }
This is similar to how NetBSD generates these files.
Obtained from: other people on the net ?
1. stepping over syscalls (gdb ni) sends you to DDB, and returned
to the wrong address afterwards, with or without DDB. patch in
i386/i386/trap.c below.
2. the linux emulator (modload'ed) still causes panics with DIAGNOSTIC,
re-applied a patch posted to one of the lists...
This is a place for all things to do with conf.c and conf.h
that are not machine specific.
Other things that are at present in i386/isa/conf.c might
migrate into here..
It's the first small step in cleaning up the device interface
to make it more dynamic and to assist in more modular drivers
(i.e. both loadable via LKMs and linked in..
e.g able to add a device without having to edit conf.c)
this code is not yet used and the whole thing will be conditionally
compiled in for a while till proven useful :)
1) "obj" was't initialized properly, resulting in an important vm_page_lookup
always failing (resulting in a panic).
2) busy pages could be put on the cache queue or freed (resulting in a panic).
support for EXT2FS. Note that the Sig-11 problems appear to be caused by
this, but there is still probably an underlying VM problem that let this
clustering bug cause vnode objects to appear to be corrupted.
The direct manifestation of this bug would have been severely mis-read
files. It is possible that processes would Sig-11 on very damaged
input files and might explain the mysterious differences in system
behaviour when phk's malloc is being used.
<sys/sysproto.h> and use them (so far only) in kern/init_sysent.c.
Don't put $Id in generated files.
kern/syscalls.master:
I had to add some new fields to describe some non-orthogonal names.
E.g., the args struct for the syscall-implementing function foo()
is usually named `foo_args', but for getpid() it is named `args'.
sys/sysent.h:
sy_call_t is still incomplete to hide a couple of warnings.
definitions even though the functions are inline. If vnode_if.h was
compiled by a non-ANSI compiler, then `inline' would be defined away,
so vnode_if.h might compile correctly.
the first one in the config has priority. They can be switched using
userconfig().
i386/i386/conf.c:
Initialize the shared syscons/pcvt cdevsw entry to `nx'.
Add cdevsw registration functions.
Use devsw functions of the correct type if they exist.
i386/i386/cons.c:
Add renamed syscons entry points to constab.
i386/i386/cons.h:
Declare the renamed syscons entry points.
i386/i386/machdep.c:
Repeat console initialization after userconfig() in case the current
console has become wrong. This depends on cn functions not wiring down
anything important.
sys/conf.h:
Declare new functions.
i386/isa/isa.[ch]:
Add a function to decide which display driver has priority. Should be
done better.
i386/isa/syscons.c:
Rename pccn* -> sccn*.
Initialize CRTC start address in case the previous driver has moved it.
i386/isa/syscons.c, i386/isa/pcvt/*
Initialize the bogusly shared variable Crtat dynamically in case the
stored value was changed by the previous driver.
Initialize cdevsw table from a template.
Don't grab the console if another display driver has priority.
i386/isa/syscons.h, i386/isa/pcvt/pcvt_hdr.h:
Don't externally declare now-static cdevsw functions.
i386/isa/pcvt/pcvt_hdr.h:
Set the sensitive hardware flag so that pcvt doesn't always have lower
priority than syscons. This also fixes the "stupid" detection of the
display after filling the display with text.
i386/isa/pcvt/pcvt_out.c:
Don't be confused the off-screen cursor offset 0xffff set by syscons.
kern/subr_xxx.c:
Add enough nxio/nodev/null devsw functions of the correct type for syscons
and pcvt.
Split off cdevsw initialization in cninit() into a new function
cninit_finish() that isn't called until all hardware device drivers
have been attached. The bdevsw entry of the driver for the physical
console needs to be hooked after the physical driver has been
attached in case the attachment modified the entry.
Rearrange cninit() to avoid changing cn_tab until the driver for the
physical console has been initialized, so that the previous driver
(if any) can be used for debugging.
Start removing half-baked lint support. bdevsw functions usually have
unused args but /*ARGSUSED*/ was used for only about 5% of them.
cons.h:
Declare cn_init_finish().
autoconf.c:
Call cn_init_finish().
Start adding prototypes. Functions with bogus linkage (extern where
static is probably should be static) are explicitly declared as extern
so that the can be found easily (extern in a non-header is usually
wrong).
All:
Continue cleaning up init stuff: init functions shall be static;
INITs should be at the start of files...
Better performance -- more aggressive read-ahead
under certain circumstanses.
Mods to support clustering on small
( < PAGE_SIZE) block size filesystems (e.g. ext2fs,
msdosfs.)
changes to allow devices that don't probe (e.g. /dev/mem)
to create devfs entries
this required giving 'configure' its own SYSINIT entry
so we could duck in just before it with a DEVFS init
and some device inits..
my devfs now looks like:
./misc
./misc/speaker
./misc/mem
./misc/kmem
./misc/null
./misc/zero
./misc/io
./misc/console
./misc/pcaudio
./misc/pcaudioctl
./disks
./disks/rfloppy
./disks/rfloppy/fd0.1440
./disks/rfloppy/fd1.1200
./disks/floppy
./disks/floppy/fd0.1440
./disks/floppy/fd1.1200
also some sligt cleanups.. DEVFS needs a lot of work
but I'm getting back to it..
4k to 8k. This has a significant effect on the pipe performance. In
the future it might be good to increase this to 16k. PIPSIZ is now
tunable for experimentation.
external linkage.
Remove useless comments saying that SYSINIT() does system initialization.
shm.c:
Remove nearly useless comment that gave wrong pseudo-prototypes.
bp->b_flags has been broken for many years:
a) they didn't set B_BUSY for doing i/o. This has been fatal since
1995/07/25 when biodone() started checking that B_BUSY is set.
b) they didn't set B_INVAL for releasing the buffer. This at best
just put a useless buffer in the LRU queue for a little while.
Fix a couple of spelling errors and complete a couple of function
pointer declarations.
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..
NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..
certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)
The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.
instead of with none. The first (struct proc *) arg is used if lkmnosys()
if is actually called.
Implement lkmnosys() with the correct number and type of args so that
the first of them can be used and the others won't need to be fixed
lated.
calls.
Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+
missing prototypes. Now I have 9000+ lines of warnings and errors
about bogus conversions of function pointers.
disksort is called at non-interrupt time and can be actively traversing
the list when that happens, there is a very small window of vulnerability.
Close it by protecting disksort with splbio().
Old variant returns 38400 for them, now it returns nearest matched
rounded down, expect speeds in range 0 > speed < 50 rounded up
to not produce hangup.
with interaction pty <-> serial driver with non-standard speed.
So, nothing protect us from garbadge in speed field, expect
checking for < 0 left in tty.c :-(
too much for non-open ptys, but there is normally no problem because the
l_modem(, 0) is a no-op for closed ptys provided the line discipline is
standard and MDMBUF isn't set.
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.
Introduce TS_CONNECTED and TS_ZOMBIE states. TS_CONNECTED is set
while a connection is established. It is set while (TS_CARR_ON or
CLOCAL is set) and TS_ZOMBIE is clear. TS_ZOMBIE is set for on to
off transitions of TS_CARR_ON that occur when CLOCAL is clear and
is cleared for off to on transitions of CLOCAL. I/o can only occur
while TS_CONNECTED is set. TS_ZOMBIE prevents further i/o.
Split the input-event sleep address TSA_CARR_ON(tp) into TSA_CARR_ON(tp)
and TSA_HUP_OR_INPUT(tp). The former address is now used only for
off to on carrier transitions and equivalent CLOCAL transitions.
The latter is used for all input events, all carrier transitions
and certain CLOCAL transitions. There are some harmless extra
wakeups for rare connection- related events. Previously there were
too many extra wakeups for non-rare input events.
Drivers now call l_modem() instead of setting TS_CARR_ON directly
to handle even the initial off to on transition of carrier. They
should always have done this. l_modem() now handles TS_CONNECTED
and TS_ZOMBIE as well as TS_CARR_ON.
gnu/isdn/iitty.c:
Set TS_CONNECTED for first open ourself to go with bogusly setting
CLOCAL.
i386/isa/syscons.c, i386/isa/pcvt/pcvt_drv.c:
We fake carrier, so don't also fake CLOCAL.
kern/tty.c:
Testing TS_CONNECTED instead of TS_CARR_ON fixes TIOCCONS forgetting to
test CLOCAL. TS_ISOPEN was tested instead, but that broke when we disabled
the clearing of TS_ISOPEN for certain transitions of CLOCAL.
Testing TS_CONNECTED fixes ttyselect() returning false success for output
to devices in state !TS_CARR_ON && !CLOCAL.
Optimize the other selwakeup() call (this is not related to the other
changes).
kern/tty_pty.c:
ptcopen() can be declared in traditional C now that dev_t isn't short.
Make more functions static.
tty.c:
Use tcflag_t (u_long) and cc_t instead of u_char and int/long.
Don't record values that are only evaluated once.
Compare ints using imin(), not min(). min() is for comparing u_ints.
Old versions of tty.c used the type-safe but multiple-evaluation-unsafe
macro MIN(). The args are apparently never negative; otherwise this
change would be non-cosmetic.
Don't repeat the loop test in ttywait().
tty.h:
Improve English in and formatting of comments.
Use input buffer watermarks of TTYHOG-512 (high) and (high)*7/8
(low) instead of TTYHOG/2 (high) and TTYHOG/5 (low) to agree with
some drivers. 512 is magic and some things depended on TTYHOG/2
>= TTYHOG-512 to work; now they depend on the 512 magic not changing
and TTYHOG-512 being significantly larger than 0. This should be
handled in ttsetwater().
Separate the decision about whether to do input flow control from
doing it. ttyblock() now just starts input flow control (hardware
and/or software) and there is a new function ttyunblock() to stop
it. The decisions are the same except for the watermark changes
and allowing for input expansion for PARMRK.
When flushing input, try harder at first to send a start character
if required, but give up if the first attempt fails.
cy.c, rc.c, sio.c:
Simplify: let ttyinput() handle input flow control if it is not
being bypassed. Use ttyblock() to start flow control otherwise.
rc.c:
Use same input flow control test as elsewhere: test in a more
efficient order and start flow control at >= highwater instead of
at > highwater.
string as possible and return ENOMEM if the entire string cannot
be returned. This brings the routines in line with how the man
page says they work, and how the calling routines are expecting
them to work. This allows the dummy uname() routine in libc to
obtain the version string, since the kernel version string is
longer than that normally returned by the uname() routine.
This is 3/4 of the fix for PR# 462.
Reviewed by: Bruce Evans
as is required to be POSIXLY_CORRECT and "right". I interpret
"referring to a directory" as being a directory or becoming a
directory. E.g., the trailing slashes in mkdir("/nonesuch/"),
rename("/tmp", /nonesuch/") and link("/tmp", "/root_can_like_dirs/")
are ignored because the target will become a directory if the
syscall succeeds. A trailing slash on a symlink causes the symlink
to be followed (this is a bug if the symlink doesn't point to a
directory; fix later).
queues for TIOCSETA[W]. Swapping an even number of times broke
the queue resource limits. This would have broken CRTSCTS flow
control if the clist slush list was used up.
Don'concatenate the queues for TIOCSETA[W] if one of the queues
has a resource limit of 0. Concatenation would cause a panic if
one of the queues is nonempty and the other is limited to length
0. This may have caused panics in PPPDISC.
Wake up readers after all transitions of ICANON. When ICANON is
turned off it is quite likely that characters will become available
to be read.
Reduce indentation near these changes.
on output below low water) and TS_SO_OCOMPLETE (sleep on output complete).
Most of the support for this has already been committed. Drivers should
call ttwwakeup() to handle wakeups whenever output is below low water
(and some output event causes this condition to be checked) or TS_BUSY is
cleared.
tty.c:
Fix the livelock in ttywait() properly by sleeping on output complete, not
on output below low water.
Use ttwwakeup() instead of separate select and output wakeups for all
wakeups of writers.
Add wakeups of writers for output flushes and carrier/clocal transitions.
Don't go to sleep in ttycheckoutq() if ttstart() reduces the queue to below
low water.
Use the timeout built into tsleep() in ttycheckoutq().
Optimize the select wakeup in ttwwakeup(). It seems reasonable to know
too much about the internals of tp->t_wsel now that the knowledge is
localised in tty.c.
Remove nullmodem().
It may be useful to have a null modem routine, but nullmodem()
wasn't one. nullmodem() was identical to ttymodem() except it
didn't implement MDMBUF (carrier) flow control, didn't do any
wakeups for off to on carrier transitions, and didn't flush the
i/o queues for on to off carrier transitions (flushing has the side
effect of waking up readers and writers) although it did generate
SIGHUPs. The wakeups must normally be done even if nullmodem() is
null in case something is sleeping waiting for a carrier transition.
In any case, the wakeups should be harmless. They may cause bogus
results for select(), but select() is already bogus for nonstandard
line disciplines.
different types of panics/inconsistencies with NFS clients.
Cleared PG_WANTED where appropriate.
Added checks for buffer busy in allocbuf and biodone.
Reviewed by: John Dyson
ended that fork() uses to determine the time that the process
started when calculating the elapsed time. This prevents the
ac_etime field in the accounting record from getting set to -1
if the process exists for a VERY short period of time.
ttwwakeup(). The conditions for doing the wakeup will soon become
more complicated and I don't want them duplicated in all drivers.
It's probably not worth making ttwwakeup() a macro or an inline
function. The cost of the function call is relatively small when
there is a process to wake up. There is usually a process to wake
up for large writes and the system call overhead dwarfs the function
call overhead for small writes.
Temporarily nuke TS_WOPEN. It was only used for the obscure MDMBUF
flow control option in the kernel and for informational purposes
in `pstat -t'. The latter worked properly only for ptys. In
general there may be multiple processes sleeping in open() and
multiple processes that successfully opened the tty by opening it
in O_NONBLOCK mode or during a window when CLOCAL was set. tty.c
doesn't have enough information to maintain the flag but always
cleared it in ttyopen().
TS_WOPEN should be restored someday just so that `pstat -t' can
display it (MDMBUF is already fixed). Fixing it requires counting
of processes sleeping in open() in too many serial drivers.
Don't put partial PARMRK escape sequences in the input queue. Use
MAX_INPUT = TTYHOG instead of TTYHOG directly for the maximum input
queue size. Don't use the bogus MAX_INPUT advertised in
<sys/syslimits.h>.
First of many changes required to restore lost stability to the tty
driver.
ECHONL is supposed to enable echoing of NL when ECHO is off, but it
enabled echoing of everything except NL.
VBLK vnodes isn't adequate since all NFS nodes aren't locked, either. The
result is a race condition that would lead to duplicate buffers at the
same block offset.
Submitted by: John Dyson
emul code when compiling with "options KTRACE".
ktrsyscall() was expecting an array of integers, this was passing the
address of a structure containing an array of integers..
The cosmetic problem was that it was calling the "enter syscall"
trace hook twice - this looks like a cut/paste error/typo.
notebooks where a powerfail condition (external power drop; battery
state low) is signalled by an NMI. Makes it beep instead of panicing.
Reviewed by: davidg
made a change to NFS that caused buffers at EOF to be variable size. This
had the undesired side-effect of breaking delayed writes on NFS. This
fixes it.
Submitted by: John Dyson
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
that call vnode_pager_alloc() so that a failure return can be dealt with.
This fixes a panic seen on NFS clients when a file being opened is deleted
on the server before the open completes.
syscall to allow applications linked against their libc's uname() to
work. Netscape 1.1N being a prime example, which prints:
"uname() failed. cant tell what system we're running on".
This change is a little ugly, but that's mainly because of the "interesting"
semantics of the BSDI extension.
Since ogetkerninfo() is only enabled by COMPAT_43, Netscape will only
be affected on kernels with that option (eg: "GENERIC")
Reviewed by: davidg
might not be handled by the same FS as the directory (e.g. special device
files)...so it must be special-cased. This bug is seen when doing
"ln /dev/console /dev/foo" or equivilent and first appeared after I fixed
the argument order of VOP_LINK. YUCK! There really needs to be a way of
specifying what vp to use in the VCALL; doing this could fix the strategy
and bwrite special-cases, too.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
after vnode_pager_alloc() calls - the object is already guaranteed to be
persistent.
3) Removed some gratuitous casts.
VOP_CLOSE() takes `F' (file) flags, not `IO' flags. At least that's
what close() passes. I previously fixed ttylclose() to check
FNONBLOCK instead of IO_NDELAY. This broke the call from vclean()
and cleaning of ptys sometimes deadlocked.
when syscons stops mapping the console to minor MAXCONS. There is
usually no corresponding device in /dev, and the correct device has
minor 0.
cons.c:
Initialize cn_tty properly, so that CPU_CONSDEV can work.
Comment about too many variants of the console tty pointer.
machdep.c:
Return device NODEV and not error EFAULT when there is no console device.
in the wrong place. Blank padding in the right place or zero padding
would be inconsistent with user mode.
Put case 'p' in alphabetical order.
Implement %p in sprintf() too. I'd like only a single, more complete
printf() core, perhaps one based on vsnprintf().
in machdep.c (it should use the global nmbclusters). Moved the calculation
of nmbclusters into conf/param.c (same place where nmbclusters has always
been assigned), and made the calculation include an extra amount based
on "maxusers". NMBCLUSTERS can still be overrided in the kernel config
file as always, but this change will make that generally unnecessary. This
fixes the "bug" reports from people who have misconfigured kernels seeing
the network "hang" when the mbuf cluster pool runs out.
Reviewed by: John Dyson
device.
v_numoutput wasn't incremented to match the b_iodone nesting. It's still
fishy that vwakeup() clears B_WRITEINPROG before biodone() has finished;
however, B_WRITEINPROG seems to be never used.
Submitted by: Bruce Evans
1) Files weren't properly synced on filesystems other than UFS. In some
cases, this lead to lost data. Most likely would be noticed on NFS.
The fix is to make the VM page sync/object_clean general rather than
in each filesystem.
2) Mixing regular and mmaped file I/O on NFS was very broken. It caused
chunks of files to end up as zeroes rather than the intended contents.
The fix was to fix several race conditions and to kludge up the
"b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention
to page modifications that occurred via the mmapping.
Reviewed by: David Greenman
Submitted by: John Dyson
These changes solve the problem in a general way by moving the
initialization out of the individual fs_mountroot's and into swaponvp().
Submitted by: Poul-Henning Kamp
when the single user shell was terminated. These changes disallow mounting
or R/W upgrading filesystems that are dirty unless "-f" (force) option
is used with mount. /etc/rc has been modified to abort the startup if
one or more non-nfs partitions fail to mount.
Reviewed by: Poul-Henning Kamp, Rod Grimes
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).
From Poul-Henning:
The visible effect is this:
As default, unless
options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.
There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.
The invisible effect is that:
Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.
Reviewed by: John Dyson, David Greenman
Submitted by: Poul-Henning Kamp, with minor changes by me.
is more representative of worst case situations of 4 files/directory. (If
that last sentence doesn't make any sense, I'm not surprised. It's rather
compilcated how this all fits together....).
This should fix a problem that Ed Hudson has been complaining about where
directories with lots of symlinks could cause excessive disk I/O.
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.
Reopen the bdev for the raw partition and not the cdev if only the bdev
was open.
Don't use a bogus limit for the number of partitions to possibly reopen
(bug found by Julian).
Add function dssize() to help fix wdsize() and sdsize(). The slice
layer knows more about (un)open partitions and partition sizes than
the driver layer.
in read() and write(). FNONBLOCK is valid in ioctl() and close().
The bug caused hung ptys when a process talked to itself using nonblocking
i/o and exited while the slave pty had output to flush. ttywait() was
called and hung. Signals didn't work because the process was exiting.
`comcontrol /dev/ttyp0 drainwait 1' worked to terminate the wait. This
shows that comcontrol is not limited to hardware control. It has no i386
or driver dependencies and doesn't belong in src/sbin/i386.
Bruce
initializing proc0's frame base, too, using cpu_set_init_frame(). It's
a kludge because that macro is intended to be used only for init, but
does what we want nonetheless.
if an invalid ioctl is done on /dev/klog. logioctl() needs to return
an errno instead of -1 on a failed ioctl.
Submitted by: Mike Pritchard <mpp@mpp.com>