peripheral drivers can determine where in the devstat(9) list they are
inserted.
This requires recompilation of libdevstat, systat, vmstat, rpc.rstatd, and
any ports that depend on the devstat code, since the size of the devstat
structure has changed. The devstat version number has been incremented as
well to reflect the change.
This sorts devices in the devstat list in "more interesting" to "less
interesting" order. So, for instance, da devices are now more important
than floppy drives, and so will appear before floppy drives in the default
output from systat, iostat, vmstat, etc.
The order of devices is, for now, kept in a central table in devicestat.h.
If individual drivers were able to make a meaningful decision on what
priority they should be at attach time, we could consider splitting the
priority information out into the various drivers. For now, though, they
have no way of knowing that, so it's easier to put them in an easy to find
table.
Also, move the checkversion() call in vmstat(8) to a more logical place.
Thanks to Bruce and David O'Brien for suggestions, for reviewing this, and
for putting up with the long time it has taken me to commit it. Bruce did
object somewhat to the central priority table (he would rather the
priorities be distributed in each driver), so his objection is duly noted
here.
Reviewed by: bde, obrien
to an architecture-specific value defined in <machine/elf.h>. This
solves problems on large-memory systems that have a high value for
MAXDSIZ.
The load address is controlled by a new macro ELF_RTLD_ADDR(vmspace).
On the i386 it is hard-wired to 0x08000000, which is the standard
SVR4 location for the dynamic linker.
On the Alpha, the dynamic linker is loaded MAXDSIZ bytes beyond
the start of the program's data segment. This is the same place
a userland mmap(0, ...) call would put it, so it ends up just below
all the shared libraries. The rationale behind the calculation is
that it allows room for the data segment to grow to its maximum
possible size.
These changes have been tested on the i386 for several months
without problems. They have been tested on the Alpha as well,
though not for nearly as long. I would like to merge the changes
into 3.1 within a week if no problems have surfaced as a result of
them.
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.
to run Solaris executables (or executables from any other ELF system)
directly off the CD-ROM without having to waste megabytes of disk
by copying them to another filesystem just to brand them.
give the same behaviour produced before today. If sysadmin sets it
to a valid ELF brand, ELF image activator will attempt to run unbranded
ELF exectutables as if they were branded with that value.
Suggested by: Dima Ruban <dima@best.net>
process to sneak in and write to or close the pipe. The read code
enters a 'piperd' state after doing the lock operation without
checking to see if the state changed, which can cause the process
to wait forever.
The code has also been documented more.
completes, change if() to KASSERT(). This is not a bug, we are
simplify clarifying and optimizing the code.
In if/else in vfs_object_create(), the failure of both conditionals
will lead to a NULL object. Exit gracefully if this case occurs.
( this case does not normally occur, but needed to be handled ).
Obtained from: Eivind Eklund <eivind@FreeBSD.org>
code when examining their fix, which caused my code (in rev 1.52) to:
- panic("soaccept: !NOFDREF")
- fatal trap 12, with tracebacks going thru soclose and soaccept
can be hung.
Add a tunable delay at the beginning of the SHUTDOWN_FINAL at_shutdown
queue, allowing time to settle before we launch into the list of things
that are expected to turn the system off.
Fix a bug in at_shutdown_pri() where the second insertion always put
the item in second position in the queue.
Reviewed by: "D. Rock" <rock@cs.uni-sb.de>
terminating c_caddr_t with extreme prejudice. Here we depended
on the "opaque" type c_caddr_t being precisely `const char *'
to do unportable pointer arithmetic.
c_caddr_t with extreme prejudice. Here the original casts to
caddr_t were to support K&R compilers (or missing prototypes),
but the relevant source files require an ANSI compiler.
c_caddr_t with extreme prejudice. Here the point of the original
cast to caddr_t was to break the warning about the const mismatch
between write(2)'s `const void *buf' and `struct uio's `char
*iov_base' (previous bitrot gave a gratuitous dependency on caddr_t
being char *). Compiling with -Wcast-qual made the cast a full
no-op.
This change has no effect on the warning for discarding `const'
on assignment to iov_base. The warning should not be fixed by
splitting `struct iovec' into a non-const version for read()
and a const version for write(), since correct const poisoning
would affect all pointers to i/o addresses. Const'ness should
probably be forgotten by not declaring it in syscalls.master.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
directory containing rc.conf.local and rc.local, and possibly other
things in the future.
This sysctl is used by the diskless startup code and new rc.conf. If
it cannot be found or is empty, the system should revert to using /etc.
where select(2) can return that a listening socket has a connected socket
queued, the connection is broken, and the user calls accept(2), which then
blocks because there are no connections queued.
Reviewed by: wollman
Obtained from: NetBSD
(ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19990120-accept)
buggy for fifos, and no one seems to have investigated its behaviour
on other types of files. It has been broken since the Lite2 merge
in rev.1.54.
Nagged about by: Brian Feldman (green@unixhelp.org)
patch. lf can't be dereferenced after the unload attempt, in case it
was freed. Instead, decrement first and back it out if the unload failed.
This should be relatively immune to races caused by the user since the
userref count will be zero for the duration of the actual unloading and
will stop further kldunload attempts.
Submitted by: Ustimenko Semen <semen@iclub.nsu.ru>
the buffer as still being dirty. This isn't a perfect solution, but
throwing away the buffer contents will often result in filesystem
corruption and this solution will at least correctly deal with transient
errors.
Submitted by: Kirk McKusick <mckusick@mckusick.com>
B_DELWRI and B_CACHE flags, fixing a bug that showed up with NFS.
Also, a number of cases where manually inserted code has been removed
and replaced with an inline function call giving us better functional
isolation in the source.
descriptor-passing messages was calling sorflush() without checking
to see if the descriptor was actually a socket. This can cause a
crash by exiting programs that use the mechanism under certain
circumstances.
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.
Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's
fix for this problem.
lock, and add some macros and function parameters to make sure that
the information get to the point where it can be put in the lock
structure.
While I'm here, add DEBUG_VFS_LOCKS to LINT.
Fix NFS file corruption problem introduced in 1.188. The valid range
was not being set properly, causing a later reference to the buffer
to clear the B_CACHE bit.
- Have the VFS lkm support use vfs_register() etc rather than having it's
own version.
- Have the syscall lkm support use syscall_register() etc rather than
having it's own verison.
- Convert the lkm driver to a module.
suggestions from Greg Lehey some time ago. In the face of multiple
potential file formats, try and give a more sensible error than just
ENOEXEC.
XXX a good case can be made that the loading process is wrong - the linker
should locate the file first (using the search paths etc), then run the
loaders to see if they recognize it. While the present system allows for
the possibility of different search paths for different formats, we do not
use it and it just makes things more complicated than they need to be.
* A function device_printf() to make pretty-printing driver messages easier.
* A function device_get_children() to query the children of a device.
* Generic implementations of BUS_ALLOC_RESOURCE and BUS_RELEASE_RESOURCE.
* Change bus_generic_print_child() so that it is actually useful.
It was nay'ed before committing on the grounds that this is not
the way to do it, and has been decided as such several times in
the past.
There is not point in loading gobs of ascii into the kernel when
the only use of that ascii is presentation to the user.
Next thing we'd be adding all section 4 man pages to the loaded
kernel as well.
The argument about KLD's is bogus, klds can store a file in
/usr/share/doc/sysctl/dev/foo/thisvar.txt with a description and
sysctl or other facilities can pick it up there.
Proper documentation will take several K worth of text for many
sysctl variables, we don't want that in the kernel under any
circumstances.
I will welcome any well thought out attempt at improving the
situation wrt. sysctl documentation, but this wasn't it.
shared signal handling when there is shared signal handling being
used.
This removes the main objection to making the shared signal handling
a standard ability in rfork() and friends and 'unconditionalising'
this code. (i.e. the allocation of an extra 328 bytes per process).
Signal handling information remains in the U area until such a time as
it's reference count would be incremented to > 1. At that point a new
struct is malloc'd and maintained in KVM so that it can be shared between
the processes (threads) using it.
A function to check the reference count and move the struct back to the U
area when it drops back to 1 is also supplied. Signal information is
therefore now swapable for all processes that are not sharing that
information with other processes. THis should addres the concerns raised
by Garrett and others.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
from sc, vt and sio drivers. Use instead a linker_set to collect them.
Staticize ??cngetc(), ??cnputc(), etc functions in sc and vt drivers.
We must still have siocngetc() and siocnputc() as globals because they
are directly referred to by i386-gdbstub.c :-(
Oked by: bde
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.
The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.
Submitted by: Richard Seaman <dick@tar.com>
* Move the user stack from VM_MAXUSER_ADDRESS to a place below the 32bit
boundary (needed to support 32bit OSF programs). This should also save
one pagetable per process.
* Add cvtqlsv to the set of instructions handled by the floating point
software completion code.
* Disable all floating point exceptions by default.
* A minor change to execve to allow the OSF1 image activator to support
dynamic loading.
There's something that's been bugging me for a while, so I decided to fix it.
FreeBSD now will DTRT WRT DDB and DDB_UNATTENDED (!debugger_on_panic), at least
in my opinion. The behavior change is such that:
1. Nothing changes when debugger_on_panic != 0.
2. When DDB_UNATTENDED (!debugger_on_panic), if a panic occurs, the
machine will reboot. Also, if a trap occurs, the machine will
panic and reboot, unlike how it broke to DDB before. HOWEVER,
a trap inside DDB will not cause a panic, allowing full use
of DDB without having to worry about the machine being stuck
at a DDB prompt if something goes wrong during the day.
Patches for this behavior follow my signature, and it would
be a boon to anyone (like me) who uses DDB_UNATTENDED, but
actually wants the machine to panic on a trap (otherwise,
what's the use, if the machine causes a fatal trap rather than
a true panic, of debugger_on_panic?). The changes cause no
adverse behavior, but do involve two symbols becoming global
Submitted by: Brian Feldman <green@unixhelp.org>
last cleanup. Since the oid_arg2 field of struct sysctl_oid is not wide
enough to hold a long, the SYSCTL_LONG() macro has been modified to only
support exporting long variables by pointer instead of by value.
Reviewed by: bde
merge). This fixes at least hanging in revoke(2) when a somewhat
active slave pty is revoked. The hang made the window for the
null pointer bug in ufsspec_{read,write} much larger.
There are many other bugs in this area (revoke of an active fifo
at best leaks memory...).
object are not page aligned). This should fix the mount_msdos panic after a
failed attemp to mount as ffs.
Reviewed By: Matthew Dillon <dillon@apollo.backplane.com>
Archie Cobbs <archie@whistle.com>
Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
there does not seem to be a problem with this.
PR: kern/8732
Analysis by: David G Andersen <danderse@cs.utah.edu>
Tested by: Alfred Perlstein <bright@hotjobs.com>
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)
Code to allow Linux Threads to run under FreeBSD.
By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.
adjusted related casts to match (only in the kernel in this commit).
The pointer was only wanted in one place in kern_exec.c. Applications
should use the kern.ps_strings sysctl instead of PS_STRINGS, so they
shouldn't notice this change.
across the kernel -> application interface, and for the one sysctl where
they were passed and actually used (kern.ps_strings), the applications
want addresses represented as u_longs anyway (the other sysctl that
passed them, kern.usrstack, has never been used).
Agreed to by: dfr, phk
Obtained from: Stephen Clawson <sclawson@cs.utah.edu>
Wakeup anyone waiting on a mount point prior to returning from umount,
whether an error occurs or not. Fixes a stat/NFS-umount race and other
potential future problems. Fix taken from bug/pr which also indicated
that the same fix has already been applied to OpenBSD and NetBSD.
This is odd, especially in the case of USB where the driver is found
in several tries: vendor specific, class specific, interface specific.
The mouse driver is found at the interface specific level...
Reviewed by: Doug Rabson (dfr@freebsd.org)
0. This makes it difficult to do efficient manipulation of the
struct pollfd since you can't leave a slot empty.
PR: 8599
Submitted-by: Marc Slemko <marcs@znep.com>
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
problem is worked around by using an interrupt gate for the page
fault handler. This code was originally made for NetBSD/pc98 by
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp> and has already
been in PC98 tree. Because of this bug, trap_fatal cannot show
correct page fault address if %cr2 is obtained in this function.
Therefore, trap_fatal uses the value from trap() function.
- The trap handler always enables interruption when buggy application
or kernel code has disabled interrupts and then trapped. This code
was prepared by Bruce Evans <bde@FreeBSD.org>.
Submitted by: Bruce Evans <bde@FreeBSD.org>
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp>
can set if your hw/sw produces the "calcru negative..." message.
Setting the alternate method (sysctl -w kern.timecounter.method=1)
makes the the get{nano|micro}*() functions call the real thing at
resulting in a measurable but minor overhead.
I decided to NOT have the "calcru" change the method automatically
because you should be aware of this problem if you have it.
The problems currently seen, related to usleep and a few other corners
are fixed for both methods.
runtime. p_runtime is unsigned while p_cpulimit is not, so this avoids the
nasty side effect of the process getting killed when the runtime comes up
"negative" due to other bugs.
out interrupts for too long. If you still see the "calcru: negative
time..." message you can increase NTIMECOUNTER (see LINT).
Sideeffect is that a timecounter is required to not wrap around in
less than (1 + delta) seconds instead of the (1/hz + delta) required
until now.
Many thanks to: msmith, wpaul, wosch & bde
from an interrupt context and fsetown() wants to peek at curproc, call
malloc(..., M_WAITOK), and fiddle with various unprotected data structures.
The fix is to move the code that duplicates the F_SETOWN/FIOSETOWN state
of the original socket to the new socket from sonewconn() to accept1(),
since accept1() runs in the correct context. Deferring this until the
process calls accept() is harmless since the process can't do anything
useful with SIGIO on the new socket until it has the descriptor for that
socket.
One could make the case for not bothering to duplicate the
F_SETOWN/FIOSETOWN state and requiring the process to explicitly make the
fcntl() or ioctl() call on the new socket, but this would be incompatible
with the previous implementation and might break programs which rely on
the old semantics.
This bug was discovered by Andrew Gallatin <gallatin@cs.duke.edu>.
Many (mostly machine-dependent ones) are still missing. NIST-PCTS found
this bug for all the ioctls used to implement the POSIX tc* functions
(TIOCCBRK, TIOCDRAIN, TIOCSPGRP, TIOCSBRK, TIOCSTART and TIOCSTOP), and
I found FIOASYNC, TIOCCONS, TIOCEXCL, TIOCHPCL, TIOCNXCL, TIOCSCTTY and
TIOCSDRAINWAIT by inspection. TIOCSPGRP was ifdefed out for some reason.
Handle tcsetattr()'s historical speed conversions correctly and more
centrally:
- don't store speeds of 0 in the final termios struct. Drivers can now
depend on tp->t_ispeed and tp->t_ospeed giving the actual speed.
Applications can now depend on tcgetattr() being POSIX.1 conformant.
- convert from a proposed input speed of 0 to the proposed output speed
(except if that is 0, convert to the current output speed). Drivers
can now depend on the proposed input speed being nonzero.
- don't reject negative speeds. Negative speeds can't happen now that
speed_t is unsigned, and rejecting invalid speeds is a bug - tcsetattr()
is supposed to succeed if it can "perform any of the requested actions",
so it shouldn't fail in practice.
bio interrupts, and a truncated file that along with the precise alignment
of the planets could result in a page being freed multiple times or a
just-freed page being put onto the inactive queue.
system, the mapping from logical to physical block number may be lost.
Hence we have to check for a reconstituted buffer and redo the call to
VOP_BMAP if the physical block number has been lost.
devstat_end_transaction error message that gets printed whenever the
busy count is < 0.
This will help catch drivers that improperly implement devstat(9) support.
MALLOC_DEFINE() and MALLOC_DEFINE() is needed by the recently
reenabled "reallocblks" code, but <sys/kernel.h> was only included
if CLUSTERDEBUG was defined. This was too harmless. gcc only
warns about garbage like `SYSINIT(blech);' at file scope ...
- Interface wth the new resource manager.
- Allow for multiple drivers implementing a single devclass.
- Remove ordering dependencies between header files.
- Style cleanup.
- Add DEVICE_SUSPEND and DEVICE_RESUME methods.
- Move to a single-phase interrupt setup scheme.
Kernel builds on the Alpha are brken until Doug gets a chance to incorporate
these changes on that side.
Agreed to in principle by: dfr
This avoids the fsck-on-reboot symptoms if you're shutting down with a
hung or unreachable NFS server mounted. Also remove non-local
filesystems from the mount list to prevent the system hanging when it tries
to unmount them (for the same reason).
Drew points out that there's a good argument for forcibly removing all
"non syncable" filesystems from the mount list (eg. NFS mounts, disks
that aren't responding, etc.) as this then allows you to sync and
cleanly unmount their parents. No such change is included in this
patch.
Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
basically do a on-the-fly defragmentation of the FFS filesystem, changing
file block allocations to make them contiguous. Thanks to Kirk McKusick
for providing hints on what needed to be done to get this working.
linker. This is intended to replace kvm_mkdb etc. The first version
only does name->value lookups, but it's open ended. value->name lookups
would probably be a good thing to do too.
It's been suggested to try and connect the symbol tables to sysctl (which
is probably a more flexible way of doing it if it's done right), but that
is far more complex and difficult than I was ready to have a shot at.
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.
This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices. For more details, see the description on the PR.
Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.
PR: kern/7899
Reviewed by: bde, elvind
leaked memory on each unload and were limited to items referenced in
the kernel copy of vnode_if.c. Now a kernel module is free to create
it's own VOP_FOO() routines and the rest of the system will happily
deal with it, including passthrough layers like union/umap/etc.
Have VFS_SET() call a common vfs_modevent() handler rather than
inline duplicating the common code all over the place.
Have VNODEOP_SET() have the vnodeops removed at unload time (assuming a
module) so that the vop_t ** vector is reclaimed.
Slightly adjust the vop_t ** vectors so that calling slot 0 is a panic
rather than a page fault. This could happen if VOP_something() was called
without *any* handlers being present anywhere (including in vfs_default.c).
slot 1 becomes the default vector for the vnodeop table.
TODO: reclaim zones on unload (eg: nfs code)
removed at module unload (if in a module of course).
However; this introduces a new dependency on <sys/kernel.h> for things
that use MALLOC_DECLARE(). Bruce told me it is better to add sys/kernel.h
to the handful of files that need it rather than add an extra include to
sys/malloc.h for kernel compiles. Updates to follow in subsequent commits.
dereference a NULL pointer, causing a panic. Instead of following
s_leader to find the session id, store it in the session structure.
Jukka found the following info:
BTW - I just found what I have been looking for. Std 1003.1
Part 1: SYSTEM API [C LANGUAGE] section 2.2.2.80 states quite
explicitly...
Session lifetime: The period between when a session is created
and the end of lifetime of all the process groups that remain
as members of the session.
So, this quite clearly tells that while there is any single
process in any process group which is a member of the session,
the session remains as an independent entity.
Reviewed by: peter
Submitted by: "Jukka A. Ukkonen" <jau@jau.tmt.tele.fi>
of the input file more strict and the error messages more elaborate.
Second, the output file has slightly improved looks when >80 character
lines are concerned (I needed a 80 character line formatter anyway for
work...)."
Submitted by: Nick Hibma <nick.hibma@jrc.it>
truncated to 32 bits.
* Change the calling convention of the device mmap entry point to
pass a vm_offset_t instead of an int for the offset allowing
devices with a larger memory map than (1<<32) to be supported
on the alpha (/dev/mem is one such).
These changes are required to allow the X server to mmap the various
I/O regions used for device port and memory access on the alpha.
we can recurse when loading dependencies and that the kstack is limited
to something like 6 or 7KB. Having a single dependency caused an instant
double panic, and I stronly suspect some of the other strange "events"
that I have seen are possibly as a result of taking a couple of interrupts
with a large chunk of the stack already in use.
While here, fix a minor logic hiccup in a sanity check.
file to a stream socket. sendfile(2) is similar to implementations in
HP-UX, Linux, and other systems, but the API is more extensive and
addresses many of the complaints that the Apache Group and others have
had with those other implementations. Thanks to Marc Slemko of the
Apache Group for helping me work out the best API for this.
Anyway, this has the "net" result of speeding up sends of files over
TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU
cycles) when compared to a traditional read/write loop.
Also fix data types and printf formats while I'm here.
PR: misc/8494
Panic instead of looping forever in sbflush(). If sb_mbcnt counts
more mbufs than sb_cc counts bytes, the original code can turn into an
infinite loop of removing 0 bytes from the socket buffer until it's empty.
the NFSv3 ACCESS RPC problems a little for busy clients that do a lot of
open/close. The nfs code could probably cache the results, but I'm not
sure whether this would be legal or useful. The problem is that with
a CPU farm, on each open there would be a lookup, getattr then access RPC
then the read/write RPC activity. Caching the access results probably
isn't going to help much if the clients access lots of files. Having the
nfs_access() routine interpret the getattr results is a bit of a hack, but
it's how NFSv2 is done and it might be OK for a mount attribute for v3.
- Use TAILQ_* macros extensively instead of internal names
- use b_xflags instead of the NOLIST magic number hack in the next pointer
- clean bufs are inserted at the tail rather than the head.
- redo dirty buffer insert so that metadata (negative lbn) goes to the
tail directly rather than at the HEAD. This makes a difference when
inserting dirty data blocks in lbn sorted order since data block
insertion will not have to bypass all the metadata cruft. data is
lbn sorted since it makes sense for clustering and writeback ordering,
while metadata sorting doesn't help much since the lbn's are
meaningless when walking the list for writebacks.
Small systems will not notice much (if any) benefit from this, but really
busy systems with large dirty block lists should get a lot more.
I've tested this with softdep, and it doesn't seem to mind the change of
queueing of metadata.
Reviewed (in princible) by: dg
Obtained from: partly from John Dyson's work-in-progress patches in June.
the old true/false.
While here, have vfs_msync() only call vm_object_page_clean() with
OBJPC_SYNC if called with MNT_WAIT flags. vfs_msync() is called at unmount
time (with MNT_WAIT) and from the syncer process (formerly update).
This should make dirty mmap writebacks a little less nasty.
I have tested this a little with SOFTUPDATES enabled, but I don't normally
use it since I've been badly burned too many times.
installed.
Remove cpu_power_down, and replace it with an entry at the end of the
SHUTDOWN_FINAL queue in the only place it's used (APM).
Submitted by: Some ideas from Bruce Walter <walter@fortean.com>
clear if the check is necessary, but vfs_object_create() is called
for all vnodes and it was silly to create objects for VBLK vnodes
that don't even have a driver.
- dev != NODEV was checked for, but 0 was returned on failure. This was
fixed in Lite2 (except the return code was still slightly wrong (ENODEV
instead of ENXIO)) but the changes were not merged. This case probably
doesn't actually occur under FreeBSD.
- major(dev) was not checked to have a valid non-NULL bdevsw entry. This
caused panics when the driver for the root device didn't exist.
Fixed minor misformattings in bdevvp(). Rev.1.14 consisted mainly of
gratuitous reformattings that seem to have caused many Lite2 merge
errors.
PR: 8417
If you have problems with the "calcru" messages and processes being
killed for excessive cpu time, try to increase the NTIMECOUNTER
#define and report your findings.
- Use the system headers method for Elf32/Elf64 symbol compatability
- get rid of the UPRINTF debugging.
- check the ELF header for compatability much more completely
- optimize the section mapper. Use the same direct VM interfaces that
imgact_aout.c and kern_exec.c use.
- Check the return codes from the vm_* functions better. Some return
KERN_* results, not an errno.
- prefault the page tables to reduce startup faults on page tables like
a.out does.
- reset the segment protection to zero for each loop, otherwise each
segment could get progressively more privs. (eg: if the first was
read/write/execute, and the second was meant to be read/execute, the
bug would make the second r/w/x too. In practice this was not a
problem because executables are normally laid out with text first.)
- Don't impose arbitary limits. Use the limits on headers imposed by
the need to fit them into one page.
- Remove unused switch() cases now that the verbose debugging is gone.
I've been using an earlier version of this for a month or so.
This sped up ELF exec speed a bit for me but I found it hard to get
consistant benchmarks when I tested it last (a few weeks ago).
I'm still bothered by the page read out of order caused by the
transition from data to bss. This which requires either part filling the
transition page or clearing the remainder.
a raw partition at a nonzero offset (EINVAL should have been EXDEV;
DIOCSDINFO was broken, and DIOCWDINFO was broken because it depended
on DIOCSDINFO).
A zero offset for the raw partition should probably be enforced in
setdisklabel(), and DIOCWDINFO should probably always be handled by
first calling setdisklabel() so that writedisklabel() doesn't need to
enforce it, but this has never been done; dsioctl() has a special
check. Changes in this commit are limited to dsioctl() to preserve
bug for bug compatibility in drivers that don't use the slice code
(notably the ccd driver, which allows setting a bogus label in
DIOCWDINFO and doesn't undo the setting when writedisklabel() fails).
partition that the label ioctl is being done on just because it has
offset 0, since there is no guarantee that such a partition is large
enough to contain the label. Don't use the wrong raw partition (0
instead of RAW_PART).
This fixes problems rewriting bizarre labels (with a nonzero offset
for the 'a' partition) in newfs(8). Such labels shouldn't normally
be used, but creating them was allowed if the ioctl was done on the
raw partition, and sysinstall creates them if the root partition isn't
allocated first.
Note that allowing write access to a partition other than the one that
has been checked for write access doesn't increase security holes
significantly, since write access to any partition already allows
changing the in-core label.
This fix should be in 3.0R. Rev.1.26 of newfs/newfs.c shouldn't be
in 3.0R.
This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
release goes out the door. We know there's a bug in the devstat
implementation in the wd driver, but bde and msmith haven't been able to
fix it yet.
So, disable the printf to avoid confusing/worrying people.
Suggested by: msmith
1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.
things, like msdosfs, do not work (panic) on devices with VMIO enabled.
FFS enable VMIO on mounted devices, and nothing previously disabled it, so,
after you mounted FFS floppy, you could not mount msdosfs floppy anymore...)
This is mostly a quick before-release fix.
Reviewed by: bde
Drastically quieten down the verbose load progress messages. They were
more useful for debugging than anything, but are beyond a joke when loading
a few dozen modules.
Simplify the ELF extended symbol table load format. Just take the main
symbol table and the string table that corresponds. This is what we will
be getting local symbols from. (needed for the alpha stack tracebacks).
Use the (optional) full symbol tables in lookups. This means we have to
furhter distinguish between symbols that can come from the dynamic linking
table and the complete table.
The alpha boot code now needs to be adapted as ddb/db_elf.c cannot use
the simpler format.
I have not implemented loading the extended symbol tables from the syscall
interface yet, just for preloaded modules.
I am not sure about the symbol resolution. I *think* it's possible that
a local symbol can be found in preference to a global, depending on the
search sequence and dependency tree.
Formerly, the heuristic involving the interpreter path took
precedence.
Also, print a better error message if the brand is missing or not
recognized. If there is no brand at all, give the user a hint that
"brandelf" needs to be run.
Implement preloading in a fairly MI way, assuming the information is
prepared.
DDB interface helpers.. Provide some support for db_kld.c so that we
don't have to export too much detail.
Debugging and cosmetic nits left in from development..
The other half of the containing file hack so modules can associate
themselves with their "file".
but I can't think of another (relatively) easy way of getting the info
since the boot-time initialization is not done immediately after "loading".
XXX module_register() gained an extra arg. This might break the alpha
compile, if so, just add a zero to get the old behavior.
should probably be moved to i386/i386/link_machdep.c (and the same for the
alpha).
Implement "deleting" a preloaded module by destroying it's tags. This is a
hack. We cannot reuse the data, it's been destroyed by relocation,
statically initialized variables have been modified, etc. Note that to
reclaim the load space is going to be more machine-dependent work.
Implement a relocate hook for machdep.c to call so that the physical
addresses get converted to the equivalent KVM addresses.
- seperate unload for preloaded linker objects.
- Don't build a kernel object if running as an a.out kernel.
- extract the real kernel name rather than hardwiring "kernel" for kldstat.
(sysctl kern.bootfile getst the full name via bootinfo)
- use real addresses on the kernel "module" rather than fictitious ones.
- preloaded module support
- search module path for file modules.
- symbols are checked to see if they are in the right containing file
before using their indexes into string tables. This is to help ddb
since it only supplies a pointer to an opaque symbol and there is no
telling which file/object/module/whatever it came from.
- symbol_values checks that the symbol is indeed belonging to the
correct symbol and string table pairs before looking up. (since there
could be many pairs, and KLD/DDB need to find out).
- different ops for files versus preload modules - the unload mechanism
is different. (a preloaded module has to be deleted on unload since
the in-core image is tainted by relocation and variables used)
- Do not build an a.out kernel module if we're running on an elf
kernel. :-) Note that it should theoretically be possible to
mix a.out and elf KLD modules providing -mno-underscores was used
to compile it, or some other symbol conversion takes place.
- Support preload modules (even though /boot/loader doesn't yet)
- Search the module path when loading files.
check off SYSINIT entries as they are run, and when more arrive, we re-sort
and restart (skipping the already-run entries).
This can *only* be done after KMEM (and malloc) is up and running - this is
fine because KLD is the only consumer of this and it's done after that.
The nice thing about this is that the SYSINIT's within preloaded KLD modules
are executed in their natural order. It should be possible to register
devices for the probes which follow, etc. (soon.. several key things
prevent this, such as use of linker sets for things like pci devices).
help track down bugs in the devstat implementation in various drivers.
(i.e., any situation where the driver does not call the devstat routines
once and only once for each transaction initiation and completion)
Prompted by: msmith
NFS_ROOT will produce kernel that cannot mount a UFS /.
Vfs type numbers must be distinct from VFS_GENERIC (and VFS_VFSCONF, but
that has the same value and should go away).
The problem happens because NFS is the first vfs (in sys/conf order) so it
gets type number 0 and conflicts harmfully with VFS_GENERIC which is also 0.
The conflict is apparently harmless in the usual case when another vfs
gets type number 0, because nfs is the only vfs that has sysctls.
Inital fix by: Dima <dima@tejblum.dnttm.rssi.ru>
Reason why it worked by: bde
Reviewed by: Luoqi Chen <luoqi@watermarkgroup.com>
Fixed problem where write()s can get lost due to buffers flagged B_DELWRI
being improperly released in brelse().
The last consumer of this code (the old SCSI system) has left us and
the CAM code does it's own bouncing. The isa dma system has been
doing it's own bouncing for a while too.
Reviewed by: core