for as much as one second, but no more. Allows a miscreant to
double-time march the clock, but no worse.
XXX Unlike putting negative deltas in a while(1), performing small
positive steps inside of a while(1) will return EPERM for the
unpermitted ones. Repeated negative deltas are clamped without
error (but the kernel does log a notice).
1 second prior to the highest the clock has run so far. This allows
time adjusters like xntpd to do their work, but the worst a miscreant
can do is "freeze" the clock, not go back in time.
We still need to decide on an algorithm to clamp positive adjustments.
As it stands, it is possible to achieve arbitrary negative adjustments
by "wrapping" time around.
PR: 10361
condition ( bufspace > hibufspace ), an inappropriate scan of the empty
queue was performed looking for buffer space to free up.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c
Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
NetBSD compatible.
Add parameter to fo_read and fo_write. (The only flag FOF_OFFSET mean that
the offset is set in the struct uio).
Factor out some common code from read/pread/write/pwrite syscalls.
the address of the ps_strings structure to the process via %ebx.
For other kinds of binaries, %ebx is still zeroed as before.
Submitted by: Thomas Stephens <tas@stephens.org>
Reviewed by: jdp
* bus_setup_intr() as a wrapper for BUS_SETUP_INTR
* bus_teardown_intr() as a wrapper for BUS_TEARDOWN_INTR
* device_get_nameunit() which returns e.g. "foo0" for name "foo" and unit 0.
* device_set_desc_copy() malloc a copy of the description string.
* device_quiet(), device_is_quiet(), device_verbose() suppress probe message.
Add one method to the BUS interface, BUS_CHILD_DETACHED() which is called
after the child has been detached to allow the bus to clean up any memory
which it allocated on behalf of the child.
I also fixed a bug which corrupted the list of drivers in a devclass if
a driver was added to more than one devclass.
if uio->uio_offset != -1. This fixes a problem with aio_read/write
and permits a straightforward implementation of pread/pwrite.
PR: kern/8669
Submitted by: John Plevyak <jplevyak@inktomi.com>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
kern.chroot_allow_open_directories = 0
chroot(2) fails if there are open directories.
kern.chroot_allow_open_directories = 1 (default)
chroot(2) fails if there are open directories and the process
is subject of a previous chroot(2).
kern.chroot_allow_open_directories = anything else
filedescriptors are not checked. (old behaviour).
I'm very interested in reports about software which breaks when
running with the default setting.
space. When doing this, it is possible to for another process to attempt
to get an exclusive lock on the vnode and deadlock the mmap/read
combination when the uiomove() call tries to obtain a second
shared lock on the vnode. There is still a potential deadlock
situation with write()/mmap().
Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Luoqi Chen <luoqi@freebsd.org>
Delimmitted by tag PRE_MATT_MMAP_LOCK and POST_MATT_MMAP_LOCK
in kern/kern_lock.c kern/kern_subr.c
including alan, john, me, luoqi, and kirk
Submitted by: Matt Dillon <dillon@frebsd.org>
This change implements a relatively sophisticated fix to getnewbuf().
There were two problems with getnewbuf(). First, the writerecursion
can lead to a system stack overflow when you have NFS and/or VN
devices in the system. Second, the free/dirty buffer accounting was
completely broken. Not only did the nfs routines blow it trying to
manually account for the buffer state, but the accounting that was
done did not work well with the purpose of their existance: figuring
out when getnewbuf() needs to sleep.
The meat of the change is to kern/vfs_bio.c. The remaining diffs are
all minor except for NFS, which includes both the fixes for bp
interaction AND fixes for a 'biodone(): buffer already done' lockup.
Sys/buf.h also contains a chaining structure which is not used by
this patchset but is used by other patches that are coming soon.
This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF.
(sorry for the missing T matt)
was discarded on every call to calcru(). Hacking on the `switchtime'
global for a related fix in rev.1.38 of kern_resource.c was too fragile
and broke when p_switchtime went away.
PR: 10402
This code is backwards compatible with the older "microkernel" PLL, but
allows ntpd v4 to use nanosecond resolution. Many other improvements.
PPS_SYNC and hardpps() are NOT supported yet.
unregister them after sysuninits when unloading.
* Add code to vfs_register() to set the oid number of vfs sysctls to
the type number of the filesystem.
Reviewed by: bde
to manage their own memory. Tested on my machine (make buildworld).
I've made analogous changes on the alpha, but don't have a machine
to test.
Not-objected-to by: dg, gibbs
numbers as chars or use bogus casts in an attempt to unmisrepresnt
them. In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.
Higher numbers led to smaller quanta.
In discussion with BDE, change this parameter to be in uSecs
to make it machine independent,
and limit it to non zero multiples of 'tick' (rounding down).
Also make the variabel globally available so that the present function that
returns its value (used for posix scheduling I believe) can go away.
Submitted by: Bruce Evans <bde@freebsd.org>
This produced races resulting in panics and filesystem corruptions
under some circumstances.
Reviewed by: luoqi chen <luoqi@freebsd.org>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
Submitted by: Matt Dillon <dillon@freebsd.org>
not per-process. Keep it in `switchtime' consistently.
It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.
Parts reviewed by: dfr, phk
Add d_parms() to {c,b}devsw[]. If non-NULL this function points to
a device routine that will properly fill in the specinfo structure.
vfs_subr.c's checkalias() supplies appropriate defaults. This change
should be fully backwards compatible with existing devices.
often for it to be a good criterion for switching kernel cpu hogs --
it is true after most wakeups. Use the criterion "has been running
for >= 2 quanta" instead.
in "src/sys/sys/param.h".
Fix the ELF image activator so that it can handle dynamic linkers
which are executables linked at a fixed address. This improves
compliance with the ABI spec, and it opens the door to possibly
better dynamic linker performance in the future. I've experimented
a bit with a fixed-address dynamic linker, and it works fine. But
I don't have any measurements yet to determine whether it's
worthwhile.
Also, remove a few calculations that were never used for anything.
I will increment __FreeBSD_version, since this adds a new capability
to the kernel that the dynamic linker might some day rely upon.
being loaded twice. It used rindex() to strip the pathname but failed
to account for the fact that rindex() will return a pointer to the '/',
not the first character of the filename.
Submitted by: Nick Hibma <hibma@skylink.it>
is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>
complaints about ps_refcnt greater than two when we try to fork() a
kthread from proc0 with RFSIGSHARE flag set.
Noticed by: Tor Egge <tegge@fast.no>
Reviewed by: Richard Seaman, Jr. <dick@tar.com>
This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
peripheral drivers can determine where in the devstat(9) list they are
inserted.
This requires recompilation of libdevstat, systat, vmstat, rpc.rstatd, and
any ports that depend on the devstat code, since the size of the devstat
structure has changed. The devstat version number has been incremented as
well to reflect the change.
This sorts devices in the devstat list in "more interesting" to "less
interesting" order. So, for instance, da devices are now more important
than floppy drives, and so will appear before floppy drives in the default
output from systat, iostat, vmstat, etc.
The order of devices is, for now, kept in a central table in devicestat.h.
If individual drivers were able to make a meaningful decision on what
priority they should be at attach time, we could consider splitting the
priority information out into the various drivers. For now, though, they
have no way of knowing that, so it's easier to put them in an easy to find
table.
Also, move the checkversion() call in vmstat(8) to a more logical place.
Thanks to Bruce and David O'Brien for suggestions, for reviewing this, and
for putting up with the long time it has taken me to commit it. Bruce did
object somewhat to the central priority table (he would rather the
priorities be distributed in each driver), so his objection is duly noted
here.
Reviewed by: bde, obrien
to an architecture-specific value defined in <machine/elf.h>. This
solves problems on large-memory systems that have a high value for
MAXDSIZ.
The load address is controlled by a new macro ELF_RTLD_ADDR(vmspace).
On the i386 it is hard-wired to 0x08000000, which is the standard
SVR4 location for the dynamic linker.
On the Alpha, the dynamic linker is loaded MAXDSIZ bytes beyond
the start of the program's data segment. This is the same place
a userland mmap(0, ...) call would put it, so it ends up just below
all the shared libraries. The rationale behind the calculation is
that it allows room for the data segment to grow to its maximum
possible size.
These changes have been tested on the i386 for several months
without problems. They have been tested on the Alpha as well,
though not for nearly as long. I would like to merge the changes
into 3.1 within a week if no problems have surfaced as a result of
them.
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.
to run Solaris executables (or executables from any other ELF system)
directly off the CD-ROM without having to waste megabytes of disk
by copying them to another filesystem just to brand them.
give the same behaviour produced before today. If sysadmin sets it
to a valid ELF brand, ELF image activator will attempt to run unbranded
ELF exectutables as if they were branded with that value.
Suggested by: Dima Ruban <dima@best.net>
process to sneak in and write to or close the pipe. The read code
enters a 'piperd' state after doing the lock operation without
checking to see if the state changed, which can cause the process
to wait forever.
The code has also been documented more.
completes, change if() to KASSERT(). This is not a bug, we are
simplify clarifying and optimizing the code.
In if/else in vfs_object_create(), the failure of both conditionals
will lead to a NULL object. Exit gracefully if this case occurs.
( this case does not normally occur, but needed to be handled ).
Obtained from: Eivind Eklund <eivind@FreeBSD.org>
code when examining their fix, which caused my code (in rev 1.52) to:
- panic("soaccept: !NOFDREF")
- fatal trap 12, with tracebacks going thru soclose and soaccept
can be hung.
Add a tunable delay at the beginning of the SHUTDOWN_FINAL at_shutdown
queue, allowing time to settle before we launch into the list of things
that are expected to turn the system off.
Fix a bug in at_shutdown_pri() where the second insertion always put
the item in second position in the queue.
Reviewed by: "D. Rock" <rock@cs.uni-sb.de>
terminating c_caddr_t with extreme prejudice. Here we depended
on the "opaque" type c_caddr_t being precisely `const char *'
to do unportable pointer arithmetic.
c_caddr_t with extreme prejudice. Here the original casts to
caddr_t were to support K&R compilers (or missing prototypes),
but the relevant source files require an ANSI compiler.
c_caddr_t with extreme prejudice. Here the point of the original
cast to caddr_t was to break the warning about the const mismatch
between write(2)'s `const void *buf' and `struct uio's `char
*iov_base' (previous bitrot gave a gratuitous dependency on caddr_t
being char *). Compiling with -Wcast-qual made the cast a full
no-op.
This change has no effect on the warning for discarding `const'
on assignment to iov_base. The warning should not be fixed by
splitting `struct iovec' into a non-const version for read()
and a const version for write(), since correct const poisoning
would affect all pointers to i/o addresses. Const'ness should
probably be forgotten by not declaring it in syscalls.master.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
directory containing rc.conf.local and rc.local, and possibly other
things in the future.
This sysctl is used by the diskless startup code and new rc.conf. If
it cannot be found or is empty, the system should revert to using /etc.
where select(2) can return that a listening socket has a connected socket
queued, the connection is broken, and the user calls accept(2), which then
blocks because there are no connections queued.
Reviewed by: wollman
Obtained from: NetBSD
(ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19990120-accept)
buggy for fifos, and no one seems to have investigated its behaviour
on other types of files. It has been broken since the Lite2 merge
in rev.1.54.
Nagged about by: Brian Feldman (green@unixhelp.org)
patch. lf can't be dereferenced after the unload attempt, in case it
was freed. Instead, decrement first and back it out if the unload failed.
This should be relatively immune to races caused by the user since the
userref count will be zero for the duration of the actual unloading and
will stop further kldunload attempts.
Submitted by: Ustimenko Semen <semen@iclub.nsu.ru>
the buffer as still being dirty. This isn't a perfect solution, but
throwing away the buffer contents will often result in filesystem
corruption and this solution will at least correctly deal with transient
errors.
Submitted by: Kirk McKusick <mckusick@mckusick.com>
B_DELWRI and B_CACHE flags, fixing a bug that showed up with NFS.
Also, a number of cases where manually inserted code has been removed
and replaced with an inline function call giving us better functional
isolation in the source.
descriptor-passing messages was calling sorflush() without checking
to see if the descriptor was actually a socket. This can cause a
crash by exiting programs that use the mechanism under certain
circumstances.
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.
Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's
fix for this problem.
lock, and add some macros and function parameters to make sure that
the information get to the point where it can be put in the lock
structure.
While I'm here, add DEBUG_VFS_LOCKS to LINT.
Fix NFS file corruption problem introduced in 1.188. The valid range
was not being set properly, causing a later reference to the buffer
to clear the B_CACHE bit.
- Have the VFS lkm support use vfs_register() etc rather than having it's
own version.
- Have the syscall lkm support use syscall_register() etc rather than
having it's own verison.
- Convert the lkm driver to a module.
suggestions from Greg Lehey some time ago. In the face of multiple
potential file formats, try and give a more sensible error than just
ENOEXEC.
XXX a good case can be made that the loading process is wrong - the linker
should locate the file first (using the search paths etc), then run the
loaders to see if they recognize it. While the present system allows for
the possibility of different search paths for different formats, we do not
use it and it just makes things more complicated than they need to be.
* A function device_printf() to make pretty-printing driver messages easier.
* A function device_get_children() to query the children of a device.
* Generic implementations of BUS_ALLOC_RESOURCE and BUS_RELEASE_RESOURCE.
* Change bus_generic_print_child() so that it is actually useful.
It was nay'ed before committing on the grounds that this is not
the way to do it, and has been decided as such several times in
the past.
There is not point in loading gobs of ascii into the kernel when
the only use of that ascii is presentation to the user.
Next thing we'd be adding all section 4 man pages to the loaded
kernel as well.
The argument about KLD's is bogus, klds can store a file in
/usr/share/doc/sysctl/dev/foo/thisvar.txt with a description and
sysctl or other facilities can pick it up there.
Proper documentation will take several K worth of text for many
sysctl variables, we don't want that in the kernel under any
circumstances.
I will welcome any well thought out attempt at improving the
situation wrt. sysctl documentation, but this wasn't it.
shared signal handling when there is shared signal handling being
used.
This removes the main objection to making the shared signal handling
a standard ability in rfork() and friends and 'unconditionalising'
this code. (i.e. the allocation of an extra 328 bytes per process).
Signal handling information remains in the U area until such a time as
it's reference count would be incremented to > 1. At that point a new
struct is malloc'd and maintained in KVM so that it can be shared between
the processes (threads) using it.
A function to check the reference count and move the struct back to the U
area when it drops back to 1 is also supplied. Signal information is
therefore now swapable for all processes that are not sharing that
information with other processes. THis should addres the concerns raised
by Garrett and others.
Submitted by: "Richard Seaman, Jr." <dick@tar.com>
from sc, vt and sio drivers. Use instead a linker_set to collect them.
Staticize ??cngetc(), ??cnputc(), etc functions in sc and vt drivers.
We must still have siocngetc() and siocnputc() as globals because they
are directly referred to by i386-gdbstub.c :-(
Oked by: bde
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.
The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.
Submitted by: Richard Seaman <dick@tar.com>
* Move the user stack from VM_MAXUSER_ADDRESS to a place below the 32bit
boundary (needed to support 32bit OSF programs). This should also save
one pagetable per process.
* Add cvtqlsv to the set of instructions handled by the floating point
software completion code.
* Disable all floating point exceptions by default.
* A minor change to execve to allow the OSF1 image activator to support
dynamic loading.
There's something that's been bugging me for a while, so I decided to fix it.
FreeBSD now will DTRT WRT DDB and DDB_UNATTENDED (!debugger_on_panic), at least
in my opinion. The behavior change is such that:
1. Nothing changes when debugger_on_panic != 0.
2. When DDB_UNATTENDED (!debugger_on_panic), if a panic occurs, the
machine will reboot. Also, if a trap occurs, the machine will
panic and reboot, unlike how it broke to DDB before. HOWEVER,
a trap inside DDB will not cause a panic, allowing full use
of DDB without having to worry about the machine being stuck
at a DDB prompt if something goes wrong during the day.
Patches for this behavior follow my signature, and it would
be a boon to anyone (like me) who uses DDB_UNATTENDED, but
actually wants the machine to panic on a trap (otherwise,
what's the use, if the machine causes a fatal trap rather than
a true panic, of debugger_on_panic?). The changes cause no
adverse behavior, but do involve two symbols becoming global
Submitted by: Brian Feldman <green@unixhelp.org>
last cleanup. Since the oid_arg2 field of struct sysctl_oid is not wide
enough to hold a long, the SYSCTL_LONG() macro has been modified to only
support exporting long variables by pointer instead of by value.
Reviewed by: bde
merge). This fixes at least hanging in revoke(2) when a somewhat
active slave pty is revoked. The hang made the window for the
null pointer bug in ufsspec_{read,write} much larger.
There are many other bugs in this area (revoke of an active fifo
at best leaks memory...).
object are not page aligned). This should fix the mount_msdos panic after a
failed attemp to mount as ffs.
Reviewed By: Matthew Dillon <dillon@apollo.backplane.com>
Archie Cobbs <archie@whistle.com>
Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
there does not seem to be a problem with this.
PR: kern/8732
Analysis by: David G Andersen <danderse@cs.utah.edu>
Tested by: Alfred Perlstein <bright@hotjobs.com>
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)
Code to allow Linux Threads to run under FreeBSD.
By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.
adjusted related casts to match (only in the kernel in this commit).
The pointer was only wanted in one place in kern_exec.c. Applications
should use the kern.ps_strings sysctl instead of PS_STRINGS, so they
shouldn't notice this change.
across the kernel -> application interface, and for the one sysctl where
they were passed and actually used (kern.ps_strings), the applications
want addresses represented as u_longs anyway (the other sysctl that
passed them, kern.usrstack, has never been used).
Agreed to by: dfr, phk
Obtained from: Stephen Clawson <sclawson@cs.utah.edu>
Wakeup anyone waiting on a mount point prior to returning from umount,
whether an error occurs or not. Fixes a stat/NFS-umount race and other
potential future problems. Fix taken from bug/pr which also indicated
that the same fix has already been applied to OpenBSD and NetBSD.
This is odd, especially in the case of USB where the driver is found
in several tries: vendor specific, class specific, interface specific.
The mouse driver is found at the interface specific level...
Reviewed by: Doug Rabson (dfr@freebsd.org)
0. This makes it difficult to do efficient manipulation of the
struct pollfd since you can't leave a slot empty.
PR: 8599
Submitted-by: Marc Slemko <marcs@znep.com>
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
problem is worked around by using an interrupt gate for the page
fault handler. This code was originally made for NetBSD/pc98 by
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp> and has already
been in PC98 tree. Because of this bug, trap_fatal cannot show
correct page fault address if %cr2 is obtained in this function.
Therefore, trap_fatal uses the value from trap() function.
- The trap handler always enables interruption when buggy application
or kernel code has disabled interrupts and then trapped. This code
was prepared by Bruce Evans <bde@FreeBSD.org>.
Submitted by: Bruce Evans <bde@FreeBSD.org>
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp>