to `register_t *'. This fixes bugs like misplacement of argc and argv
on the user stack on i386's with 64-bit longs. We still use longs to
represent "words" like argc and argv, and assume that they are on the
stack (and that there is stack). The suword() and fuword() families
should also use register_t.
register_t, so pointers to it must be passed around as `register_t *',
not as `int *'. The type mismatches were non-benign on alphas, but
the broken code is normally only configured by LINT.
fixes incoherency of pipe timestamps relative to file timestamps in
the usual case where getnanotime() is not used for the latter. (File
and pipe timestamps are still incoherent relative to real time unless
the vfs_timestamp_precision sysctl is set to 2 or 3.)
NFSSERVER defined, useful for userland fileservers that want to
use a filehandle type interface to the filesystem.
Submitted by: Assar Westerlund assar@stacken.kth.se
PR: kern/15452
stressful situations. buf_daemon now makes a distinction between
being woken up and its sleep timing out, and as a consequence is now
much better able to dynamically tune itself to its environment.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
differentiate between one of three different scenarios:
1. No init.
2. Path to init munged by an incorrect loader configuration.
3. Root file system not mounted.
Reviewed-by: billf
The variables "m_mclalloc_wid" and "m_mballoc_wid" were not in the
proper place. They should have been in uipc_mbuf.c and have been global,
not in mbuf.h and local per each file that uses mbuf.h.
Sorta bug fix:
In mbuf.h, the definitions of various things for KERNEL and not
KERNEL cases were very screwy. This fixes all of that which I could
find.
1. Data written beyond end of pipe buffer, causing kernel memory corruption.
- Check that space is still valid after obtaining the pipe lock.
- Defer the calculation of transfer size until the pipe
lock has been obtained.
- Update the pipe buffer pointers while holding the pipe lock.
2. Writes of size <= PIPE_BUF not always atomic.
- Allow an internal write to span two contiguous segments,
so writes of size <= PIPE_BUF can be kept atomic
when wrapping around from the end to the start of the
pipe buffer.
PR: 15235
Reviewed by: Matt Dillon <dillon@FreeBSD.org>
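A minimal sketch of the two-segment write described in item 2, using a hypothetical toy ring buffer rather than the kernel's struct pipe. RING_SIZE and ATOMIC_MAX (standing in for PIPE_BUF) are assumed values, and the caller is assumed to hold the pipe lock, so the free-space check and transfer size are computed under the lock as item 1 requires.

```c
#include <stddef.h>
#include <string.h>

#define RING_SIZE  16  /* hypothetical, tiny for illustration */
#define ATOMIC_MAX  8  /* stands in for PIPE_BUF */

/* Toy circular buffer: `in` is the write offset, `cnt` the bytes queued. */
struct ring {
    char   buf[RING_SIZE];
    size_t in;
    size_t cnt;
};

/*
 * Caller is assumed to hold the buffer lock. Returns the number of bytes
 * written, or 0 if a small (atomic) write does not fit, in which case a
 * real pipe writer would sleep and retry instead of splitting the data.
 */
static size_t ring_write(struct ring *r, const char *src, size_t n)
{
    size_t space = RING_SIZE - r->cnt;   /* recomputed under the lock */

    if (n <= ATOMIC_MAX && space < n)
        return 0;                        /* never split a small write */
    if (n > space)
        n = space;                       /* large writes may be partial */

    size_t first = RING_SIZE - r->in;    /* room before the physical end */
    if (first > n)
        first = n;
    memcpy(r->buf + r->in, src, first);        /* segment up to the end */
    memcpy(r->buf, src + first, n - first);    /* wrapped segment, may be empty */

    r->in = (r->in + n) % RING_SIZE;     /* pointers updated under the lock */
    r->cnt += n;
    return n;
}
```

Copying in two memcpy() calls is what lets a write of up to ATOMIC_MAX bytes land in one piece even when it straddles the end of the buffer.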
the kernel while the vnode_if.h header is a bunch of inlines to call the
code that is in the kernel. Generating the .h file on the fly is kinda
bogus because it has to match the one compiled into the kernel.
IMHO we should have kern/vnode_if.c and sys/vnode_if.h committed in the
tree but that's another battle.
means that running out of mbuf space isn't a panic anymore, and code
which runs out of network memory will sleep to wait for it.
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: green, wollman
madvise().
This feature prevents the update daemon from gratuitously flushing
dirty pages associated with a mapped file-backed region of memory. The
system pager will still page the memory as necessary and the VM system
will still be fully coherent with the filesystem. Modifications made
by other means to the same area of memory, for example by write(), are
unaffected. The feature works on a page-granularity basis.
MAP_NOSYNC allows one to use mmap() to share memory between processes
without incurring any significant filesystem overhead, putting it in
the same performance category as SysV shared memory and anonymous memory.
Reviewed by: julian, alc, dg
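A minimal userland sketch of requesting a MAP_NOSYNC shared mapping. The helper name map_nosync() is hypothetical, and the fallback that defines MAP_NOSYNC as 0 on systems lacking it is an assumption so the sketch still builds there (it then behaves as a plain MAP_SHARED mapping).

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0   /* assumption: not FreeBSD, use an ordinary shared mapping */
#endif

/*
 * Hypothetical helper: map `len` bytes of the file at `path` read/write and
 * shared, asking the update daemon not to flush dirty pages gratuitously.
 * Returns NULL on failure.
 */
static void *map_nosync(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {   /* make the file large enough */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_NOSYNC, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

Because the update daemon no longer pushes these pages out on its own, a program that wants the data on disk at a known point can call msync() on the region itself.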
* lockstatus() and VOP_ISLOCKED() get a new process argument and a new
return value: LK_EXCLOTHER, when the lock is held exclusively by another
process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them.
* Extend the vnode_if.src format to allow more exact specification than
locked/unlocked.
This commit should not make any semantic changes unless you are using
DEBUG_VFS_LOCKS.
Discussed with: grog, mch, peter, phk
Reviewed by: peter
adequate for the IDE disks that I have available for testing. Most seem
to wait between 1 and 3 seconds before flushing their caches.
Add the ability to override the delay at compile time via the
undocumented option POWEROFF_DELAY. The delay can still be set via
sysctl as it was originally implemented.
boots I try in vain to remember which month or even year this system
was last booted in.
Print out the uptime before rebooting, and give people like me
less (or more as it may be) to think about while the system boots.
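A minimal sketch of turning an uptime in seconds into a compact day/hour/minute/second string. The function name format_uptime() and the "1d2h3m4s" layout are hypothetical choices for illustration, not necessarily what the kernel prints.

```c
#include <stdio.h>

/*
 * Hypothetical formatter: writes e.g. "1d2h3m4s" into `out`, omitting
 * leading zero units. Returns the snprintf() result (characters needed).
 */
static int format_uptime(long secs, char *out, size_t outlen)
{
    long d = secs / 86400; secs %= 86400;
    long h = secs / 3600;  secs %= 3600;
    long m = secs / 60;    secs %= 60;

    if (d > 0)
        return snprintf(out, outlen, "%ldd%ldh%ldm%lds", d, h, m, secs);
    if (h > 0)
        return snprintf(out, outlen, "%ldh%ldm%lds", h, m, secs);
    if (m > 0)
        return snprintf(out, outlen, "%ldm%lds", m, secs);
    return snprintf(out, outlen, "%lds", secs);
}
```

For example, 93784 seconds is 1 day, 2 hours, 3 minutes and 4 seconds, giving "1d2h3m4s".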
some time ago that changes kern.randompid from a boolean to a randomness
range for the next pid assignment. Too high causes a lot of extra work
to scan for free pids, and too low merely wastes randomness entropy. It's
still possible to select a completely random range by using PID_MAX (100k)
or -1 as a shortcut to mean "the whole range".
Also, don't waste randomness when doing a wraparound.
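A minimal sketch of treating kern.randompid as a range rather than a boolean. PID_MAX = 99999 and the skip of pids below 100 on wraparound are assumed values; the random draw is passed in so the sketch is deterministic, and the scan for a free pid that follows in the real code is omitted.

```c
#define PID_MAX      99999  /* assumed value; the log above rounds it to 100k */
#define PID_MIN_WRAP 100    /* assumed: low pids are skipped after wraparound */

/* Interpret the sysctl value: -1 (or anything out of range) means the whole range. */
static int normalize_randompid(int r)
{
    if (r < 0 || r > PID_MAX)
        return PID_MAX;
    return r;
}

/*
 * Hypothetical next-pid choice: step past lastpid by 1 plus a random
 * amount in [0, randompid). `rnd` is a caller-supplied random number.
 */
static int next_pid(int lastpid, int randompid, unsigned rnd)
{
    int trypid = lastpid + 1;

    if (randompid > 1)
        trypid += (int)(rnd % (unsigned)randompid);
    if (trypid >= PID_MAX) {            /* wrap around */
        trypid %= PID_MAX;
        if (trypid < PID_MIN_WRAP)
            trypid += PID_MIN_WRAP;
    }
    return trypid;
}
```

A small range keeps the subsequent free-pid scan short; a range of PID_MAX makes every step fully random at the cost of more scanning.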
device_add_child_ordered(). 'ivars' may now be set using the
device_set_ivars() function.
This makes it easier for us to change how arbitrary data structures are
associated with a device_t. Eventually we won't be modifying device_t
to add additional pointers for ivars, softc data etc.
Despite my best efforts I've probably forgotten something so let me know
if this breaks anything. I've been running with this change for months
and it's been quite involved actually isolating all the changes from
the rest of the local changes in my tree.
Reviewed by: peter, dfr
because in the case of mbuf clusters they only increment the reference
count rather than actually copying the data.
Add comments to this effect, and add a new routine called m_dup() that
returns a real, writable copy of an mbuf chain.
This is preliminary work required for implementing 'ipfw tee'.
Reviewed by: julian
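A minimal sketch of the difference between a reference-counted copy (the behavior the routines mentioned above have for clusters) and a real writable copy in the spirit of m_dup(). The toy types and the functions share_copy() and deep_copy() are hypothetical stand-ins, not the mbuf API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for external (cluster) storage with a refcount. */
struct cluster {
    int    refcnt;
    size_t len;
    char   data[64];
};

/* Hypothetical stand-in for an mbuf that points at a cluster. */
struct toybuf {
    struct cluster *ext;
};

/* Cheap copy: both buffers share the same storage, so writes are visible to both. */
static struct toybuf share_copy(struct toybuf *src)
{
    src->ext->refcnt++;
    struct toybuf b = { src->ext };
    return b;
}

/* Real copy: new storage with its own refcount, safe to modify. */
static struct toybuf deep_copy(const struct toybuf *src)
{
    struct toybuf b = { malloc(sizeof(struct cluster)) };
    if (b.ext != NULL) {
        memcpy(b.ext, src->ext, sizeof *b.ext);
        b.ext->refcnt = 1;
    }
    return b;
}
```

Writing through a shared copy would silently modify every other holder of the cluster, which is why something like 'ipfw tee' needs the deep copy.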
drops the counting in bwrite and puts it all in spec_strategy.
I did some tests and verified that the counts collected for writes
in spec_strategy are identical to the counts that we previously
collected in bwrite. We now also get read counts (async reads
come from requests for read-ahead blocks). Note that you need
to compile a new version of mount to get the read counts printed
out. The old mount binary is completely compatible, the only
reason to install a new mount is to get the read counts printed.
Submitted by: Craig A Soules <soules+@andrew.cmu.edu>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
Hopefully this clears up some confusion about the nature of
devclass_get_softc() vs. device_get_softc() as well.
The check against DS_ATTACHED remains as this is not
a change that modifies functionality.
Reviewed by: Peter "in principle" Wemm
what it is.
Be more correct in unbusying the mountpoint (especially before freeing it).
Remove support for mounting 'r' devices as root. You don't mount 'r'
devices anywhere else, and they're going away anyway.
Submitted by: bde
(kern.randompid), which is currently defaulted off. Use ARC4 (RC4) for our
random number generation, which will not get me executed for violating
crypto laws; a Good Thing(tm).
Reviewed and Approved by: bde, imp
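A minimal userland sketch of the ARC4 (RC4) keystream generator: key scheduling followed by one output byte per call. The struct and function names are hypothetical and this is not the kernel's code; the test checks it against the widely published vector (key "Key", plaintext "Plaintext").

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ARC4 state: a 256-byte permutation and two indices. */
struct arc4 {
    uint8_t s[256];
    uint8_t i, j;
};

/* Key-scheduling algorithm: permute s[] under control of the key. */
static void arc4_init(struct arc4 *st, const uint8_t *key, size_t keylen)
{
    uint8_t j = 0;

    for (int n = 0; n < 256; n++)
        st->s[n] = (uint8_t)n;
    for (int n = 0; n < 256; n++) {
        j = (uint8_t)(j + st->s[n] + key[n % keylen]);
        uint8_t t = st->s[n];
        st->s[n] = st->s[j];
        st->s[j] = t;
    }
    st->i = st->j = 0;
}

/* Pseudo-random generation: return the next keystream byte. */
static uint8_t arc4_byte(struct arc4 *st)
{
    st->i = (uint8_t)(st->i + 1);
    st->j = (uint8_t)(st->j + st->s[st->i]);
    uint8_t t = st->s[st->i];
    st->s[st->i] = st->s[st->j];
    st->s[st->j] = t;
    return st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
}
```

For pid randomization only the keystream bytes are needed; nothing is encrypted.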
commit to kern_synch.c:
----------------------------
revision 1.55
date: 1999/02/23 02:56:03; author: ross; state: Exp; lines: +39 -10
Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code
=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.
=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
----------------------------
The details are a little different in FreeBSD:
=== nice bug === Fixing this is the main point of this commit. We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor). However, clipping at all is fundamentally
bad. It gives free CPU to the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs. This will be fixed
later.
=== New schedclk() mechanism === We don't use the NetBSD schedclk()
(now schedclock()) mechanism. We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock. This is more accurate
and flexible.
=== Algorithm change === Same change.
=== Other bugs === The p_pctcpu bug was fixed long ago. We don't try as
hard to abstract functionality yet.
Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().
Agreed with by: dufault
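A minimal sketch of the clipping rule from the nice bug section, using the NetBSD-style limit NICE_WEIGHT * PRIO_MAX - PPQ. PUSER = 50, PPQ = 4 and the priority formula PUSER + p_estcpu + NICE_WEIGHT * nice are assumptions for illustration; as noted above, FreeBSD's limit differs by a scale factor.

```c
#define PUSER        50  /* assumed base user priority */
#define PPQ           4  /* assumed priorities per run queue */
#define NICE_WEIGHT   2
#define PRIO_MAX     20
#define ESTCPU_LIM   (NICE_WEIGHT * PRIO_MAX - PPQ)   /* = 36 with these values */

/* Clip the CPU-usage history so a hog cannot sink to the nice +20 queue. */
static unsigned clip_estcpu(unsigned estcpu)
{
    return estcpu > ESTCPU_LIM ? ESTCPU_LIM : estcpu;
}

/* Assumed NetBSD-style user priority (larger value = worse priority). */
static int user_priority(unsigned estcpu, int nice)
{
    return PUSER + (int)clip_estcpu(estcpu) + NICE_WEIGHT * nice;
}

/* Run queue index for a priority. */
static int run_queue(int pri)
{
    return pri / PPQ;
}
```

With these values a saturated nice 0 hog gets priority 86 (queue 21) while an idle nice +20 process gets 90 (queue 22), so the two never share a queue.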
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementing p->p_cpticks, and it should handle any calling frequency).
Agreed with in principle by: dufault
Add MD_ROOT and MD_ROOT_SIZE options to the md driver.
Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility.
Add md driver to GENERIC, PCCARD and LINT.
This is a cleanup which removes the need for some of the worse hacks in
MFS: We really want to have a rootvnode but MFS on a preloaded image
doesn't really have one. md is a true device, so it is less trouble.
This has been tested with make release, and if people remember to add
the "md" pseudo-device to their kernels, PicoBSD should be just fine
as well. If people have no other use for MFS, it can be removed from
the kernel.
with NetBSD and the Single Unix Specification v2.
This updates some structures to use other, almost equivalent types, and
an effort is under way to make the whole more consistent.
Also removes a double definition of INET6 and some other clean-ups.
Reviewed by: green, bde, phk
Some parts obtained from: NetBSD, SUSv2 specification
parameter a char ** instead of a const char **. This makes these
kernel routines consistent with the corresponding libc userland
routines.
Which is actually 'correct' is debatable, but consistency and
following the spec was deemed more important in this case.
Reviewed by (in concept): phk, bde
for IPv6 yet)
With this patch, you can assign IPv6 addresses automatically, and can reply to
IPv6 ping.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.
Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.
Replace procfs.h:CHECKIO() macros with calls to p_trespass().
Only show command lines to processes which can trespass on the target
process.
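A minimal sketch of a p_trespass()-style check that returns zero or an errno. The toyproc structure, its fields and the rules below (same process, superuser, or any matching real/effective uid) are simplified assumptions for illustration, not the kernel's exact policy.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified process credentials. */
struct toyproc {
    uid_t ruid;          /* real uid */
    uid_t euid;          /* effective uid */
    int   is_superuser;  /* nonzero if p1 may override checks */
};

/* Return 0 if p1 may trespass on p2, otherwise an errno value. */
static int toy_trespass(const struct toyproc *p1, const struct toyproc *p2)
{
    if (p1 == p2)
        return 0;
    if (p1->ruid == p2->ruid || p1->euid == p2->ruid ||
        p1->ruid == p2->euid || p1->euid == p2->euid)
        return 0;
    if (p1->is_superuser)
        return 0;
    return EPERM;
}
```

Returning an errno rather than a boolean lets callers such as the signal and procfs paths hand the result straight back to the user.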
I've made a separate version (c_index() etc) that uses const/const, but
I'm not sure it's worth it considering there is one file in the tree
that uses index on const strings (kern_linker.c) and it's easily adjusted
to scan the strings directly (and is perhaps more efficient that way).
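A minimal sketch of the libc-style signature discussed above: the function accepts a const string but returns a plain char *, so one routine serves both const and non-const callers at the cost of quietly casting away const. The name my_index() is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical index()/strchr()-style search: returns a pointer to the
 * first occurrence of c in s (including the terminating NUL if c == '\0'),
 * or NULL if it is absent. The cast drops the const qualifier.
 */
static char *my_index(const char *s, int c)
{
    for (;; s++) {
        if (*s == (char)c)
            return (char *)s;
        if (*s == '\0')
            return NULL;
    }
}
```

A const/const variant would instead return const char *, forcing callers that own writable strings to cast the result back.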
linked list to store the callback routines. The patch converts the
lists to queue(3) TAILQs, making the code slightly clearer and ensuring
that callbacks are executed in FIFO order.
Man page also updated as necessary.
(discontinued use of M_TEMP malloc type while here anyway /phk)
Submitted by: Jake Burkholder jake@checker.org
PR: 14912
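A minimal userland sketch of keeping callbacks on a queue(3) TAILQ so they run in registration (FIFO) order. The names cb_entry, cb_register(), cb_run() and the demonstration callback log_tag() are hypothetical, not the kernel's at_exit/at_fork interfaces.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

typedef void (*callback_fn)(void *arg);

struct cb_entry {
    callback_fn fn;
    void *arg;
    TAILQ_ENTRY(cb_entry) link;
};
TAILQ_HEAD(cb_list, cb_entry);

/* Append at the tail so callbacks later run in the order registered. */
static int cb_register(struct cb_list *head, callback_fn fn, void *arg)
{
    struct cb_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->fn = fn;
    e->arg = arg;
    TAILQ_INSERT_TAIL(head, e, link);
    return 0;
}

/* Walk from head to tail: first registered, first called. */
static void cb_run(struct cb_list *head)
{
    struct cb_entry *e;
    TAILQ_FOREACH(e, head, link)
        e->fn(e->arg);
}

/* Remove and free every entry. */
static void cb_free_all(struct cb_list *head)
{
    struct cb_entry *e;
    while (!TAILQ_EMPTY(head)) {
        e = TAILQ_FIRST(head);
        TAILQ_REMOVE(head, e, link);
        free(e);
    }
}

/* Demonstration callback: appends its string argument to a global log. */
static char cb_log[16];
static void log_tag(void *arg)
{
    strncat(cb_log, (const char *)arg, sizeof cb_log - strlen(cb_log) - 1);
}
```

Registering "a", "b" and "c" and then calling cb_run() leaves cb_log holding "abc".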
returned to user mode in the spare fields of the stat structure.
PR: kern/14966
Reviewed by: dillon@freebsd.org
Submitted by: Kelly Yancey kbyanc@posi.net
This fixes some nasty procfs problems for SMP, makes ps(1) run much faster,
and makes ps(1) even less dependent on /proc which will aid chroot and
jails alike.
To disable this facility and revert to previous behaviour:
sysctl -w kern.ps_arg_cache_limit=0
For full details see the current@FreeBSD.org mail-archives.
A lot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.
Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914
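A minimal sketch of the kind of conversion described above: a traversal that would otherwise reach into the lh_first and le_next fields directly is written with the queue(3) LIST_FOREACH macro instead. The node type and sum_list() are hypothetical.

```c
#include <stddef.h>
#include <sys/queue.h>

struct node {
    int val;
    LIST_ENTRY(node) link;
};
LIST_HEAD(node_list, node);

/*
 * Direct-access style being replaced:
 *     for (n = h->lh_first; n != NULL; n = n->link.le_next)
 * The macro form below hides the field layout of the list structures.
 */
static int sum_list(struct node_list *h)
{
    struct node *n;
    int sum = 0;

    LIST_FOREACH(n, h, link)
        sum += n->val;
    return sum;
}
```

Because the macros expand to the same field accesses, a conversion like this would be expected to leave the generated object code unchanged.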
A lot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.
This batch of changes compiles to the same object files.
Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914
All Makefiles now use MACHINE_ARCH for the target architecture.
Unification is required for cross-building.
Tags added to:
sys/boot/Makefile
sys/boot/arc/loader/Makefile
sys/kern/Makefile
usr.bin/cpp/Makefile
usr.bin/gcore/Makefile
usr.bin/truss/Makefile
usr.bin/gcore/Makefile:
fixed typo: MACHINDE -> MACHINE_ARCH
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.
Unify spec_open() for bdev and cdev cases.
Remove the disabled bdev specific read/write code.