civilized way which doesn't cause grief.
The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
construct bio's which are not entrails of a struct buf.
Also, curthread may or may not have anything to do with the I/O request
at hand.
One solution is to tag struct bio's with a priority derived from
the requesting thread's nice value and have disksort act on this
field, but this wouldn't address the "silly-seek syndrome" where
two equal processes bang the diskheads from one edge to the other
of the disk repeatedly.
Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.
The sleep also needs to be in constant time units: 1/hz can be practically
any sub-second size, and at high HZ the current code practically doesn't
do anything.
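
To illustrate the point about units, here is a minimal userland sketch. The
helper names ticks_to_us() and ms_to_ticks() are hypothetical and are not
existing kernel functions. It shows that a sleep of one tick shrinks as HZ
grows, while a delay given in milliseconds and converted to ticks stays the
same length.

```c
#include <stdint.h>

/* Length, in microseconds, of a sleep of `ticks` clock ticks at rate `hz`. */
int64_t
ticks_to_us(int64_t ticks, int hz)
{
	return (ticks * 1000000 / hz);
}

/*
 * Convert a delay in milliseconds to clock ticks for a given `hz`,
 * rounding up and never returning less than one tick.
 */
int64_t
ms_to_ticks(int64_t ms, int hz)
{
	int64_t ticks = (ms * hz + 999) / 1000;

	return (ticks > 0 ? ticks : 1);
}
```

With these helpers a one-tick sleep lasts 10000 usec at hz=100 but only
100 usec at hz=10000, whereas ms_to_ticks(10, hz) always describes 10 msec.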
the DT_PLTGOT value. On ia64 this is the value of GP. We need this
to construct function descriptors, but the elf file structure is
not exported to MD code.
Note that the name of the function is based on the meaning that
DT_PLTGOT has on ia64. This may differ on other architectures. As
such, link_elf_get_gp() has a high level of MD to it. Renaming the
function to describe what DT_* value is returned makes it generic,
but also makes the MD code less clear and if we only need this on
ia64, then a general name for a specific function doesn't help.
In short: I don't know what is "right" at this time, so I'll go
with what I have.
in various extattr_*() calls to match the rest of the file. Originally,
these bits at the end looked more like style(9). This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through. Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.
Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
constructing a struct aio and invoking VOP_READ() directly. This cleans
up the code a little, but also has the advantage of making sure almost
all vnode read/write access in the kernel goes through the helper
function, meaning that instrumentation of that helper function can impact
almost all relevant read/write operations. In this case, it permits us
to put MAC hooks into vn_rdwr() and not modify uipc_syscalls.c (yet).
In general, if helper vn_*() functions exist, they should be used in
preference to direct VOP's in system call service code.
Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
needed in the current code, in the MAC tree, create_init() relies on the
ability to modify the credentials present for initproc, and should not
perform that modification on a shared credential. Pro-active diff
reduction against MAC changes that are in the queue; also facilitates
other work, including the capabilities implementation.
Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.
This adds some new functions to manipulate the kernel environment:
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.
The kenv(2) syscall exports this new functionality to userland,
mainly for kenv(1).
Reviewed by: peter
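
The calling convention described above can be sketched in userland as
follows. This is a toy model, not the kernel code: all sketch_*() names, the
fixed-size table and the absence of locking are assumptions made for
illustration (the real dynamic environment is protected by an sx lock). It
assumes that getenv() hands back a private copy of the value, which is why
the caller must release it with freeenv(), while testenv() returns no
string and so needs no matching free.

```c
#include <stdlib.h>
#include <string.h>

#define KENV_MAX 16			/* illustrative table size */

struct kenv_entry {
	char *name;
	char *value;
};
static struct kenv_entry kenv_tab[KENV_MAX];

static char *
dupstr(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p != NULL)
		memcpy(p, s, n);
	return (p);
}

static struct kenv_entry *
kenv_find(const char *name)
{
	for (int i = 0; i < KENV_MAX; i++)
		if (kenv_tab[i].name != NULL &&
		    strcmp(kenv_tab[i].name, name) == 0)
			return (&kenv_tab[i]);
	return (NULL);
}

/* Return a private copy of the value; release it with sketch_freeenv(). */
char *
sketch_getenv(const char *name)
{
	struct kenv_entry *e = kenv_find(name);

	return (e != NULL ? dupstr(e->value) : NULL);
}

void
sketch_freeenv(char *value)
{
	free(value);
}

/* Only report presence; nothing to free afterwards. */
int
sketch_testenv(const char *name)
{
	return (kenv_find(name) != NULL);
}

int
sketch_setenv(const char *name, const char *value)
{
	struct kenv_entry *e = kenv_find(name);
	char *nv = dupstr(value);

	if (nv == NULL)
		return (-1);
	if (e != NULL) {		/* replace an existing value */
		free(e->value);
		e->value = nv;
		return (0);
	}
	for (int i = 0; i < KENV_MAX; i++) {
		if (kenv_tab[i].name != NULL)
			continue;
		if ((kenv_tab[i].name = dupstr(name)) == NULL)
			break;
		kenv_tab[i].value = nv;
		return (0);
	}
	free(nv);			/* table full or out of memory */
	return (-1);
}

int
sketch_unsetenv(const char *name)
{
	struct kenv_entry *e = kenv_find(name);

	if (e == NULL)
		return (-1);
	free(e->name);
	free(e->value);
	e->name = e->value = NULL;
	return (0);
}
```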
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.
Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.
Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.
Reported by: simokawa
MFC after: 1 week
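
A minimal sketch of the flag logic described above. The SK_ prefixed
constants, their values and both function names are made up for
illustration; only the idea that cancellation is keyed on the forced-unmount
flag rather than the general unmount flag comes from the text.

```c
#define SK_MNTK_UNMOUNT		0x01	/* some unmount is in progress */
#define SK_MNTK_UNMOUNTF	0x02	/* a forced unmount is in progress */

/* Flags as they would look at the start of an unmount attempt. */
int
unmount_begin_flags(int kern_flag, int forced)
{
	kern_flag |= SK_MNTK_UNMOUNT;
	if (forced)
		kern_flag |= SK_MNTK_UNMOUNTF;
	return (kern_flag);
}

/* Hung operations are cancelled only while a forced unmount is running. */
int
should_cancel_hung_ops(int kern_flag)
{
	return ((kern_flag & SK_MNTK_UNMOUNTF) != 0);
}
```

In this model a periodic non-forced unmount attempt, as made by amd(8),
sets only the first flag and therefore never cancels client operations.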
- Use temporary variables to hold a pointer to a pgrp while we dink with it
while not holding either the associated proc lock or proctree_lock. It
is in theory possible that p->p_pgrp could change out from under us.
sx lock. Trying to get the lock order between these locks was getting
too complicated as the locking in wait1() was being fixed.
- leavepgrp() now requires an exclusive lock of proctree_lock to be held
when it is called.
- fixjobc() no longer gets a shared lock of proctree_lock now that it
requires an xlock be held by the caller.
- Locking notes in sys/proc.h are adjusted to note that everything that
used to be protected by the pgrpsess_lock is now protected by the
proctree_lock.
Apply the change as a continuous slew rather than as a series of
discrete steps and make it possible to adjust arbitrarily huge
amounts of time in either direction.
In practice this is done by hooking into the same once-per-second
loop as the NTP PLL and setting a suitable frequency offset, deducting
the amount slewed from the remainder. If the remaining delta is
larger than 1 second we slew at 5000PPM (5msec/sec), for a delta
less than a second we slew at 500PPM (500usec/sec) and for the last
one second period we will slew at whatever rate (less than 500PPM)
it takes to eliminate the delta entirely.
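
The rate selection described above can be sketched as a function called once
per second. This is a userland model, not the committed code: slew_step()
is a hypothetical name, the delta is kept in microseconds for simplicity,
and a delta of exactly one second is treated as the 500PPM case, which the
text leaves open.

```c
#include <stdint.h>

#define US_PER_SEC	1000000

/*
 * Pick the amount (in microseconds) to slew during the next second,
 * deduct it from the remaining delta and return it.
 */
int64_t
slew_step(int64_t *delta_us)
{
	int64_t mag = *delta_us < 0 ? -*delta_us : *delta_us;
	int64_t step;

	if (mag > US_PER_SEC)
		step = 5000;		/* 5000PPM: 5 msec per second */
	else if (mag > 500)
		step = 500;		/* 500PPM: 500 usec per second */
	else
		step = mag;		/* last period: whatever remains */
	if (*delta_us < 0)
		step = -step;
	*delta_us -= step;
	return (step);
}
```

For example, a remaining delta of -800 usec is eliminated in two calls: the
first slews -500 usec and the second slews the last -300 usec.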
The old implementation stepped the clock a number of microseconds
every HZ to achieve the same effect, using the same rates of change.
Eliminate the global variables tickadj, tickdelta and timedelta and
their various uses and initializations.
This removes the most significant obstacle to running timecounter and
NTP housekeeping from a timeout rather than hardclock.
information related to bucket size efficiency. Three things are printed on
each row:
Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.
At the end the total memory used and total waste is displayed. Currently my
system displays about 33% wasted memory.
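
As a rough illustration of how the three columns relate, here is a small
sketch. It assumes power-of-two buckets starting at 16 bytes and counts
waste as Real Size minus Size for each request; neither assumption is
stated in this log, and the names round16(), bucket_size() and row_waste()
are hypothetical.

```c
#include <stddef.h>

/* "Size": the requested size rounded up to a multiple of 16 bytes. */
size_t
round16(size_t size)
{
	return ((size + 15) & ~(size_t)15);
}

/* "Real Size": smallest power-of-two bucket that fits (assumed layout). */
size_t
bucket_size(size_t size)
{
	size_t b = 16;

	while (b < size)
		b <<= 1;
	return (b);
}

struct profile_row {
	size_t size;		/* Size column */
	unsigned long requests;	/* Requests column */
	size_t real_size;	/* Real Size column */
};

/* Bytes wasted by one row: the slack per request times the request count. */
size_t
row_waste(const struct profile_row *r)
{
	return ((r->real_size - r->size) * r->requests);
}
```

Under these assumptions a 100 byte request is reported as Size 112 and
Real Size 128, so ten such requests account for 160 wasted bytes.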
The intent of this code is to gather statistics for tuning the malloc bucket
sizes. It is not intended to be run with INVARIANTS and it is not entirely
MP safe. It can be enabled via 'options MALLOC_PROFILE' which was committed
earlier.
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.
Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data. This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.
Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
we can use td_ucred.
- In killpg1(), the proc lock is sufficient to check if p_stat is SZOMB
or not. We don't need sched_lock.
- Close some races in psignal(). In psignal() there is a big switch
statement based on p_stat. All the different cases are assuming that
the process (or thread) isn't going to change state out from under it.
To ensure this is true, just lock sched_lock for the entire switch. We
practically held it the entire time already anyways. This also
simplifies the locking somewhat and actually results in fewer lock
operations.
- Allow signotify() to be called with the sched_lock held since psignal()
now does that.
- Use td_ucred in a couple of places.
process so it can use td_ucred.
- Require the target process of donice() to be locked when donice() is
called.
- Use td_ucred.
- Lock the target process of p_cansee() and while reading the credentials
of a process.
- Change the logic of rtprio() slightly so it does its copyin() if needed
prior to locking the target process.
- rtprio() no longer needs Giant. In theory with full KSE it would still
need Giant to protect p_ucred of curproc for the p_canfoo() functions
but p_canfoo() will be changing to using td_ucred of curthread before
full KSE hits the tree.
allocate a blank cred first, lock the process, perform checks on the
old process credential, copy the old process credential into the new
blank credential, modify the new credential, update the process
credential pointer, unlock the process, and cleanup rather than trying
to allocate a new credential after performing the checks on the old
credential.
- Cleanup _setugid() a little bit.
- setlogin() doesn't need Giant thanks to pgrp/session locking and
td_ucred.
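
The ordering of steps in the credential update above can be modelled in
userland as follows. Everything here is a stand-in: the structures, the
flag-based "lock", the permission check and the name setuid_sketch() are
assumptions for illustration only. The point is the sequence: allocate
first, then lock, check, copy, modify, swap the pointer, unlock and clean
up.

```c
#include <stdlib.h>

struct cred_sk {
	int uid;
	int refs;
};

struct credproc_sk {
	int locked;			/* toy stand-in for the proc lock */
	struct cred_sk *cred;
};

static void proc_lock_sk(struct credproc_sk *p) { p->locked = 1; }
static void proc_unlock_sk(struct credproc_sk *p) { p->locked = 0; }

/* Returns 0 on success, -1 if the change is refused or allocation fails. */
int
setuid_sketch(struct credproc_sk *p, int newuid)
{
	struct cred_sk *newcred, *oldcred;

	newcred = calloc(1, sizeof(*newcred));	/* blank cred, no lock held */
	if (newcred == NULL)
		return (-1);
	proc_lock_sk(p);
	oldcred = p->cred;
	/* Made-up policy: only uid 0 may switch to a different uid. */
	if (oldcred->uid != 0 && oldcred->uid != newuid) {
		proc_unlock_sk(p);
		free(newcred);			/* unused blank cred */
		return (-1);
	}
	*newcred = *oldcred;			/* copy old into blank */
	newcred->refs = 1;
	newcred->uid = newuid;			/* modify the copy */
	p->cred = newcred;			/* update the pointer */
	proc_unlock_sk(p);
	if (--oldcred->refs == 0)		/* cleanup after unlocking */
		free(oldcred);
	return (0);
}
```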
and acquire the proctree_lock if needed first. Then we lock the process
if necessary and fiddle with it as appropriate. Finally we drop locks and
do any needed copyout's. This greatly simplifies the locking.
belong to a user virtual address; while this happens to work on some
architectures, it can't on sparc64, since user and kernel virtual
address spaces overlap there (the distinction between them is done via
separate address space identifiers).
Instead, look up the page in the vm_map of the process in question.
Reviewed by: jake
so it can use td_ucred.
- Push Giant down into the end of settime() where we actually set the time
on the timecounter and time of day clock.
- Remove Giant from clock_settime().
- Push Giant down in settimeofday() to just protect the 'tz' global
variable.
linker_search_module().
Without this, modules loaded from loader.conf that then try to load
in additional modules (such as digi.ko loading a card's BIOS) die
badly in the vn_open() called from linker_search_module().
It may be worth checking (KASSERTing?) that rootdev != NODEV in
vn_open() too.
mod_depend * (which may be NULL). The only consumer of this
function at the moment is digi_loadmoduledata(), and that passes
a NULL mod_depend *.
In linker_reference_module(), check to see if we've already got
the required module loaded. If we have, bump the reference count
and return that, otherwise continue the module search as normal.
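
The lookup-before-search logic can be sketched like this. The table of
loaded modules, the structure and all function names are hypothetical
stand-ins, and the fallback search is stubbed out to find nothing.

```c
#include <stddef.h>
#include <string.h>

struct mod_sk {
	const char *name;
	int refs;
};

/* Toy list of already loaded modules. */
static struct mod_sk loaded_sk[] = {
	{ "kernel", 1 },
	{ "digi", 1 },
};

static struct mod_sk *
find_loaded_sk(const char *name)
{
	for (size_t i = 0; i < sizeof(loaded_sk) / sizeof(loaded_sk[0]); i++)
		if (strcmp(loaded_sk[i].name, name) == 0)
			return (&loaded_sk[i]);
	return (NULL);
}

/* Stand-in for the normal module search; finds nothing in this sketch. */
static struct mod_sk *
search_and_load_sk(const char *name)
{
	(void)name;
	return (NULL);
}

struct mod_sk *
reference_module_sk(const char *name)
{
	struct mod_sk *m = find_loaded_sk(name);

	if (m != NULL) {
		m->refs++;		/* already loaded: bump and return */
		return (m);
	}
	return (search_and_load_sk(name));	/* otherwise search as usual */
}
```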
is called.
- Change sysctl_out_proc() to require that the process is locked when it
is called and to drop the lock before it returns. If this proves too
complex we can change sysctl_out_proc() to simply acquire the lock at
the very end and have the calling code drop the lock right after it
returns.
- Lock the process we are going to export before the p_cansee() in the
loop in sysctl_kern_proc() and hold the lock until we call
sysctl_out_proc().
- Don't call p_cansee() on the process about to be exported twice in
the aforementioned loop.
p_pgrp since the pgrp locking went in. We also don't need it to check for
invalid values in the options argument to wait1(), so push Giant down
slightly.
behavior by default. Also, change the options line to reflect this.
If there are no problems reported this will become the only behavior and the
knob will be removed in a month or so.
Demanded by: obrien
separate strings instead of passing "foo=bar".
o Don't forget to clear the VMOUNT flag on the vnode when vfs_nmount()
fails because the fs doesn't implement VFS_NMOUNT (and in vfs_mount()
when the fs doesn't implement VFS_MOUNT); also decrement the vfs
refcount in the !MNT_UPDATE case.
that we can compile gcc. This is a hack because it adds a fixed 2MB to
each process's VSIZE regardless of how much is really being used since
there is no grow-up stack support. At least it isn't physical memory.
Sigh.
Add a sysctl to enable tweaking it for new processes.
a set of helper routines to deal with real-time clocks. The generic
functions access the clock driver using a kobj interface. This is intended
to reduce code duplication and make it easy to support more than one
clock model on a single architecture.
This code is currently only used on sparc64, but it is planned to convert
the code of the other architectures to it later.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
the generic lock type for use with witness. If this argument is NULL then
the lock name is used as the lock type. Add a macro for a lock type name
for network driver locks.
point to a more generic name for a lock that is more suitable for use by
witness when grouping locks. For example, although network driver locks
use the interface name for the name of each lock, they should all use the
same witness and be treated the same by witness. Another example is that
all UMA zone locks should be treated the same. The witness code has also
been updated to print out the lock type in addition to the lock name in a
few places where it is relevant.
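
The name/type split described above might look like this in a simplified
form. The structure, the function lock_init_sk() and the type string are
hypothetical; the only behaviour taken from the log is that a NULL type
falls back to the lock name and that locks sharing a type are grouped
together.

```c
#include <stddef.h>

/* Illustrative shared type string for all network driver locks. */
#define NETWORK_LOCK_TYPE_SK	"network driver"

struct lock_sk {
	const char *name;	/* per-lock name, e.g. the interface name */
	const char *type;	/* generic type used for grouping */
};

void
lock_init_sk(struct lock_sk *l, const char *name, const char *type)
{
	l->name = name;
	/* With no explicit type, the lock name doubles as the type. */
	l->type = (type != NULL) ? type : name;
}
```

Two driver locks named after different interfaces then keep their own
names but report the same type, while a lock initialized with a NULL type
is grouped under its own name.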
This shrinks the size by 4 bytes on alpha, down to the same 276 bytes
as all other platforms.
Construct a hack to make old ioctls work on new kernels.
Once world is recompiled only the new and correct sysctls will be
used.
This hack will become annoying around 1st of May to make people
rebuild their worlds and it will be gone before 5.0.
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.
Avoid locking in userret() in most of the remaining cases.
Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon
inline function sigsetmasked() and a new macro SIGPENDING(). CURSIG()
will soon be moved out of the normal path of execution for syscalls and
traps. Then its efficiency will be less important but the new interfaces
will be useful for checking for unmasked pending signals in more places.
Submitted by: luoqi (long ago, in a slightly different form)
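
A simplified model of such a check is sketched below. It uses a single
32-bit word as the signal set and names carrying an _sk or _SK suffix to
make clear that these are not the real definitions, whose exact form is not
shown in this log. The idea is only that a signal counts as pending when it
is in the pending set and not blocked by the mask.

```c
#include <stdint.h>

typedef uint32_t sigset_sk;		/* one bit per signal; simplified */

struct sigproc_sk {
	sigset_sk siglist;		/* signals posted to the process */
	sigset_sk sigmask;		/* signals currently blocked */
};

/* True if every signal in *set is blocked by *mask. */
static inline int
sigsetmasked_sk(const sigset_sk *set, const sigset_sk *mask)
{
	return ((*set & ~*mask) == 0);
}

/* True if at least one posted signal is not blocked. */
#define SIGPENDING_SK(p)						\
	((p)->siglist != 0 &&						\
	    !sigsetmasked_sk(&(p)->siglist, &(p)->sigmask))
```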
Assert that sched_lock is not held in CURSIG().
We get enough protection from the lock on the individual lists that we
acquire later.
Noticed/Tested by: Steven G. Kargl <kargl@troutmask.apl.washington.edu>
Submitted by: Jonathan Mini <mini@haikugeek.com>
securelevel_*() to be NULL for a while now.
- Use KASSERT() instead of if (foo) panic(); to optimize the
!INVARIANTS case.
Submitted by: Martin Faxer <gmh003532@brfmasthugget.se>
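
Why the macro form helps in the !INVARIANTS case can be seen from a
simplified stand-in. KASSERT_SK() below is not the kernel macro (it takes a
plain string and calls abort() instead of panic()), and cred_uid_sk() is a
made-up caller; the sketch only shows that the check compiles to nothing
when INVARIANTS is not defined, whereas an open-coded if/panic is always
compiled in.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef INVARIANTS
#define KASSERT_SK(exp, msg) do {					\
	if (!(exp)) {							\
		fprintf(stderr, "panic: %s\n", (msg));			\
		abort();						\
	}								\
} while (0)
#else
/* Without INVARIANTS the condition is not even evaluated. */
#define KASSERT_SK(exp, msg) do { } while (0)
#endif

int
cred_uid_sk(const int *uidp)
{
	KASSERT_SK(uidp != NULL, "cred_uid_sk: NULL pointer");
	return (*uidp);
}
```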
without removing the buffer from the vnode's dirty buffer list, which
can result in a panic in NFS. Replaced the code with a call to bundirty()
which deals with it properly.
PR: kern/36108, kern/36174
Submitted by: various people
Special mention: to Danny Schales <dan@coes.LaTech.edu> for providing a core dump that helped me track this down.
MFC after: 1 day
even when the number of records approaches the size of the hash table.
Besides, the previous implementation (using linear probing) was broken :)
Also, use the newly introduced MTX_SYSINIT.
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.
Tested on: alpha, i386
locks to be able to set up a SYSINIT call. This helps in places where
a lock is needed to protect some data, but the data is not truly
associated with a subsystem that can properly initialize its lock.
The macros use the mtx_sysinit() and sx_sysinit() functions,
respectively, as the handler argument to SYSINIT().
Reviewed by: alfred, jhb, smp@
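
The shape of such a macro can be imitated in userland. In the sketch below
the GCC/Clang constructor attribute stands in for SYSINIT(), and every name
with an _sk or _SK suffix is hypothetical; the real macros and their
argument lists are not reproduced here. The macro packages the lock and its
description into an argument block and registers a handler that initializes
the lock at startup, so no subsystem has to do it by hand.

```c
/* Toy mutex: just enough state to observe that it was initialized. */
struct mtx_sk {
	const char *desc;
	int initialized;
};

struct mtx_args_sk {
	struct mtx_sk *mtx;
	const char *desc;
};

/* Generic handler, playing the role described for mtx_sysinit(). */
void
mtx_sysinit_sk(void *arg)
{
	struct mtx_args_sk *a = arg;

	a->mtx->desc = a->desc;
	a->mtx->initialized = 1;
}

/*
 * Declare the argument block and a startup hook for one mutex.
 * Invoke without a trailing semicolon.
 */
#define MTX_SYSINIT_SK(name, m, d)					\
	static struct mtx_args_sk name##_args = { (m), (d) };		\
	__attribute__((constructor)) static void			\
	name##_init(void)						\
	{								\
		mtx_sysinit_sk(&name##_args);				\
	}

/* A lock protecting data that belongs to no particular subsystem. */
static struct mtx_sk stats_mtx_sk;
MTX_SYSINIT_SK(stats, &stats_mtx_sk, "stats lock")
```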
release times. Measurements are made and stored in nanoseconds but
presented in microseconds, which should be sufficient for the locks for
which we actually want this (those that are held long and / or often).
Also, rename some variables and structure members to unit-agnostic names.