Commit Graph

6435 Commits

Author SHA1 Message Date
davidxu
450b9799ce unlock sched_lock at right time. 2003-04-27 04:32:40 +00:00
alc
d5ac0bc453 Various changes to vm_object_page_remove():
- Eliminate an odd, special-case feature:
   if start == end == 0 then all pages are removed.  Only one caller
   used this feature and that caller can trivially pass the object's
   size.
 - Assert that the vm_object is locked on entry; don't bother testing
   for a NULL vm_object.
 - Style: Fix lines that are longer than 80 characters.
2003-04-26 23:41:30 +00:00
alc
4f5c780f99 - Lock the vm_object on entry to vm_object_terminate(). 2003-04-26 19:36:19 +00:00
alc
373b18b5c3 - Convert vm_object_pip_wait() from using tsleep() to msleep().
- Make vm_object_pip_sleep() static.
 - Lock the vm_object when performing vm_object_pip_wait().
2003-04-26 18:33:18 +00:00
alc
3db24e0c46 - Lock the vm_object when performing vm_page_alloc() in allocbuf(). 2003-04-26 07:42:24 +00:00
phk
773e071682 Update the "last malloc failure timestamp" also for simulated
malloc errors.
2003-04-25 21:49:24 +00:00
jhb
942b3c3db7 Remove Giant from getpgid() and getsid() and tweak the logic to more
closely match that of 4.x.
2003-04-25 20:09:31 +00:00
jhb
db5f78d397 Push down Giant around calls to proc_rwmem() in kern_ptrace. kern_ptrace()
should now be MP safe.
2003-04-25 20:02:16 +00:00
jhb
57c0e7ab21 Push Giant down into kern_sigaction() instead of locking it around calls
to kern_sigaction() in the various callers of the function.
2003-04-25 20:01:19 +00:00
jhb
b3c19f6ec9 - Push down Giant around vnode operations in ktrace().
- Mark the ktrace() and utrace() syscalls as being MP safe.
- Validate the facs argument to ktrace() prior to doing any vnode
  operations or acquiring any locks.
- Share lock the proctree lock over the entire section that calls
  ktrsetchildren() and ktrops().  We already did this for process groups.
  Doing it for the process case closes a small race where a process might
  go away after we look it up.  As a result of this, ktrstchildren() now
  just asserts that the proctree lock is locked rather than acquiring the
  lock itself.
- Add some missing comments to #else and #endif.
2003-04-25 19:59:35 +00:00
deischen
3d51b3a280 Add an argument to get_mcontext() which specified whether the
syscall return values should be cleared.  The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets.  Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code.  The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by:  jake
Reviewed by:    bde (months ago)
2003-04-25 01:50:30 +00:00
tjr
5464273ad7 Include altkstack pages in the RSS regardless of whether the process
is swapped out. Pointed out by jhb.
2003-04-25 00:20:40 +00:00
des
4e35cc9041 It seems that 1 was not a magic value as I thought, but a coincidence.
Instead of applying the adjustment to processes with a start time of 1,
apply it to all processes with a start time of less than 3600.

None of this would be necessary if the start times were recorded in ticks
instead of seconds and microseconds.
2003-04-24 12:12:06 +00:00
tjr
bf8bb3cbd9 Do a better job of calculating the RSS for swapped-out processes:
don't include the kernel stacks of swapped-out threads in the page count,
but do include the alternate kernel stack. jhb provided some helpful
comments on this.

PR:		49102
2003-04-24 11:03:04 +00:00
tjr
2b308e25a0 Free mount credentials (mnt_cred) when freeing the mount struct
in failure cases to avoid leaking struct ucreds, and ultimately
leaking struct uidinfo references.
2003-04-24 08:16:06 +00:00
alc
87da2c3cf3 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
des
51f01cb3f7 When filling out a kinfo_proc structure, if we come across a process
whose p_stats->p_start has the magic value 1, replace it with boottime.
Some users were apparently confused by the fact that ps(1) reported a
start time in early 1970 for system processes.
2003-04-24 03:37:59 +00:00
jhb
9b55ca02a0 Remove Giant from osigblock(), osigsetmask(), and kern_sigaltstack(). 2003-04-23 19:49:18 +00:00
jhb
2c416d197d The signotify() sanity check in userret() doesn't need Giant anymore. 2003-04-23 18:51:55 +00:00
jhb
26097c18e1 Add lock assertions for various proc/thread/kse/ksegroup fields to the
scheduler functions.
2003-04-23 18:51:05 +00:00
jhb
89c52cff2e - Reorganize osigstack() to do the copyin first, grab the proc lock once,
do all the various sigstack dances, unlock the proc lock, and finally do
  the copyout.  This more closely resembles the behavior of
  kern_sigaltstack() and closes a small race.
- Remove Giant from osigstack as it is no longer needed.
2003-04-23 18:50:25 +00:00
jhb
2958cf621b Remove Giant from [gs]etpriority(). 2003-04-23 18:48:55 +00:00
jhb
a0bf3a3e6f - Protect p_numthreads with the sched_lock.
- Protect p_singlethread with both the sched_lock and the proc lock.
- Protect p_suspcount with the proc lock.
2003-04-23 18:46:51 +00:00
obrien
590e10e4ac Add /dev to the Alpha manual mount root example. 2003-04-23 05:02:40 +00:00
jhb
128ae3c8d8 - Move PS_PROFIL and its new cousin PS_STOPPROF back over to p_flag and
rename them appropriately.  Protect both flags with both the proc lock
  and the sched_lock.
- Protect p_profthreads with the proc lock.
- Remove Giant from profil(2).
2003-04-22 20:54:04 +00:00
jhb
41837c0a14 - Assert that the proc lock and sched_lock are held in sched_nice().
- For the 4BSD scheduler, this means that all callers of the static
  function resetpriority() now always hold sched_lock, so don't lock
  sched_lock explicitly in that function.
2003-04-22 20:50:38 +00:00
jhb
ced60d737a Lock both the proc lock and sched_lock when calling sched_nice since
kg_nice is now protected by both.  Being protected by both means that
other places in the kernel that want to read kg_nice only need one of the
two locks.
2003-04-22 20:45:38 +00:00
jhb
d5cf4c5275 Prefer the proc lock to sched_lock when testing PS_INMEM now that it is
safe to do so.
2003-04-22 20:01:56 +00:00
jhb
8c172a3498 Protect p_swtime with the sched_lock. 2003-04-22 19:48:25 +00:00
jhb
cfedd4c7d6 - Mark the kse_purge_group() and kse_purge() definitions static to match
their prototypes.
- Remove sched_lock locking from kse_purge() as all callers already lock
  the sched_lock before calling it.
- Hold the proc lock slightly longer to protect P_SHOULDSTOP().
2003-04-22 19:47:55 +00:00
imp
fb873e6edf Create a new function, device_is_attached(), that is like
device_is_alive() that tells us if the device has successfully
attached.  device_is_alive just tells us that the device has
successfully probed.
2003-04-21 18:19:08 +00:00
davidxu
d5ff3e991d Fix lock order reversal problem. 2003-04-21 14:42:04 +00:00
davidxu
7e0ecb5345 Introduce two flags to control upcall behaviour:
o KMF_NOUPCALL
	Ask kse_release to not return to userland upcall entry, but instead
	direct returns to userland by using current thread's stack and return
	address on stack. This flags is intended to be used by UTS in critical
	region to wait another UTS thread to leave critical region, by using
	kse_release with this flag to avoid spinnng and burning CPU. Also this
	flags can be used by UTS to poll completed context when there is nothing
	to do in userland and needn't restart from its entry like normal upcall.

o KMF_NOCOMPLETED
	Ask kernel to not bring completed thread contexts back to userland when
	doing upcall, this flags is intend to be used with above flag when an
	upcall thread is in critical region and can not process completed contexts
	at that time.

Tested by: deischen
2003-04-21 07:27:59 +00:00
imp
cf3cf85267 Fix /dev/devctl's implementation of poll. We should only be setting
the poll bits when there's actually something in the queue.
Otherwise, select always returned '2' when there were no items to be
read, and '3' when there were.  This would preclude being able to read
in a threaded (libc_r) program, as well as checking to see if there
were pending events or not.
2003-04-21 05:58:51 +00:00
alc
e7c8e4e470 - Lock the vm_object when performing vm_object_pip_add(). 2003-04-20 07:29:50 +00:00
alc
c9e51c9b11 Lock the vm_object in vfs_busy_pages(). 2003-04-20 00:17:05 +00:00
alc
dc48d3db81 - Lock the vm_object when performing vm_object_pip_subtract().
- Assert that the vm_object lock is held in vm_object_pip_subtract().
2003-04-19 22:11:41 +00:00
alc
ef4e8a19cf - Lock the vm_object when performing vm_object_pip_wakeupn().
- Assert that the vm_object lock is held in vm_object_pip_wakeupn().
 - Add a new macro VM_OBJECT_LOCK_ASSERT().
2003-04-19 21:15:44 +00:00
alc
d558a7a53b Lock the jumbo_vm_object when performing vm_page_alloc(). 2003-04-19 19:13:25 +00:00
davidxu
a10a41ca38 Test next upcall time correctly. 2003-04-19 06:16:04 +00:00
davidxu
28038e92fe Unbreak sigaltstack syscall. sigonstack is now a function and
want proc lock be held.
2003-04-19 05:04:06 +00:00
davidxu
8ef415ed06 Use correct thread pointer. 2003-04-19 04:39:10 +00:00
jhb
801acfe1d4 - Make sigonstack() a regular function instead of an inline and add a proc
lock assertion to it.
- SIGPENDING() no longer needs sched_lock, so only grab sched_lock to set
  the TDF_NEEDSIGCHK and TDF_ASTPENDING flags in signotify().
- Add a proc lock assertion to tdsigwakeup().
- Since we always set TDF_OLDMASK while holding the proc lock, the proc
  lock is sufficient protection to check its state in postsig() and we only
  need sched_lock when clearing the actual flag.
2003-04-18 20:59:05 +00:00
jhb
8b7a3b47d1 Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().
2003-04-18 20:20:00 +00:00
jhb
fa6200c9ec Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol
so that it can be used by binary emulators.
2003-04-18 20:18:44 +00:00
jhb
de4c9711d0 Add a couple of sched_lock asserts. 2003-04-18 20:17:47 +00:00
jhb
f043193969 - Add a static function pgadjustjobc() to adjust the job control count for
a process group.
- Call pgadjustjobc() twice in fixjobc() to avoid code duplication and
  improve readability.
- Use the proc lock to protect P_SHOULDSTOP() instead of sched_lock.
- Check to see if a process is PRS_NEW with sched_lock before trying to
  lock its proc lock since the lock may not be constructed yet.
2003-04-18 20:17:05 +00:00
rwatson
9abedb6965 Update NAI copyright to 2003, missed in earlier commits and merges. 2003-04-18 19:57:37 +00:00
alc
83fe46be18 Update locking around vm_object_page_remove() to use the new macros. 2003-04-18 16:39:03 +00:00
jeff
556bb64555 - Set the ke_cpu field in sched_add() for interrupt and realtime threads
since they are going on the current cpu and not their previously assigned
   cpu.
 - sched_runnable() should only return true in the SMP case if the other
   processor has more than one thread that is runnable.  We can not steal
   curthread.
 - Change kseq_print() to accept the cpuid instead of a kseq pointer.  This
   makes use of this function in ddb much easier.
2003-04-18 05:24:10 +00:00
julian
0e096a3dd1 Add a thread_unlink() and use it.
It could also be used twice in kern_thr.c but that's owned by jeff
so I'l let him change it when he's next there.
2003-04-18 00:16:13 +00:00
jhb
e1dd224437 - kthread's don't have p_textvp set to anything, so replace code that
dealt with that possibility with a KASSERT().
- No need to set P_SYSTEM, kthread_create() does that for us.
2003-04-17 22:37:48 +00:00
jhb
05864a7334 - Use a local struct proc variable to improve readability.
- Use a local variable to close a minor race when determining if the wmesg
    printed out needs a prefix such as when a thread is blocked on a lock.
2003-04-17 22:36:40 +00:00
jhb
bffa90cc0a Tweak locking in the PS_XCPU handler to hold the sched_lock while reading
p_runtime.
2003-04-17 22:33:04 +00:00
jhb
5023bfe74a The sched_lock is not needed while clearing two of the P_STOPPED bits in
p_flag.  Also, the proc lock can't be recursed, so simplify an older proc
lock assertion.
2003-04-17 22:31:54 +00:00
jhb
872336ea36 Don't assume that p_session hasn't changed out from under us after unlocking
the process and session.  Instead, cache a true reference to the session
when we do the hold and release our reference on that session.  This avoids
the need for the proc lock when dropping the reference.
2003-04-17 22:30:43 +00:00
jhb
a5725b28f3 Lock the sched_lock while setting TDF_INPANIC. 2003-04-17 22:29:23 +00:00
jhb
2cdea9a30c Use TD_IS_RUNNING() instead of thread_running() in the adaptive mutex
code.
2003-04-17 22:28:58 +00:00
jhb
c94962975b fork1() already sets PS_INMEM, so don't set it again. This lets us push
sched_lock down slightly so that it isn't needed in the RFSTOPPED case.
2003-04-17 22:28:28 +00:00
jhb
08b81c369f - The prison mutex cannot possibly protect pointers to the prison it
protects, so don't bother locking it while we assign it to a ucred's
  cr_prison.
- Fully construct the new credential for a process before assigning it to
  p_ucred.
2003-04-17 22:26:53 +00:00
jhb
313b87d41a Add some locking in for a few proc and thread fields. 2003-04-17 22:25:35 +00:00
jhb
ab40c1468e - Push Giant down into the fork1() function a small bit.
- Set p_acflag earlier while already hold the proc lock in fork1().
- Mark the realitexpire() callout MPSAFE for new processes.  It was already
  marked safe for proc0 a long while ago.
2003-04-17 22:24:59 +00:00
jhb
b09e86b501 Adjust a few comments. 2003-04-17 22:22:47 +00:00
jhb
96015b90e0 Protect td_sigmask with the proc lock. 2003-04-17 22:21:57 +00:00
jhb
e517678fbb Test the P_WEXIT flag while already hold the proc lock instead of right
after dropping it.
2003-04-17 22:21:05 +00:00
jhb
2e488b055b Hold the proc lock across a wider range of fields that it protects. 2003-04-17 22:20:30 +00:00
jhb
5921ce0c8b Don't hold the proc lock while performing sigset conversions on local
variables.
2003-04-17 22:07:56 +00:00
jhb
4b2bc05ffe - Remove garbage SIGSETOR() that snuck into struct sigpending_args
definition.
- Use the proper constant for the last arg to kern_sigaction() in osigvec()
  instead of a magic value.
2003-04-17 22:06:43 +00:00
jhb
e7a906488e Use local struct proc variables to reduce repeated td->td_proc dereferences
and improve readability.
2003-04-17 22:02:47 +00:00
jhb
ac139f5914 Adjust a KTR trace to log thread state instead of proc state as that is
more relevant.
2003-04-17 22:01:01 +00:00
harti
cfd99881a1 Unbreak vinum, iostat and systat on sparc64 by changing the devstat
generation number back to a long (sizeof(u_int) != sizeof(long) on
sparc64). The alternative would have been to heavily change the libdevstat API.

Discussed with: phk, ken
2003-04-17 15:06:28 +00:00
phk
6dd4776ecc Don't include <sys/disklabel.h> 2003-04-16 20:57:35 +00:00
rwatson
ee95862054 mac_init_mbuf_tag() accepts malloc flags, not mbuf allocator flags, so
don't try and convert the argument flags to malloc flags, or we risk
implicitly requesting blocking and generating witness warnings.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-15 19:33:23 +00:00
silby
0c01d3cc3f Add another MBUF_STRESS_TEST feature, m_defragrandomfailures.
When enabled, this causes m_defrag to randomly return NULL (following
its normal failure case so that extra memory leaks are not introduced.)

Code similar to this was used to find / fix a few bugs last week.
2003-04-15 02:14:43 +00:00
rwatson
53a050aac3 Move MAC label storage for mbufs into m_tags from the m_pkthdr structure,
returning some additional room in the first mbuf in a chain, and
avoiding feature-specific contents in the mbuf header.  To do this:

- Modify mbuf_to_label() to extract the tag, returning NULL if not
  found.

- Introduce mac_init_mbuf_tag() which does most of the work
  mac_init_mbuf() used to do, except on an m_tag rather than an
  mbuf.

- Scale back mac_init_mbuf() to perform m_tag allocation and invoke
  mac_init_mbuf_tag().

- Replace mac_destroy_mbuf() with mac_destroy_mbuf_tag(), since
  m_tag's are now GC'd deep in the m_tag/mbuf code rather than
  at a higher level when mbufs are directly free()'d.

- Add mac_copy_mbuf_tag() to support m_copy_pkthdr() and related
  notions.

- Generally change all references to mbuf labels so that they use
  mbuf_to_label() rather than &mbuf->m_pkthdr.label.  This
  required no changes in the MAC policies (yay!).

- Tweak mbuf release routines to not call mac_destroy_mbuf(),
  tag destruction takes care of it for us now.

- Remove MAC magic from m_copy_pkthdr() and m_move_pkthdr() --
  the existing m_tag support does all this for us.  Note that
  we can no longer just zero the m_tag list on the target mbuf,
  rather, we have to delete the chain because m_tag's will
  already be hung off freshly allocated mbuf's.

- Tweak m_tag copying routines so that if we're copying a MAC
  m_tag, we don't do a binary copy, rather, we initialize the
  new storage and do a deep copy of the label.

- Remove use of MAC_FLAG_INITIALIZED in a few bizarre places
  having to do with mbuf header copies previously.

- When an mbuf is copied in ip_input(), we no longer need to
  explicitly copy the label because it will get handled by the
  m_tag code now.

- No longer any weird handling of MAC labels in if_loop.c during
  header copies.

- Add MPC_LOADTIME_FLAG_LABELMBUFS flag to Biba, MLS, mac_test.
  In mac_test, handle the label==NULL case, since it can be
  dynamically loaded.

In order to improve performance with this change, introduce the notion
of "lazy MAC label allocation" -- only allocate m_tag storage for MAC
labels if we're running with a policy that uses MAC labels on mbufs.
Policies declare this intent by setting the MPC_LOADTIME_FLAG_LABELMBUFS
flag in their load-time flags field during declaration.  Note: this
opens up the possibility of post-boot policy modules getting back NULL
slot entries even though they have policy invariants of non-NULL slot
entries, as the policy might have been loaded after the mbuf was
allocated, leaving the mbuf without label storage.  Policies that cannot
handle this case must be declared as NOTLATE, or must be modified.

- mac_labelmbufs holds the current cumulative status as to whether
  any policies require mbuf labeling or not.  This is updated whenever
  the active policy set changes by the function mac_policy_updateflags().
  The function iterates the list and checks whether any have the
  flag set.  Write access to this variable is protected by the policy
  list; read access is currently not protected for performance reasons.
  This might change if it causes problems.

- Add MAC_POLICY_LIST_ASSERT_EXCLUSIVE() to permit the flags update
  function to assert appropriate locks.

- This makes allocation in mac_init_mbuf() conditional on the flag.

Reviewed by:	sam
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-14 20:39:06 +00:00
rwatson
0cf8c3b34b Abstract access to the mbuf header label behind a new function,
mbuf_to_label().  This permits the vast majority of entry point code
to be unaware that labels are stored in m->m_pkthdr.label, such that
we can experiment storage of labels elsewhere (such as in m_tags).

Reviewed by:	sam
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-14 18:11:18 +00:00
rwatson
c48acd8ffd Use MBTOM() to convert mbuf allocator flags to malloc() flags, rather
than using the same compare/substitute in many places.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-14 16:04:10 +00:00
cognet
64f4176c9e Use while (*controlp != NULL) instead of do ... while (*control != NULL)
There are valid cases where *controlp will be NULL at this point.

Discussed with:	dwmalone
2003-04-14 14:44:36 +00:00
alc
a05b4b3347 Update locking on the kernel_object to use the new macros. 2003-04-14 00:36:53 +00:00
jake
0827d0138a Made vmspace0 non-static. Its useful to be able to identify a vmspace as
the kernel vmspace.
2003-04-13 21:29:11 +00:00
alc
eac23cf75a Lock some manipulations of the vm object's flags. 2003-04-13 19:36:18 +00:00
phk
bf262ea8a8 Since dynamic allocation of device major numbers so far have not
resulted in any earthquakes, civil wars or early onset hair-loss,
I think we can do without the printf announcing the assigned number.
2003-04-13 15:27:49 +00:00
alc
227f7746c4 Use vm_object_pip_wait() rather than reimplementing it. 2003-04-13 05:10:44 +00:00
jeff
7e8e873c9e - Unbreak priority prop. for timeshare threads. Always place something on
the current queue if its priority is really elevated.  This needs more work
   as there are cases where a next queue kse could be holding up what would
   be a curr queue kse, and thus hurting interactivity.  Also, when a thread
   with an elevated priority has its priority lowered it should be placed
   back on the next queue.
2003-04-12 22:33:24 +00:00
jeff
25eb78399b - Clean up some debug code left over from my earlier megacommit. 2003-04-12 07:28:36 +00:00
jeff
98c98fa0fc - We only care about the base priority. Ignore the SCHED_FIFO_BIT so that
we dont get confused.

Reported and debugged by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
2003-04-12 07:00:16 +00:00
davidxu
d5e8438f32 Style fix. 2003-04-12 02:54:46 +00:00
kbyanc
d9072317cb Fix race between a process registering a NOTE_EXIT EVFILT_PROC event and
the target process exiting which causes attempts to register the kevent
to randomly fail depending on whether the target runs to completion before
the parent can call kevent(2).  The bug actually effects EVFILT_PROC
events on any zombie process, but the most common manifestation is with
parents trying to monitor child processes.

MFC after:	2 weeks
Sponsored by:	NTT Multimedia Communications Labs
2003-04-12 01:57:04 +00:00
davidxu
cb24fd3c57 Check SIG_HOLD action ealier to avoid missing test it in later code. 2003-04-12 00:38:47 +00:00
jeff
97ecb163cb - Call sched_exit_{kse,thread} and sched_fork{kse,thread} so that thr works
with ULE.  This was not strictly required by sched_4bsd.
2003-04-11 19:24:37 +00:00
jeff
b5bf29d735 - Add sched_exit_*
- Call sched_exit_kse() from sched_exit() instead of implementing it here.
2003-04-11 19:24:00 +00:00
jeff
62e212c14f - Only select kseqs with more than one kse to steal. The running kse
is reflected in the load now and you can't very well migrate that.
2003-04-11 18:40:34 +00:00
jeff
678516a39b - When migrating a kse from one kseq to the next actually insert it onto
the second kseq's run queue so that it is referenced by the kse when
   it is switched out.
 - Spell ksq_rslices properly.

Reported by:	Ian Freislich <ianf@za.uu.net>
2003-04-11 18:37:34 +00:00
alc
ae179e006f The data in an sf_buf should not be modified by the mbuf system. Mark
the mbuf as read only.

Reviewed by:	gallatin
2003-04-11 07:02:36 +00:00
jeff
d3e0edc523 - Add a SYSCTL node for the ule scheduler.
- Allow user adjustable min and max time slices (suggested by hiten).
 - Change the SLP_RUN_MAX to 100ms from 2 seconds so that we learn whether a
   process is interactive or not much more quickly.
 - Place a process on the current run queue if it is interactive or if it is
   running at an interrupt thread priority due to priority prop.
 - Use the 'current' timeshare queue for interrupt threads, realtime threads,
   and idle threads that are running at higher priority due to priority prop.
   This fixes problems where priorities would have been elevated but we would
   not check the timeshare run queue until other lower priority tasks were
   no longer runnable.
 - Keep an array of loads indexed by the priority class as well as a global
   load.
 - Keep an bucket of nice values with a count of the number of kses currently
   runnable with that nice value.
 - Keep track of the minimum nice value of any running thread.
 - Remove the unused short term sleep accounting.  I was attempting to use
   this for load balancing but it didn't work out.
 - Define a kseq_print() for use with debugging.
 - Add KTR debugging at useful places so we can easily debug slice and
   priority assignment.
 - Decouple the runq assignment from the kseq assignment.  kseq_add now keeps
   track of statistics.  This is done so that the nice and load is still
   tracked for the currently running process.  Previously if a niced process
   was added while a non nice process was running the niced process would
   still get a slice since it was not aware of the unnice process.
 - Make adjustments for the sched api changes.
2003-04-11 03:47:14 +00:00
jeff
b7c587c68a - Catch up with sched api changes. 2003-04-11 03:39:48 +00:00
jeff
a033a84006 - Adjust sched hooks for fork and exec to take processes as arguments instead
of ksegs since they primarily operation on processes.
 - KSEs take ticks so pass the kse through sched_clock().
 - Add a sched_class() routine that adjusts a ksegrp pri class.
 - Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp}
   that will be used to tell the scheduler about new instances of these
   structures within the same process.  These will be used by THR and KSE.
 - Change sched_4bsd to reflect this API update.
2003-04-11 03:39:07 +00:00
julian
6f175a0e20 Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but it's purpose will change a bit
when we add the ability to lock a KSE to a cpu.
2003-04-10 17:35:44 +00:00
mike
79d60009e2 Regen. 2003-04-09 02:57:29 +00:00
mike
75859ca578 o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
  fields (pr_path for reporting and pr_root vnode instance) to store
  the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
  vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
  to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
  struct xprison instances that represent a snapshot of active jails.

Reviewed by:	rwatson, tjr
2003-04-09 02:55:18 +00:00
alc
e1dab143cd Remove some dead code. 2003-04-08 18:24:28 +00:00
des
567ac2b268 Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header.  Use it instead of hand-
rolled versions wherever applicable.

Submitted by:	Hiten Pandya <hiten@unixdaemons.com>
2003-04-08 14:25:47 +00:00
jake
d3a6b05ded Merged from kern_thread.c 1.113, avoid a panic in cpu_throw when the first
thread of a multithreaded process exits.

This unrelated and possibly wrong change was not mentioned in the commit
message for kern_thread.c 1.113.
2003-04-08 08:13:47 +00:00
davidxu
bf5f76d431 Inherit blocked thread's context for upcall thread. 2003-04-08 07:45:56 +00:00
peter
099c5dc381 Search for "elf32 kernel" (and elf64) and "elf32 module" (and elf64)
as well as "elf kernel" and "elf module".  This is a precursor to
x86-64 support in the i386 loader so it can load an elf64 x86-64 kernel.
2003-04-06 05:20:00 +00:00
alc
e847e1c64f Remove an unnecessary trunc_page() from vmapbuf().
Reviewed by:	tegge
2003-04-06 00:40:54 +00:00
alc
663bf88111 Don't reinitialize fields that are already initialized by getpbuf(). 2003-04-05 23:02:58 +00:00
alc
c0badd1444 Sufficient access checks are performed by vmapbuf() that calling
useracc() is pointless.  Remove the call to useracc() from physio().

Reviewed by:	tegge
2003-04-05 21:19:58 +00:00
alc
3c03fd9f54 o Remove useracc() calls from aio_qphysio(); they are redundant
given the checks performed by vmapbuf().

Reviewed by:	tegge
2003-04-04 06:26:28 +00:00
alc
cbd6318ffd o Check the b_bufsize passed to vmapbuf() returning an error
if it is invalid.
 o Remove a debugging printf() from vmapbuf().

Suggested by:   tegge
2003-04-04 06:14:54 +00:00
phk
8207e9e353 Remove BIO_SETATTR from non-GEOM part of kernel as well. 2003-04-03 19:22:32 +00:00
jeff
12c39a9461 - Keep seperate statistics and run queues for different scheduling classes.
- Treat each class specially in kseq_{choose,add,rem}.  Let the rest of the
   code be less aware of scheduling classes.
 - Skip the interactivity calculation for non TIMESHARE ksegrps.
 - Move slice and runq selection into kseq_add().  Uninline it now that it's
   big.
2003-04-03 00:29:28 +00:00
peter
46969da5f8 Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup.  Currently its under
options LAZY_SWITCH.  What this does is avoid %cr3 reloads for short
context switches that do not involve another user process.  ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb.  However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still.  There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option.  I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff.  This was more
meant for the kthread case, while his was for interrupts.  Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place.  This allows us to catch the silly
case of doing a cpu_switch() to the current process.  This happens
uncomfortably often.  This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle).  This has been
implemented on i386 and (thanks to jake) sparc64.  The others will come
soon.  This is actually seperate to the lazy switch stuff.

Glanced at by:  jake, jhb
2003-04-02 23:53:30 +00:00
jhb
c0b4f09416 Lock the process before sending it a SIGIO. Not doing so is a panic(2)
implementation with INVARIANTS.
2003-04-02 21:54:51 +00:00
hsu
0878b5fe85 Need to hold the same SMP lock for (knote) list traversal as for
list manipulation.  This lock also protects read-modify-write operations
on the pipe_state field.
2003-04-02 15:24:50 +00:00
jeff
6a2f46d4e8 - Make the interactivity calculator decay faster.
- Make the pcpu estimator update faster.
2003-04-02 08:22:33 +00:00
jeff
036d55a8d6 - I meant divide by two and not shift by two in SCHED_PRI_NHALF. 2003-04-02 08:21:24 +00:00
jake
02364d4f5d - Make casuptr return the old value of the location we're trying to update,
and change the umtx code to expect this.

Reviewed by:	jeff
2003-04-02 08:02:27 +00:00
jeff
6470e002eb - Add in support for KSEs with 0 slice values on the run queue. If we try
to select a KSE with a slice of 0 we will update its slice and insert it
   onto the next queue.
 - Pass the KSE instead of the ksegrp into sched_slice().  This more
   accurately reflects the behavior of the code.  Slices are granted to kses.
 - Add a function kseq_nice_min() which finds the smallest nice value
   assigned to the kseg of any KSE on the queue.
 - Rewrite the logic in sched_slice().  Add a large comment describing the
   new slice selection scheme.  To summarize, slices are assigned based on
   the nice value.  Priorities are still calculated based on the nice and
   interactivity of a process.  Slice sizes of 0 may be granted for KSEs
   whos nice is 20 or futher away from the lowest nice on the run queue.
   Other nice values are scaled across the range [min, min+20].  This fixes
   ULEs bad behavior with positively niced processes.
2003-04-02 06:46:43 +00:00
jake
ac9bc07ca9 - Fix UC_COPY_SIZE. Adding up the size of structure fields doesn't take
alignment into account.
- Return EJUSTRETURN from set_context on success to avoid clobbering the
  first 2 out registers with td_retval on sparc64.
2003-04-01 23:25:18 +00:00
phk
1becd36845 #include <geom/geom_disk.h> 2003-04-01 19:00:38 +00:00
phk
78929984b3 Introduce bioq_flush() function. 2003-04-01 12:49:40 +00:00
jeff
3c4f704ebe - p will be unused in cursig() if INVARIANTS is not defined. Access it
through td->td_proc to avoid the unused variable.

Spotted by:	Maxim Konovalov <maxim@macomnet.ru>
2003-04-01 09:07:36 +00:00
jeff
c724e78c22 - Regen. 2003-04-01 02:34:21 +00:00
jeff
4f9ba753d8 - thr_exit() should no longer be called with Giant held. 2003-04-01 02:32:53 +00:00
jeff
d41c709cb3 - Mark the various thr syscalls as MP safe. Previously there was a bug if
this was not done since thr_exit() unwinds giant.
2003-04-01 02:32:07 +00:00
jeff
1b4d7b91ce - Borrow the KSE single threading code for exec and exit. We use the check
if (p->p_numthreads > 1) and not a flag because action is only necessary
   if there are other threads.  The rest of the system has no need to
   identify thr threaded processes.
 - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED
   is not set.
2003-04-01 01:26:20 +00:00
jeff
ddd8314458 - Regen for umtx. 2003-04-01 01:22:18 +00:00
jeff
a7da772fc1 - Include umtx.h in files generated by makesyscalls.sh
- Add system calls for umtx.
2003-04-01 01:12:24 +00:00
jeff
2921bb5e34 - Add an api for doing smp safe locks in userland.
- umtx_lock() is defined as an inline in umtx.h.  It tries to do an
   uncontested acquire of a lock which falls back to the _umtx_lock()
   system-call if that fails.
 - umtx_unlock() is also an inline which falls back to _umtx_unlock() if the
   uncontested unlock fails.
 - Locks are keyed off of the thr_id_t of the currently running thread which
   is currently just the pointer to the 'struct thread' in kernel.
 - _umtx_lock() uses the proc pointer to synchronize access to blocked thread
   queues which are stored in the first blocked thread.
2003-04-01 01:10:42 +00:00
jeff
919a0c8fa4 - We now have to include umtx.h and ucontext.h in the system call related
headers.
2003-04-01 00:35:12 +00:00
jeff
814bb99933 - Regen for thr related system calls. 2003-04-01 00:34:29 +00:00
jeff
5e69249b17 - Add the four thr related system calls. 2003-04-01 00:31:37 +00:00
jeff
a417418db4 - Add two files to support the thr threading interface.
- sys/thr.h contains the user space visible api that is intended only for
   use in threading library packages.
 - kern/kern_thr.c contains thr system calls and other thr specific code.
2003-04-01 00:30:30 +00:00
jeff
71a412bee0 - Regen for the sig*wait* system calls. 2003-03-31 23:33:45 +00:00
jeff
b23496dd54 - Define sigwait, sigtimedwait, and sigwaitinfo in terms of
kern_sigtimedwait() which is capable of supporting all of their semantics.
 - These should be POSIX compliant but more careful review is needed before
   we announce this.
2003-03-31 23:30:41 +00:00
jeff
46e6ba39f1 - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
julian
803202f956 Do NOT return from an non-interruptable cv_wait, falsely
claiming to have timed out. I don't know what I was thinking..
2003-03-31 22:41:47 +00:00
jeff
6e01278555 - Mark signals which may be delivered to any thread in the process with
SA_PROC.  Signals without this flag should be directed to a particular
   thread if this is possible.
2003-03-31 22:12:09 +00:00
jeff
4a3718fb25 - Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.
2003-03-31 22:02:38 +00:00
alc
3333da28bf Recent changes to uipc_cow.c have eliminated the need for some sf_buf-
related variables to be global.  Make them either local to sf_buf_init() or
static.
2003-03-31 06:25:42 +00:00
phk
bb8188c895 retire the "busy" field in bioqueues, it's served it's purpose. 2003-03-30 10:16:31 +00:00
phk
a0fbf93755 Preparation commit before I start on the bioqueue lockdown:
Collect all the bits of bioqueue handing in subr_disk.c, vfs_bio.c is big
enough as it is and disksort already lives in subr_disk.c.
2003-03-30 08:51:23 +00:00
jeff
83e8b19361 - We are not guaranteed that read ahead blocks are not in memory already.
Check for B_DELWRI as well as B_CACHED before issuing io on a buffer.  This
   is especially important since we are changing the b_iocmd.
2003-03-30 02:57:32 +00:00
alc
6f59be774d Pass the vm_page's address to sf_buf_alloc(); map the vm_page as part
of sf_buf_alloc() instead of expecting sf_buf_alloc()'s caller to map it.

The ultimate reason for this change is to enable two optimizations:
(1) that there never be more than one sf_buf mapping a vm_page at a time
and (2) 64-bit architectures can transparently use their 1-1 virtual
to physical mapping (e.g., "K0SEG") avoiding the overhead of pmap_qenter()
and pmap_qremove().
2003-03-29 06:14:14 +00:00
silby
7d7faf316e Add the m_defrag routine, as discussed on committers@. This
incarnation should address the concerns of all in the discussion,
and keeps statistics which show how much it is used.

MFC after:	2 weeks
2003-03-29 05:48:36 +00:00
jhb
4fcebd533b Check for the PS_NEEDSIGCHK flag in the right flags field. 2003-03-28 18:08:57 +00:00
silby
430664f150 Allow m_dup_pkthdr to accept mbufs with attached clusters as
targets.

Submitted by:	bmilekic
2003-03-28 05:57:48 +00:00
iedowse
b399d5ecbd Add a checksum to the kernel message buffer, and update it every
time a character is written. Use this at boot time to reject the
existing buffer contents if they are corrupt. This fixes a problem
seen on some hardware (especially laptops) where the message buffer
gets partially corrupted during a short power cycle or reset, but
the msgbuf structure is left intact so it gets reused, resulting
in random junk and control characters appearing in dmesg and
/var/log/messages.

PR:		kern/28497
2003-03-28 02:50:10 +00:00
tegge
ede5ebede7 Add support for reading directly from file to userland buffer when the
O_DIRECT descriptor status flag is set and both offset and length is a
multiple of the physical media sector size.
2003-03-26 23:40:42 +00:00
tegge
5e14826743 Adjust the number of vnodes scanned by vlrureclaim() according to the
size of the vnode list.
2003-03-26 22:15:58 +00:00
rwatson
84af8bf695 Permit debug.malloc.failure_rate to be specified using a tunable so
that the feature can be enabled during the boot process.  Note the
continued limitation that FreeBSD fails so rapidly with this setting
enabled that it's hard to narrow down particular failures for
correction; we really need per-malloc type failure rates.
2003-03-26 20:44:29 +00:00
rwatson
68d9c43724 Add a new kernel option, MALLOC_MAKE_FAILURES, which compiles
in a debugging feature causing M_NOWAIT allocations to fail at
a specified rate.  This can be useful for detecting poor
handling of M_NOWAIT: the most frequent problems I've bumped
into are unconditional deference of the pointer even though
it's NULL, and hangs as a result of a lost event where memory
for the event couldn't be allocated.  Two sysctls are added:

debug.malloc.failure_rate

  How often to generate a failure: if set to 0 (default), this
  feature is disabled.  Otherwise, the frequency of failures --
  I've been using 10 (one in ten mallocs fails), but other
  popular settings might be much lower or much higher.

debug.malloc.failure_count

  Number of times a coerced malloc failure has occurred as a
  result of this feature.  Useful for tracking what might have
  happened and whether failures are being generated.

Useful possible additions: tying failure rate to malloc type,
printfs indicating the thread that experienced the coerced
failure.

Reviewed by:	jeffr, jhb
2003-03-26 20:18:40 +00:00
tegge
d9da9de257 fp->f_offset doesn't need any protection when it isn't accessed. 2003-03-26 19:21:12 +00:00
rwatson
e5680de54a Modify the mac_init_ipq() MAC Framework entry point to accept an
additional flags argument to indicate blocking disposition, and
pass in M_NOWAIT from the IP reassembly code to indicate that
blocking is not OK when labeling a new IP fragment reassembly
queue.  This should eliminate some of the WITNESS warnings that
have started popping up since fine-grained IP stack locking
started going in; if memory allocation fails, the creation of
the fragment queue will be aborted.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-26 15:12:03 +00:00
jhb
671aa92ea0 Remove extraneous check. We are not going to return from copyin/out on
the stack of a thread A but actually be thread B instead of thread A.
2003-03-25 20:13:24 +00:00
mdodd
fe36e1c847 Give print_child a default method. 2003-03-25 04:32:52 +00:00
jake
783ae539c3 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
jhb
98a481610a Replace the at_fork, at_exec, and at_exit functions with the slightly more
flexible process_fork, process_exec, and process_exit eventhandlers.  This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.

Reviewed by:	phk, jake, almost complete silence on arch@
2003-03-24 21:15:35 +00:00
jhb
966c72c345 - Remove witness_dead and just use witness_watch instead. If witness_watch
is set to 0, it now has the same affect as setting witness_dead used to
  have.
- Added a sysctl handler that allows root to change witness_watch from a
  non-zero value to zero to disable witness at runtime.  Note that you
  can't turn witness back on once it is off.  You can only turn it off as
  a one-way switch.
- Added a comment describing the possible values of witness_watch.
2003-03-24 21:03:53 +00:00
mux
cfd612e2d7 Remove a trailing semicolon in SCHED_QUANTUM definition.
Luckily this didn't cause any bugs.

Spotted by:	Samy Al Bahra <samy@kerneled.com>
2003-03-24 15:16:21 +00:00
cognet
501da8bd5f s/discriptors/descriptors/ 2003-03-23 19:41:34 +00:00
tjr
9785758af0 Remove unused mtx_lock_giant(), mtx_unlock_giant(), related globals
and sysctls.
2003-03-23 11:26:11 +00:00
yar
f9968b4d9f We shouldn't assert that a vode is locked in vop_lock_post()
if VOP_LOCK() has failed.

Reviewed by:	jeff
2003-03-22 13:21:54 +00:00
jhb
300b98684d Use td_ucred of curthread instead of p_ucred of curproc. This required
changing sem_perm() and sem_hasopen() to take a thread instead of a proc
for the first argument.
2003-03-20 21:12:31 +00:00
phk
afab808bc4 Backout the getcwd changes, a more comprehensive effort will be needed. 2003-03-20 10:40:45 +00:00
davidxu
8e88e8da05 Adjust code for userland preemptive. Userland can set a quantum in
kse_mailbox to schedule an upcall, this is useful for userland timeout
routine, for example pthread_cond_timedwait().

Also extract upcall scheduling code from kse_reassign and create
a new function called thread_switchout to include these code.

Reviewed by: julain
2003-03-19 05:49:38 +00:00
des
d0bee7242c Unregisterize, ansify. 2003-03-19 00:49:40 +00:00
des
e97206db4c Whitespace cleanup. 2003-03-19 00:33:38 +00:00
jake
ce68d8fa22 long != int. Use SYSCTL_UINT for kern.devstat.generation. Fixes booting
on sparc64.
2003-03-18 23:32:27 +00:00
gallatin
d18eb82da5 Fix a race condition in socow_setup(): The page must be wired before
sf_buf_alloc() is called, as sf_buf_alloc() may sleep.  If it does sleep,
the page might be reclaimed before wiring occurs.

Reported by: alc
2003-03-18 18:27:33 +00:00
phk
eef257a93e If devstat_new_entry() is passed a unit number of -1 assume that
the devstat is for an "interior" GEOM node and register using the
name argument as a geom identity pointer.  Do not put these devstat
structures on the list returned by the sysctl.

This gives us the ability to tell the two kinds of nodes apart and
leave the current "strictly physical" view of devstat intact without
modifications, yet be able to use devstat for both kinds of devices.

It also saves us bloating struct devstat with another 48 bytes of
space for the name.  At least for now.

Reviewed by:    ken
2003-03-18 09:30:31 +00:00
phk
45ffc6d110 Make devstat fully Giant agnostic:
Add a mutex and protect the allocation and traversal of the list with it.

When we allocate a page for devstat use we drop the mutex and use
M_WAITOK this is not nice, but under the given circumstances the
best we can do.

In the sysctl handler for returning the devstat entries we do not want to
hold the mutex across copyout(9) calls, so we keep a very careful eye on
the devstat_generation count, and abandon with EBUSY if it changes under
our feet.

Specifically test for BIO_WRITE, rather than default non-read,non-deletes
as write.  Make the default be DEVSTAT_NO_DATA.

Add atomic increments of the sequence[01] fields so applications using the
mmap'ed view stand a chance of detecting updates in progress.

Reviewed by:    ken
2003-03-18 09:20:20 +00:00
phk
e059b79437 Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
phk
6c4e57c44b Make devstat_new_entry() take a const void * rather than const char *
argument, GEOM nodes are not identified by ascii string.
2003-03-18 07:52:59 +00:00
jeff
c0f79e52d6 - Unlock the target bp and not the pager buf bp in a failure case in
cluster_wbuild().  This was causing strange panics that were widely
   reported on current@.

Big Pointy Hat to:	jeff
2003-03-17 18:38:49 +00:00
phk
90f136c592 (This commit certainly increases the need for a wash&clean of vfs_cache.c,
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)

This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use it's
extended functionality.  The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too.  It should
be possible to simplify libc's getcwd() after this.  No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.

PR:	48169
Submitted by:	James Whitwell <abacau@yahoo.com.au>
2003-03-17 12:21:08 +00:00
phk
1ff2d4dcb1 Add a #define for the device name of the mmap device for devstat.
Constify the geom identification pointer.
2003-03-16 23:20:05 +00:00
alc
4641d3d127 Pass the sf buf to MEXTADD() as the optional argument. This permits
the simplification of socow_iodone() and sf_buf_free(); they don't
have to reverse engineer the sf buf from the data's address.
2003-03-16 07:19:12 +00:00
phk
1b534a022f One devstat_start_transaction_bio() is enough. 2003-03-15 22:20:38 +00:00
phk
f432014308 Run a revision of the devstat interface:
Kernel:

Change statistics to use the *uptime() timescale (ie: relative to
boottime) rather than the UTC aligned timescale.  This makes the
device statistics code oblivious to clock steps.

Change timestamps to bintime format, they are cheaper.

Remove the "busy_count", and replace it with two counter fields:
"start_count" and "end_count", which are updated in the down and
up paths respectively.  This removes the locking constraint on
devstat.

Add a timestamp argument to devstat_start_transaction(), this will
normally be a timestamp set by the *_bio() function in bp->bio_t0.
Use this field to calculate duration of I/O operations.

Add two timestamp arguments to devstat_end_transaction(), one is
the current time, a NULL pointer means "take timestamp yourself",
the other is the timestamp of when this transaction started (see
above).

Change calculation of busy_time to operate on "the salami principle":
Only when we are idle, which we can determine by the start+end
counts being identical, do we update the "busy_from" field in the
down path.  In the up path we accumulate the timeslice in busy_time
and update busy_from.

Change the byte_* and num_* fields into two arrays: bytes[] and
operations[].

Userland:

Change the misleading "busy_time" name to be called "snap_time" and
make the time long double since that is what most users need anyway,
fill it using clock_gettime(CLOCK_MONOTONIC) to put it on the same
timescale as the kernel fields.

Change devstat_compute_etime() to operate on struct bintime.

Remove the version 2 legacy interface: the change to bintime makes
compatibility far too expensive.

Fix a bug in systat's "vm" page where boot relative busy times would
be bogus.

Bump __FreeBSD_version to 500107

Review & Collaboration by:	ken
2003-03-15 21:59:06 +00:00
phk
4a623e073c Add a devstat_start_transaction_bio() to match the
devstat_end_transaction_bio() we already have.

For now it just calls devstat_start_transaction(), but that will change
shortly.
2003-03-15 10:33:32 +00:00
davidxu
c2573f692d Export current time when returning from never blocked syscall. 2003-03-14 03:52:16 +00:00
jhb
e90ccce535 Trim some trailing whitespace. 2003-03-13 23:07:09 +00:00
jhb
a21ccbbf8c Add a new userland-visible ktrace flag KTR_DROP and an internal ktrace flag
KTRFAC_DROP to track instances when ktrace events are dropped due to the
request pool being exhausted.  When a thread tries to post a ktrace event
and is unable to due to no available ktrace request objects, it sets
KTRFAC_DROP in its process' p_traceflag field.  The next trace event to
successfully post from that process will set the KTR_DROP flag in the
header of the request going out and clear KTRFAC_DROP.

The KTR_DROP flag is the high bit in the type field of the ktr_header
structure.  Older kdump binaries will simply complain about an unknown type
when seeing an entry with KTR_DROP set.  Note that KTR_DROP being set on a
record in a ktrace file does not tell you anything except that at least one
event from this process was dropped prior to this event.  The user has no
way of knowing what types of events were dropped nor how many were dropped.

Requested by:	phk
2003-03-13 18:31:15 +00:00
jhb
f02ef38080 - Cache a reference to the credential of the thread that starts a ktrace in
struct proc as p_tracecred alongside the current cache of the vnode in
  p_tracep.  This credential is then used for all later ktrace operations on
  this file rather than using the credential of the current thread at the
  time of each ktrace event.
- Now that we have multiple ktrace-related items in struct proc that are
  pointers, rename p_tracep to p_tracevp to make it less ambiguous.

Requested by:	rwatson (1)
2003-03-13 18:24:22 +00:00
iedowse
db8dd4828e In m_dup_pkthdr(), convert the supplied `how' argument into malloc
flags when passing it into m_tag_copy_chain(), as m_tag* functions
use malloc, not mbuf flags.
2003-03-13 09:02:19 +00:00
jeff
459181e3ed - Add a lock for protecting against msleep(bp, ...) wakeup(bp) races.
- Create a new function bdone() which sets B_DONE and calls wakup(bp). This
   is suitable for use as b_iodone for buf consumers who are not going
   through the buf cache.
 - Create a new function bwait() which waits for the buf to be done at a set
   priority and with a specific wmesg.
 - Replace several cases where the above functionality was implemented
   without locking with the new functions.
2003-03-13 07:31:45 +00:00
jeff
ec5374265b - Remove a dead check for bp->b_vp == vp in vtruncbuf(). This has not been
possible for some time.
 - Lock the buf before accessing fields.  This should very rarely be locked.
 - Assert that B_DELWRI is set after we acquire the buf.  This should always
   be the case now.
2003-03-13 07:22:53 +00:00
jeff
ae3c8799da - Remove a race between fsync like functions and flushbufqueues() by
requiring locked bufs in vfs_bio_awrite().  Previously the buf could
   have been written out by fsync before we acquired the buf lock if it
   weren't for giant.  The cluster_wbuild() handles this race properly but
   the single write at the end of vfs_bio_awrite() would not.
 - Modify flushbufqueues() so there is only one copy of the loop.  Pass a
   parameter in that says whether or not we should sync bufs with deps.
 - Call flushbufqueues() a second time and then break if we couldn't find
   any bufs without deps.
2003-03-13 07:19:23 +00:00
alfred
91e561ec03 Make sure we actually have a dev before dereferencing in case someone
botches and sends us a NULL pointer.  The other code in this file seems
to expect it to be able to handle it behaving this way.
2003-03-13 06:29:44 +00:00
jeff
814703b2a4 - Tune down read_max. For single disks we get no gain out of reading more
than a MAXPHYS size block ahead.  Having this set too high just leaves
   other processes starved for IO and screws up interactive response.  Let the
   users with RAID set it higher when they need it.
2003-03-13 06:17:59 +00:00
tjr
2230824221 Tidy up previous change: move comment about obtaining an exclusive
reference where it belongs, and remove a blank line to make it more
obvious what the comment applies to.
2003-03-13 00:57:47 +00:00
tjr
a9d877b4c2 Back out previous. The locking here needs a rethink. 2003-03-13 00:54:53 +00:00
jhb
954b82f293 - Various little style fixes.
- If SYSCTL_OUT() fails in sysctl_kern_proc_args(), return the error
  instead of ignoring it if we have new arguments for the process.
- If the new arguments for a process are too long, return ENOMEM instead of
  returning success but not doing the actual copy.

Submitted by:	bde
2003-03-12 20:17:40 +00:00
jhb
7510e91aa2 - Avoid dropping the proc lock around a simple permissions check and just
hold hold it across the check to avoid extra lock operations in the
  common case.
- Copy in the new args to a temporary pargs structure before we drop the
  reference to the old one.  Thus, if the copyin() fails, the process
  arguments are unchanged rather than being deleted.  Also, p_args is no
  longer NULL during the sysctl operation.
2003-03-12 16:14:55 +00:00
tjr
679efe569a Acquire sched_lock around use of FOREACH_KSEGRP_IN_PROC, accesses
to kg_nice and calls to sched_nice() in getpriority() and setpriority()
(really donice()).
2003-03-12 11:24:41 +00:00
tjr
59d5730195 In wait1(), remove the zombie process from zombproc before removing
it from its pgrp to avoid leaving zombies around with p_pgrp == NULL.
This bug was apparent as a NULL-dereference in the pid selection code
in fork1().
2003-03-12 11:10:04 +00:00
jhb
0c3ac305c8 Trim an extra blank line that snuck into the last commit. 2003-03-11 22:33:42 +00:00
kan
378cd3b05d Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what function is really doing. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over mount
point's vnodes and call FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by:	jeff
2003-03-11 22:15:10 +00:00