Commit Graph

6540 Commits

Author SHA1 Message Date
Mike Makonnen
994599d782 Fix umtx locking, for libthr, in the kernel.
1. There was a race condition between a thread unlocking
   a umtx and the thread contesting it. If the unlocking
   thread won the race it may try to wakeup a thread that
   was not yet in msleep(). The contesting thread would then
   go to sleep to await a wakeup that would never come. It's
   not possible to close the race by using a lock because
   calls to casuptr() may have to fault a page in from swap.
   Instead, the race was closed by introducing a flag that
   the unlocking thread will set when waking up a thread.
   The contesting thread will check for this flag before
   going to sleep. For now the flag is kept in td_flags,
   but it may be better to use some other member or create
   a new one because of the possible performance/contention
   issues of having to own sched_lock. Thanks to jhb for
   pointing me in the right direction on this one.

2. Once a umtx was contested all future locks and unlocks
   were happening in the kernel, regardless of whether it
   was contested or not. To prevent this from happening,
   when a thread locks a umtx it checks the queue for that
   umtx and unsets the contested bit if there are no other
   threads waiting on it. Again, this is slightly more
   complicated than it needs to be because we can't hold
   a lock across casuptr(). So, the thread has to check
   the queue again after unseting the bit, and reset the
   contested bit if it finds that another thread has put
   itself on the queue in the mean time.

3. Remove the if... block for unlocking an uncontested
   umtx, and replace it with a KASSERT. The _only_ time
   a thread should be unlocking a umtx in the kernel is
   if it is contested.
2003-07-17 11:06:40 +00:00
Bosko Milekic
48719ca7c8 Change the style of the english used to print accounting enabled
and disabled.  This means no period at the end and changing
"Process accounting <foo>" to "Accounting <foo>".

Pointed out by: bde
2003-07-16 13:20:10 +00:00
Bosko Milekic
d2dbf5bc0b Log process accounting activation/deactivation.
Useful for some auditing purposes.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/54529
2003-07-16 03:59:50 +00:00
Don Lewis
6ff1481d5c Rearrange the SYSINIT order to call lockmgr_init() earlier so that
the runtime lockmgr initialization code in lockinit() can be eliminated.

Reviewed by:	jhb
2003-07-16 01:00:39 +00:00
David Xu
af161f2232 If initial thread is still a bound thread, don't change its signal mask. 2003-07-15 14:04:38 +00:00
Hartmut Brandt
7e9024cdd9 Add a facility for devices, specifically network interfaces, that require
large to huge amounts of small or medium sized receive buffers. The problem
with these situations is that they eat up the available DMA address space
very quickly when using mbufs or even mbuf clusters. Additionally this
facility provides a direct mapping between 32-bit integers and these buffers.
This is needed for devices originally designed for 32-bit systems. Ususally
the virtual address of the buffer is used as a handle to find the buffer as
soon as it is returned by the card. This does not work for 64-bit machines
and hence this mapping is needed.
2003-07-15 08:59:38 +00:00
David Xu
4b7d5d84ee Rename thread_siginfo to cpu_thread_siginfo 2003-07-15 04:26:26 +00:00
Jeffrey Hsu
330841c763 Rev 1.121 meant to pass the value 1 to soalloc() to indicate waitok.
Reported by:	arr
2003-07-14 20:39:22 +00:00
Don Lewis
857d9c60d0 Extend the mutex pool implementation to permit the creation and use of
multiple mutex pools with different options and sizes.  Mutex pools can
be created with either the default sleep mutexes or with spin mutexes.
A dynamically created mutex pool can now be destroyed if it is no longer
needed.

Create two pools by default, one that matches the existing pool that
uses the MTX_NOWITNESS option that should be used for building higher
level locks, and a new pool with witness checking enabled.

Modify the users of the existing mutex pool to use the appropriate pool
in the new implementation.

Reviewed by:	jhb
2003-07-13 01:22:21 +00:00
Robert Drehmel
baf731e6ed Make the system call vector name of a process accessible to user
land applications by introducing the KERN_PROC_SV_NAME sysctl node,
which is searchable by PID.
2003-07-12 02:00:16 +00:00
David Xu
ffb2e92a98 If a thread is sending signal to its process, if the thread can handle
the signal itself, it should get it without looking for other threads.
2003-07-11 13:42:23 +00:00
Mike Silbersack
347194c172 Add init_param3() to subr_param. This function is called
immediately after the kernel map has been sized, and is
the optimal place for the autosizing of memory allocations
which occur within the kernel map to occur.

Suggested by:	bde
2003-07-11 00:01:03 +00:00
Peter Wemm
e95babf3a8 unifdef -DLAZY_SWITCH and start to tidy up the associated glue. 2003-07-10 01:02:59 +00:00
Mike Silbersack
ff56f15e26 A few minor changes:
- Use atomic ops to update the bigpipe count
- Make the bigpipe count sysctl readable
- Remove a duplicate comparison in an if statement
- Comment two SYSCTLs.
2003-07-09 21:59:48 +00:00
Mike Silbersack
41f16f8208 Pull in the entire kmem_map size calculation from kern_malloc, rather
than the shortcircuited version I had been using, which only worked
properly on i386 & amd64.

Also, change an autoscale constant to account for the more correct
kmem_map size.

Problem noticed by:     mux
2003-07-08 18:59:21 +00:00
Jeff Roberson
0c0a98b231 - When stealing a kse in kseq_move() ignore the current kseq's min nice
value.  We want to steal any thread, even one that is not given a slice
   on its current queue.
2003-07-08 06:19:40 +00:00
Mike Silbersack
289016f2d1 Put some concrete limits on pipe memory consumption:
- Limit the total number of pipes so that we do not
  exhaust all vm objects in the kernel map.  When
  this limit is reached, a ratelimited message will
  be printed to the console.

- Put a soft limit on the amount of memory consumable
  by pipes.  Once the limit has been reached, all new
  pipes will be limited to 4K in size, rather than the
  default of 16K.

- Put a limit on the number of pages that may be used
  for high speed page flipping in order to reduce the
  amount of wired memory.  Pipe writes that occur
  while this limit is exceeded will fall back to
  non-page flipping mode.

The above values are auto-tuned in subr_param.c and
are scaled to take into account both the size of
physical memory and the size of the kernel map.

These limits help to reduce the "kernel resources exhausted"
panics that could be caused by opening a large
number of pipes.  (Pipes alone are no longer able
to exhaust all resources, but other kernel memory hogs
in league with pipes may still be able to do so.)

PR:			53627
Ideas / comments from:	hsu, tjr, dillon@apollo.backplane.com
MFC after:		1 week
2003-07-08 04:02:31 +00:00
Jeff Roberson
0ec896fd28 - Clean up an unused variable.
Submitted by:	Steve Kargl <skg@routmask.apl.washington.edu>
2003-07-07 21:08:28 +00:00
Mike Makonnen
14b5ae1a98 Make the conditional, which decides what siglist to put a signal on,
more concise and improve the comment.

Submitted by: bde
2003-07-05 08:37:40 +00:00
Mike Makonnen
e55c35c433 I was so happy I found the semi-colon from hell that I didn't
notice another typo in the same line. This typo makes libthr unuseable,
but it's effects where counter-balanced by the extra semicolon, which
made libthr remarkably useable for the past several months.
2003-07-04 23:28:42 +00:00
Jeff Roberson
749d01b011 - Parse the cpu topology map in sched_setup().
- Associate logical CPUs on the same physical core with the same kseq.
 - Adjust code that assumed there would only be one running thread in any
   kseq.
 - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef.  This is a start
   towards HyperThreading support but it isn't quite there yet.
2003-07-04 19:59:00 +00:00
Poul-Henning Kamp
1226914c17 Use the f_vnode field to tell which file descriptors have a vnode. 2003-07-04 12:20:27 +00:00
Mike Makonnen
1069e3a6f4 It's unfair how one extraneous semi-colon can cause so much grief. 2003-07-04 11:18:07 +00:00
Mike Makonnen
71cfaac0b0 style(9)
o Remove double-spacing, and while I'm here add a couple
  of braces as well.

Requested by:	bde
2003-07-04 06:59:28 +00:00
Olivier Houchard
a10d5f02c8 In setpgrp(), don't assume a pgrp won't exist if the provided pgid is the same
as the target process' pid, it may exist if the process forked before leaving
the pgrp.
Thix fixes a panic that happens when calling setpgid to make a process
re-enter the pgrp with the same pgid as its pid if the pgrp still exists.
2003-07-04 02:21:28 +00:00
Mike Makonnen
8689793bfb kse_thr_interrupt should target the thread, specifically.
Requested by:	davidxu
2003-07-04 01:41:32 +00:00
Mike Makonnen
c197abc49a Signals sent specifically to a particular thread must
be delivered to that thread, regardless of whether it
has it masked or not.

Previously, if the targeted thread had the signal masked,
it would be put on the processes' siglist. If
another thread has the signal umasked or unmasks it before
the target, then the thread it was intended for would never
receive it.

This patch attempts to solve the problem by requiring callers
of tdsignal() to say whether the signal is for the thread or
for the process. If it is for the process, then normal processing
occurs and any thread that has it unmasked can receive it.
But if it is destined for a specific thread, it is put on
that thread's pending list regardless of whether it is currently
masked or not.

The new behaviour still needs more work, though.  If the signal
is reposted for some reason it is always posted back to the
thread that handled it because the information regarding the
target of the signal has been lost by then.

Reviewed by:	jdp, jeff, bde (style)
2003-07-03 19:09:59 +00:00
John Baldwin
f7ee15901a - Add comments about the maintenance of the per-thread list of contested
locks held by each thread.
- Fix a bug in the original BSD/OS code where a contested lock was not
  properly handed off from the old thread to the new thread when a
  contested lock with more than one blocked thread was transferred from
  one thread to another.
- Don't use an atomic operation to write the MTX_CONTESTED value to
  mtx_lock in the aforementioned special case.  The memory barriers and
  exclusion provided by sched_lock are sufficient.

Spotted by:	alc (2)
2003-07-02 16:14:09 +00:00
John Baldwin
6591b31040 Add a resource_disabled() helper function that returns true (non-zero) if
a specified resource has been disabled via a non-zero 'disabled' hint and
false otherwise.
2003-07-02 16:01:38 +00:00
Poul-Henning Kamp
d94e36521e typo fix in comment. 2003-07-02 08:01:52 +00:00
David Xu
34178711be Allow SA process unblocks a thread blocked in condition variable.
Reviewed by: deischen
2003-07-02 01:19:15 +00:00
Ian Dowse
318f2fb4bf Add a new mount flag MNT_BYFSID that can be used to unmount a file
system by specifying the file system ID instead of a path. Use this
by default in umount(8). This avoids the need to perform any vnode
operations to look up the mount point, so it makes it possible to
unmount a file system whose root vnode cannot be looked up (e.g.
due to a dead NFS server, or a file system that has become detached
from the hierarchy because an underlying file system was unmounted).
It also provides an unambiguous way to specify which file system is
to be unmunted.

Since the ability to unmount using a path name is retained only for
compatibility, that case now just uses a simple string comparison
of the supplied path against f_mntonname of each mounted file system.

Discussed on:	freebsd-arch
mdoc help from:	ru
2003-07-01 17:40:23 +00:00
Scott Long
79501b66a7 Make swi_vm be INTR_MPSAFE. On all platforms, it is only used to activate
busdma_swi().  Now that busdma_swi() uses driver-provided locking, this
should be safe.
2003-07-01 16:00:38 +00:00
David Xu
df9c6cda37 Fix typo. 2003-06-30 10:04:04 +00:00
Marcel Moolenaar
4e4422d4d4 Don't use fuword() and suword() on struct members of type int. This
happens to work on 32-bit platforms as sizeof(long)=sizeof(int), but
wrecks all kinds of havoc (garbage reads, corrupting writes and
misaligned loads/stores) on 64-bit architectures.
The fix for now is to use fuword32() and suword32() and change the
type of the applicable int fields to int32. This is to make it
explicit that we depend on these fields being 32-bit. We may want
to revisit this later.

Reviewed by: deischen
2003-06-28 19:45:15 +00:00
Jeff Roberson
7a20304f84 - Don't migrate to stopped cpus. 2003-06-28 09:09:33 +00:00
David Xu
9dde3bc999 o Change kse_thr_interrupt to allow send a signal to a specified thread,
or unblock a thread in kernel, and allow UTS to specify whether syscall
  should be restarted.
o Add ability for UTS to monitor signal comes in and removed from process,
  the flag PS_SIGEVENT is used to indicate the events.
o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with
  this flag set to wait for above signal event.
o For SA based thread, kernel masks all signal in its signal mask, let
  UTS to use kse_thr_interrupt interrupt a thread, and install a signal
  frame in userland for the thread.
o Add a tm_syncsig in thread mailbox, when a hardware trap occurs,
  it is used to deliver synchronous signal to userland, and upcall
  is schedule, so UTS can process the synchronous signal for the thread.

Reviewed by: julian (mentor)
2003-06-28 08:29:05 +00:00
Jeff Roberson
86f8ae9663 - If smp is not started yet don't try to load balance or we'll put threads
on cpus that aren't running yet.
2003-06-28 08:24:42 +00:00
David Xu
418228df24 Fix POSIX compatible bug for sigwaitinfo and sigtimedwait.
POSIX says siginfo pointer parameter can be NULL and if the
function success, it should return signal number but not zero.
The waitset it past should be negatived before it can be
used as thread signal mask.
2003-06-28 08:03:28 +00:00
Jeff Roberson
a91172ade1 - Throttle the inherited sleep and run time in sched_fork_kseg(). This
allows us to learn the behavior of a thread much more quickly after it
   starts up.
2003-06-28 06:19:56 +00:00
Jeff Roberson
e493a5d90c - Adjust the default maximum slice value to ~140ms. This has improved the
nice distribution without significantly impacting interactive response.
   As a side effect it should also allow batch processes to run for a
   slightly longer period which will positively impact their performance.
2003-06-28 06:04:47 +00:00
Peter Wemm
eabd19726f Tidy up leftover lazy_switch instrumentation that is no longer needed.
This cleans up some #ifdef hell.
2003-06-27 22:39:14 +00:00
Sean Kelly
6cda41555b Fix this to build on alpha. Build test successful.
Suggested fix from:	tjr
2003-06-27 08:35:05 +00:00
Sean Kelly
370c3cb57c - Add a software watchdog facility.
This commit has two pieces. One half is the watchdog kernel code which lives
primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland
daemon which, when run, will keep the watchdog from firing while the userland
is intact and functioning.

Approved by:	jeff (mentor)
2003-06-26 09:50:52 +00:00
Warner Losh
4f2073fb4c Fix leap second processing by the kernel time keeping routines.
Before, we would add/subtract the leap second when the system had been
up for an even multiple of days, rather than at the end of the day, as
a leap second is defined (at least wrt ntp).  We do this by
calculating the notion of UTC earlier in the loop, and passing that to
get it adjusted.  Any adjustments that ntp_update_second makes to this
time are then transferred to boot time.  We can't pass it either the
boot time or the uptime because their sum is what determines when a
leap second is needed.  This code adds an extra assignment and two
extra compare in the typical case, which is as cheap as I could made
it.

I have confirmed with this code the kernel time does the correct thing
for both positive and negative leap seconds.  Since the ntp interface
doesn't allow for +2 or -2, those cases can't be tested (and the folks
in the know here say there will never be a +2s or -2s leap event, but
rather two +1s or -1s leap events).

There will very likely be no leap seconds for a while, given how the
earth is speeding up and slowing down, so there will be plenty of time
for this fix to propigate.  UT1-UTC is currently at "about -0.4s" and
decrementing by .1s every 8 months or so.  6 * 8 is 48 months, or 4
years.

-stable has different code, but a similar bug that was introduced
about the time of the last leap second, which is why nobody has
noticed until now.

MFC After: 3 weeks
Reviewed by: phk

"Furthermore, leap seconds must die." -- Cato the Elder
2003-06-25 21:23:51 +00:00
Warner Losh
eac3c62b51 During a positive leap second, the tai_time offset should be
incremented at the start of the leap second, not after the leap second
has been inserted.  This is because at the start of the leap second,
we set the time back one second.  This setting back one second is the
moment that the offset changes.  The old code set it back after the
leap second, but that's one second too late.  The negative leap second
case is handled correctly.

Reviewed by: phk
2003-06-25 20:56:40 +00:00
Olivier Houchard
7f3bfd6651 At this point targp will always be NULL, so remove the useless if. 2003-06-25 13:28:32 +00:00
Warner Losh
4e82e5f6f1 Use UTC rather than GMT to describe time scale. latter is obsolete. 2003-06-23 20:14:08 +00:00
Robert Watson
f51e58036e Redesign the externalization APIs from the MAC Framework to
the MAC policy modules to improve robustness against C string
bugs and vulnerabilities.  Following these revisions, all
string construction of labels for export to userspace (or
elsewhere) is performed using the sbuf API, which prevents
the consumer from having to perform laborious and intricate
pointer and buffer checks.  This substantially simplifies
the externalization logic, both at the MAC Framework level,
and in individual policies; this becomes especially useful
when policies export more complex label data, such as with
compartments in Biba and MLS.

Bundled in here are some other minor fixes associated with
externalization: including avoiding malloc while holding the
process mutex in mac_lomac, and hence avoid a failure mode
when printing labels during a downgrade operation due to
the removal of the M_NOWAIT case.

This has been running in the MAC development tree for about
three weeks without problems.

Obtained from:	TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
2003-06-23 01:26:34 +00:00
Robert Watson
6b42f0a2eb Prefer the vop_rmextattr() vnode operation for removing extended
attributes from objects over vop_setextattr() with a NULL uio; if
the file system doesn't support the vop_rmextattr() method, fall
back to the vop_setextattr() method.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 23:03:07 +00:00
Robert Watson
77533ed2aa Expose vop_rmextattr as an explicit operation at the vnode operation
interface, rather than relying on a NULL uio for the deletion
operation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 22:45:24 +00:00
Robert Watson
4b090e41ff Add an explicit credential argument to alq_open() to allow the caller to
specify what credential to use when authorizing vn_open() and later
write operations, rather than curthread->td_ucred.

When writing KTR traces to an ALQ, specify the credential of the thread
generating the sysctl request.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 22:28:56 +00:00
Poul-Henning Kamp
3b6d965263 Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
Ian Dowse
adef9265ef When DDB is active, always send printf() output directly to the
console, even if there is a TIOCCONS console tty. We were already
doing this after a panic, but it's also useful when entering DDB
for some other reason too.
2003-06-22 03:20:24 +00:00
Ian Dowse
d29bf12ff8 Use a new message buffer `consmsgbuf' to forward messages to a
TIOCCONS console (e.g. xconsole) via a timeout routine instead of
calling into the tty code directly from printf(). This fixes a
number of cases where calling printf() at the wrong time (such as
with locks held) would cause a panic if xconsole is running.

The TIOCCONS message buffer is 8k in size by default, but this can
be changed with the kern.consmsgbuf_size sysctl. By default, messages
are checked for 5 times per second. The timer runs and the buffer
memory remains allocated only at times when a TIOCCONS console is
active.

Discussed on:	freebsd-arch
2003-06-22 02:54:33 +00:00
Ian Dowse
4784a46912 Replace the code for reading and writing the kernel message buffer
with a new implementation that has a mostly reentrant "addchar"
routine, supports multiple message buffers in the kernel, and hides
the implementation details from callers.

The new code uses a kind of sequence number to represend the current
read and write positions in the buffer. This approach (suggested
mainly by bde) permits the read and write pointers to be maintained
separately, which reduces the number of atomic operations that are
required. The "mostly reentrant" above refers to the way that while
it is now always safe to have any number of concurrent writers,
readers could see the message buffer after a writer has advanced
the pointers but before it has witten the new character.

Discussed on:	freebsd-arch
2003-06-22 02:18:31 +00:00
Jeff Roberson
1a7a9d0ec2 - lticks was erroneously being updated in sched_pctcpu(). This was causing
us to skip the pctcpu_update() call which lead to inaccurate cpu usage
   statistics for processes that didn't run often.
2003-06-21 02:31:49 +00:00
Jeff Roberson
665cb285a8 - Don't allow nice to have such a large effect on priority. This was
causing poor interactive performance while unnice processes were running.
   The new scheme still allows nice to have an effect on priority but it is
   not as dramatic as the effect of the interactivity score.
2003-06-21 02:22:47 +00:00
Bosko Milekic
b2b417bb41 Fix a divide-by-zero on kern.log_wakeups_per_second tunable.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/53557
2003-06-20 22:18:38 +00:00
Stefan Eßer
c2ef4dd48a Add comment about **vpp being special-cased in vnode_if.awk (1.38) 2003-06-20 12:24:06 +00:00
David Xu
ab78d4d641 cpu_set_upcall_kse needs to access userspace, release schedule lock
before calling it for bound thread. To avoid this problem, change
thread_schedule_upcall to not put new thread on run queue, let caller
do it, so we can tweak the new thread before setting it to run.

Reported by: pho
2003-06-20 09:12:12 +00:00
Poul-Henning Kamp
166400b7e6 Don't put callout_lock under #ifdef DIAGNOSTIC despite the fact that it
works anyway.
2003-06-20 08:39:04 +00:00
Poul-Henning Kamp
568733688b Initialize b_saveaddr when we hand out buffers 2003-06-20 08:26:38 +00:00
Poul-Henning Kamp
ce6912c420 Crude but efficient:
#ifdef DIAGNOSTIC hold a mutex while calling callout's so that we hear
about it if they sleep.
2003-06-20 08:07:15 +00:00
Poul-Henning Kamp
eaaca5deee Don't (re)initialize f_gcflag to zero.
Move initialization of DTYPE_VNODE specific field f_seqcount into
the DTYPE_VNODE specific code.
2003-06-20 08:02:30 +00:00
David Xu
062cf543fc When a STOP signal is being sent to a process, it is possible all
threads in the process have already masked the signal, so job control
is delayed. But later a thread unmasking the STOP signal should enable
job control, so in issignal(), scanning all threads in process to see
if we can direct suspend some of them, not just suspend current thread.
2003-06-20 03:36:45 +00:00
David Xu
8b56079e2b Fix typo. td should be td0. 2003-06-20 01:56:28 +00:00
Alfred Perlstein
bab88630ba Unlock the struct file lock before aquiring Giant, otherwise
we can deadlock because of lock order reversals.  This was not
caught because Witness ignores pool mutexes right now.

Diagnosis and help: truckman
Noticed by: pho
2003-06-19 18:13:07 +00:00
Mike Silbersack
b083ea5114 Add a ratelimited message of the form
"maxproc limit exceeded by uid %i, please see tuning(7) and login.conf(5)."

Which will be triggered whenever a user hits his/her maxproc limit or
the systemwide maxproc limit is reached.

MFC after:	1 week
2003-06-19 05:57:25 +00:00
Don Lewis
6084b6c9d5 FILE_LOCK() uses a pool mutex, as does the vnode v_vnlock. Since pool
mutexes are supposed to only be used as leaf mutexes, and what appear
to be separate pool mutexes could be aliased together, it is bad idea
for a thread to attempt to hold two pool mutexes at the same time.

Slightly rearrange the code in kern_open() so that FILE_UNLOCK() is
called before calling VOP_GETVOBJECT(), which will grab the v_vnlock
mutex.
2003-06-19 04:10:56 +00:00
Mike Silbersack
4d7dfc31b8 Add a rate limited message reporting when kern.maxfiles is exceeded,
reporting who did it.

Also, fix a style bug introduced in the previous change.

MFC after:	1 week
2003-06-19 04:07:12 +00:00
Don Lewis
8d5f9131fc VOP_GETVOBJECT() wants to be called with the vnode lock held. 2003-06-19 03:55:01 +00:00
Poul-Henning Kamp
2db4b023bb Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that
rather than assume that only DTYPE_VNODE is seekable.
2003-06-18 19:53:59 +00:00
Mike Silbersack
438f085b2f Reserve the last 5% of file descriptors for root use. This should allow
systems to fail more gracefully when a file descriptor exhaustion situation
occurs.

Original patch by:	David G. Andersen <dga@lcs.mit.edu>
PR:			45353
MFC after:		1 week
2003-06-18 18:57:58 +00:00
Poul-Henning Kamp
7c2d2efd58 Initialize struct fileops with C99 sparse initialization. 2003-06-18 18:16:40 +00:00
Jeff Roberson
d07ac847ef - Use a more robust mechanism for determining whether or not a kse is on a
kseq.
2003-06-17 19:49:18 +00:00
Scott Long
04d2f20f6b Drop the proc lock around SYSCTL_OUT in the no-threads case.
Submitted by:	truckman
2003-06-17 19:14:00 +00:00
Jeff Roberson
7cd0f83355 - Temporarily patch a problem where the interact score could be negative
because the run time exceeds the largest value a signed int can hold.
   The real solution involves calculating how far we are over the limit.
   To quickly solve this problem we loop removing 1/5th of the current value
   until it falls below the limit.  The common case requires no passes.
2003-06-17 10:21:34 +00:00
Jeff Roberson
4b60e3242e - Add a new function "sched_interact_update()" that scales back the sleep
and run time.
 - Scale the sleep and run time back via sched_interact_update() in more
   places.  This is to keep the statistic more accurate.
 - Charge a parent one tick for forking a child.
 - Add only the run time and not the sleep time to the parents kg when a
   thread exits.  This allows us to give a penalty for having an expensive
   thread exit but does not give a bonus for having an interactive thread
   exit.
 - Change the SLP_RUN_THROTTLE to limit us to 4/5th and not 1/2.
 - Change the SLP_RUN_MAX to two seconds.  This keeps bursty interactive
   applications like mozilla and openoffice in the interactive range even
   through expensive tasks.
 - Recalculate the slice after every sleep.  This ensures that once a task
   has been marked interactive it only has a slice of 1 at the risk of
   giving tasks that sleep for a very brief period a longer time slice.
2003-06-17 06:39:51 +00:00
Mike Silbersack
51710a4597 Hide the m_defrag* statistics under MBUF_STRESS_TEST, there seems
to be no need to see them in the general case (and they aren't
smp-safe anyway.)

Suggested by:	hmp
MFC after:	1 week
2003-06-17 02:34:40 +00:00
David Xu
4184d79115 Forgot to commit code to disable creating a bound thread in same
group again except first kse_create syscall.

Noticed by: julian
2003-06-16 23:46:41 +00:00
David Xu
075102cc4e Reset ncpus to 1 for bound thread group since there is only one
thread in such group.
Change message text from kse_rel to kserel, it is better displayed
in top.
2003-06-16 13:14:52 +00:00
Poul-Henning Kamp
e725c18c3a Get rid of the b_spc specialty field in struct buf by using an already
available caller private field.
2003-06-16 07:18:39 +00:00
Poul-Henning Kamp
2a0f8aeb52 I have not had any reports of trouble for a long time, so remove the
gentle versions of the vop_strategy()/vop_specstrategy() mismatch methods
and use vop_panic() instead.
2003-06-15 19:49:14 +00:00
Robert Watson
2bceb0f2b2 Various cr*() calls believed to be MPSAFE, since the uidinfo
code is locked down.
2003-06-15 15:57:42 +00:00
David Xu
cd4f6ebb13 1. Add code to support bound thread. when blocked, a bound thread never
schedules an upcall. Signal delivering to a bound thread is same as
   non-threaded process. This is intended to be used by libpthread to
   implement PTHREAD_SCOPE_SYSTEM thread.
2. Simplify kse_release() a bit, remove sleep loop.
2003-06-15 12:51:26 +00:00
Ian Dowse
4f1b457770 Don't overwrite the static panicstr buffer for secondary and further
panics. Before revision 1.38, we used to just point panicstr at the
format string if panicstr was NULL, but since we now use a static
buffer for the formatted panic message, we have to be careful to
only write to it during the first panic.

Pointed out by:	bde
2003-06-15 11:43:00 +00:00
Jeff Roberson
3c12473229 - Increase the ksegrp's cpu time history buffer to 250ms.
- Decrease the history buffer divisor to 2 so that we remember more of the
   old behavior.
2003-06-15 04:14:25 +00:00
David Xu
1d5a24bec6 1. Migrate TDF_UPCALLING from td_flags to td_pflags.
2. Add a flag TDF_SA, it will be used to distinguish SA
   based thread from bound thread.
2003-06-15 03:18:58 +00:00
Jeff Roberson
b41f3d22cc - Cap the growth of sleep and run time in sched_exit_kse(). 2003-06-15 02:52:29 +00:00
Jeff Roberson
210491d3d9 - Fix the maximum slice value. I accidentally checked in a value of '2'
which meant no process would run for longer than 20ms.
 - Slightly redo the interactivity scorer.  It follows the same algorithm but
   in a slightly more correct way.  Previously values above half were
   incorrect.
 - Lower the interactivity threshold to 20.  It seems that in testing non-
   interactive tasks are hardly ever near there and expensive interactive
   tasks can sometimes surpass it.  This area needs more testing.
 - Remove an unnecessary KTR.
 - Fix a case where an idle thread that had an elevated priority due to
   priority prop. would be placed back on the idle queue.
 - Delay setting NEEDRESCHED until userret() for threads that haad their
   priority elevated while in kernel.  This gives us the same context switch
   optimization as SCHED_4BSD.
 - Limit the child's slice to 1 in sched_fork_kse() so we detect its behavior
   more quickly.
 - Inhert some of the run/slp time from the child in sched_exit_ksegrp().
 - Redo some of the priority comparisons so they are more clear.
 - Throttle the frequency of sched_pctcpu_update() so that rounding errors
   do not make it invalid.
2003-06-15 02:18:29 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
Alan Cox
49a2507bd1 Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM.  At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES.  The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES.  To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new.  In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
2003-06-14 23:23:55 +00:00
Alan Cox
89f4fca265 Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm.  They were
all identical.
2003-06-14 06:20:25 +00:00
Maxime Henrion
fca737117a Style(9). 2003-06-13 19:39:21 +00:00
Dag-Erling Smørgrav
c2935410f6 Make the VFS cache use zones instead of malloc(9). This results in a
small but noticeable increase in performance for name lookup operations.

The code uses two zones, one for short names (less than 32 characters)
and one for long names (up to NAME_MAX).  Since most file names are
fairly short, this saves a considerable amount of space that would
otherwise be wasted if we always allocated NAME_MAX bytes.  The cutoff
value of 32 characters was picked arbitrarily and may benefit from some
tweaking; it could also be made into a tunable.

Submitted by:	hmp
2003-06-13 08:46:13 +00:00
Alan Cox
8630c1173e Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.
2003-06-13 03:02:28 +00:00
Poul-Henning Kamp
7652131bee Initialize struct vfsops C99-sparsely.
Submitted by:   hmp
Reviewed by:	phk
2003-06-12 20:48:38 +00:00
Dag-Erling Smørgrav
633f506489 Document some sysctl variables.
Submitted by:	hmp
2003-06-12 19:46:51 +00:00
Scott Long
30c6f34e00 Add support to sysctl_kern_proc to return all threads in a proc, not just the
first one.  The old behaviour can be switched by specifying KERN_PROC_PROC.

Submitted by: julian, tweaks and added functionality by myself
2003-06-12 16:41:50 +00:00