Commit Graph

7057 Commits

Author SHA1 Message Date
jhb
3e89d3fc1b Remove a bogus assertion.
Noticed by:	bde
Pointy hat to:	jhb
2004-02-03 15:14:27 +00:00
deischen
80e9629f98 Regen after adding ksem_timedwait(). 2004-02-03 05:11:31 +00:00
deischen
057a2bca74 Add ksem_timedwait() to complement ksem_wait().
Glanced at by:	alfred
2004-02-03 05:08:32 +00:00
rwatson
37cfec7fef Don't dec/inc the amountpipes counter every time we resize a pipe --
instead, just dec/inc in the ctor/dtor.  For now, increment/decrement
in two's, since we're now performing the operation once per pair,
not once per pipe.  Not really any measurable performance change
in my micro-benchmarks, but doing less work is good, especially when
it comes to atomic operations.

Suggested by:	alc
2004-02-03 04:55:24 +00:00
rwatson
952cd3ca81 Catch instances of (pipe == NULL) that were obsoleted with recent
changes to jointly allocated pipe pairs.  Replace these checks
with pipe_present checks.  This avoids a NULL pointer dereference
when a pipe is half-closed.

Submitted by:	Peter Edwards <peter.edwards@openet-telecom.com>
2004-02-03 02:50:51 +00:00
jhb
47cec231e3 - Assert that witness_cold is not true in enroll().
- Only check witness_watch once in enroll().

Reported by:	ru (2)
2004-02-02 22:15:17 +00:00
pjd
bc4eae0936 Fix many issues related to mount/unmount:
1. Root from inside a jail was able to unmount any file system
   (except /).
2. Unprivileged root was able to unmount file systems mounted by
   privileged root (execpt /).
3. User from inside a jail was able to mount file system when
   sysctl vfs.usermount was set to 1.
4. User was able to mount file system when vfs.usermount was set to 1
   (that's ok) and unmount it even if vfs.usermount was equal to 0
   (that's not correct).

Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl>

Only a part of this fix will be MFC'ed (if approved).

PR:		kern/60149
Reviewed by:	rwatson
Approved by:	scottl (mentor)
MFC after:	3 days
2004-02-02 19:02:05 +00:00
silby
53962ffd37 Remove debugging code that slipped into the previous commit.
Spotted by:	bde
2004-02-02 09:09:59 +00:00
jeff
d73d499ddd - style fixes to the critical_exit() KASSERT().
Submitted by:	bde
2004-02-02 08:13:27 +00:00
jeff
a1efb0d173 - Allow interactive tasks to use the maximum time-slice. This is not as
detrimental as I thought it would be in the case of massive process
   storms from a shell and it makes regular desktop usage noticeably
   better.
2004-02-01 10:38:13 +00:00
silby
a4c32edec5 Rewrite sendfile's header support so that headers are now sent in the first
packet along with data, instead of in their own packet.  When serving files
of size (packetsize - headersize) or smaller, this will result in one less
packet crossing the network.  Quick testing with thttpd and http_load has
shown a noticeable performance improvement in this case (350 vs 330 fetches
per second.)

Included in this commit are two support routines, iov_to_uio, and m_uiotombuf;
these routines are used by sendfile to construct the header mbuf chain that
will be linked to the rest of the data in the socket buffer.
2004-02-01 07:56:44 +00:00
jeff
371f8838d1 - Disable ithread binding in all cases for now. This doesn't make as much
sense with sched_4bsd as it does with sched_ule.
 - Use P_NOLOAD instead of the absence of td->td_ithd to determine whether or
   not a thread should be accounted for in sched_tdcnt.
2004-02-01 06:20:18 +00:00
rwatson
b8e797cfe0 Coalesce pipe allocations and frees. Previously, the pipe code
would allocate two 'struct pipe's from the pipe zone, and malloc a
mutex.

- Create a new "struct pipepair" object holding the two 'struct
  pipe' instances, struct mutex, and struct label reference.  Pipe
  structures now have a back-pointer to the pipe pair, and a
  'pipe_present' flag to indicate whether the half has been
  closed.

- Perform mutex init/destroy in zone init/destroy, avoiding
  reallocating the mutex for each pipe.  Perform most pipe structure
  setup in zone constructor.

- VM memory mappings for pageable buffers are still done outside of
  the UMA zone.

- Change MAC API to speak 'struct pipepair' instead of 'struct pipe',
  update many policies.  MAC labels are also handled outside of the
  UMA zone for now.  Label-only policy modules don't have to be
  recompiled, but if a module is recompiled, its pipe entry points
  will need to be updated.  If a module actually reached into the
  pipe structures (unlikely), that would also need to be modified.

These changes substantially simplify failure handling in the pipe
code as there are many fewer possible failure modes.

On half-close, pipes no longer free the 'struct pipe' for the closed
half until a full-close takes place.  However, VM mapped buffers
are still released on half-close.

Some code refactoring is now possible to clean up some of the back
references, etc; this patch attempts not to change the structure
of most of the pipe implementation, only allocation/free code
paths, so as to avoid introducing bugs (hopefully).

This cuts about 8%-9% off the cost of sequential pipe allocation
and free in system call tests on UP and SMP in my micro-benchmarks.
May or may not make a difference in macro-benchmarks, but doing
less work is good.

Reviewed by:	juli, tjr
Testing help:	dwhite, fenestro, scottl, et al
2004-02-01 05:56:51 +00:00
jeff
75249cf38e - Revert rev 1.240 we no longer need a kthread for loadav(). 2004-02-01 05:37:36 +00:00
jeff
adc4a3ea82 - Use sched_load() rather than grabbing the sx lock and traversing the proc
table to discover the load.
2004-02-01 02:51:33 +00:00
jeff
201544a2b6 - Add a new member to struct kseq called ksq_sysload. This is intended to
track the load for the sched_load() function.  In the SMP case this member
   is not defined because it would be redundant with the ksg_load member
   which already tracks the non ithd load.
 - For sched_load() in the UP case simply return ksq_sysload.  In the SMP
   case traverse the list of kseq groups and sum up their ksg_load fields.
2004-02-01 02:48:36 +00:00
jeff
c78b51b49e - Keep a variable 'sched_tdcnt' that is used for the local implementation
of sched_load().  This variable tracks the number of running and runnable
   non ithd threads.  This removes the need to traverse the proc table and
   discover how many threads are runnable.
2004-02-01 02:46:47 +00:00
rwatson
a76f13b9f3 Move KASSERT regarding td_critnest to after the value of td is set to
curthread, to avoid warning and incorrect behavior.

Hoped not to mind:	jeff
2004-02-01 02:31:36 +00:00
jeff
79b1e09199 - Assert that td_critnest > 0 in critical_exit() to catch cases of
unbalanced uses of the critical_* api.
2004-02-01 01:24:54 +00:00
rwatson
7a868d6481 Fix an error in a KASSERT string: it's pipe_free_kmem(), not
pipespace(), that contains this KASSERT.
2004-01-31 23:03:22 +00:00
phk
35592de77b Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead.  Simultaneous
use of the two options is possible and they will return consistent
timestamps.

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.
2004-01-31 10:40:25 +00:00
rwatson
e55550188e Assert process lock in ptracestop(), since we're going to rely
on it, and later unlock it.
2004-01-29 00:58:21 +00:00
rwatson
614c424e99 Add a reset sysctl for mutex profiling: zeros all of the mutex
profiling buffers and hash table.  This makes it a lot easier to
do multiple profiling runs without rebooting or performing
gratuitous arithmetic.  Sysctl is named debug.mutex.prof.reset.

Reviewed by:	jake
2004-01-28 22:11:53 +00:00
jhb
114683f922 Move the loadav() callout into its own kthread since it uses allproc_lock
which is a sleepable lock and thus is not safe to acquire from a callout
routine.
2004-01-28 20:44:41 +00:00
jhb
7c38a96e26 Rework witness_lock() to make it slightly more useful and flexible.
- witness_lock() is split into two pieces: witness_checkorder() and
  witness_lock().  Witness_checkorder() determines if acquiring a specified
  lock at the time it is called would result in a lock order.  It
  optionally adds a new lock order relationship as well.  witness_lock()
  updates witness's data structures to assume that a lock has been acquired
  by stick a new lock instance in the appropriate lock instance list.
- The mutex and sx lock functions now call checkorder() prior to trying to
  acquire a lock and continue to call witness_lock() after the acquire is
  completed.  This will let witness catch a deadlock before it happens
  rather than trying to do so after the threads have deadlocked (i.e. never
  actually report it).
- A new function witness_defineorder() has been added that adds a lock
  order between two locks at runtime without having to acquire the locks.
  If the lock order cannot be added it will return an error.  This function
  is available to programmers via the WITNESS_DEFINEORDER() macro which
  accepts either two mutexes or two sx locks as its arguments.
- A few simple wrapper macros were added to allow developers to call
  witness_checkorder() anywhere as a way of enforcing locking assertions
  in code that might acquire a certain lock in some situations.  The
  macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an
  appropriate lock as the sole argument.
- The code to remove a lock instance from a lock list in witness_unlock()
  was unnested by using a goto to vastly improve the readability of this
  function.
2004-01-28 20:39:57 +00:00
jhb
45e377d906 Use mtx_assert() rather than using a home-rolled version. 2004-01-28 20:26:39 +00:00
kan
a62ca42084 Move the part of the comment which applies to osigsuspend where
it belongs. The current sigsuspend syscall does expect a pointer
to the mask as argument.

Submitted by:	Igor Sysoev <is at rambler-co dot ru>
2004-01-28 06:06:04 +00:00
des
8b1373b33e Rename the kern.vm.kmem.size tunable to the more logical vm.kmem_size. To
assure backward compatibility (conditional on !BURN_BRIDGES), look it up
by its old name first, and log a warning (but accept the setting) if it
was found.  If both the old and new name are defined, the new name takes
precedence.

Also export vm.kmem_size as a read-only sysctl variable; I find it hard to
tune a parameter when I don't know its default value, especially when that
default value is computed at boot time.
2004-01-27 15:59:38 +00:00
rwatson
cc302d7bf7 When aborting fork() due to a failure, if using MAC, make sure to clean
up the p_label field.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-01-25 18:42:18 +00:00
ru
0287790feb Register the uart(4)'s spin lock with witness(4). 2004-01-25 15:04:37 +00:00
jeff
f8b56ef20d - sched_strict has been dead for a long time now. Get rid of it. 2004-01-25 08:58:14 +00:00
jeff
d4f923e4ac - Clean up KASSERTS. 2004-01-25 08:57:38 +00:00
jeff
303af3be52 - Correct function names listed in KASSERTs. These were copied from other
code and it was sloppy of me not to adjust these sooner.
2004-01-25 08:21:46 +00:00
jeff
70c3261c3d - Implement cpu pinning and binding. This is acomplished by keeping a per-
cpu run queue that is only used for pinned or bound threads.

Submitted by:	Chris Bradfield <chrisb@ation.org>
2004-01-25 08:00:04 +00:00
jeff
2f38f1df6a - Use a unique string for the sched_setup SYSINIT and rename sched_setup to
synch_setup.  The schedulers use the sched_setup function name.
2004-01-25 07:49:45 +00:00
jeff
c85cdc3d0f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and propery
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate paramter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
rwatson
ca8a19313e Add some basic support for measuring sleep mutex contention to the
mutex profiling code.  As with existing mutex profiling, measurement
is done with respect to mtx_lock() instances in the code, as opposed
to specific mutexes.  In particular, measure two things:

(1) Lock contention.  How often did this mtx_lock() call get made and
    have to sleep (or almost sleep) waiting for the lock.  This helps
    identify the "victims" of contention.

(2) Hold contention.  How often, while the lock was held by a thread
    as a result of this mtx_lock(), did another thread try to acquire
    the same mutex.  This helps identify the causes of contention.

I'm currently exploring adding measurement of "time waited for the
lock", but the current implementation has proven useful to me so far
so I figured I'd commit it so others could try it out.  Note that this
increases the size of mutexes when MUTEX_PROFILING is enabled, so you
might find you need to further bump UMA_BOOT_PAGES.  Fixes welcome.

The once over:	des, others
2004-01-25 01:59:27 +00:00
phk
81a3e3bfb9 Deal with MOD_FREQUENCY before MOD_OFFSET because the latter is the
one which runs the actual update.  This fixes a bug where there were
a delay in applying the frequency adjustment.  In extreme cases this
could result in marginal stability of the kernel-pll.
2004-01-24 21:48:43 +00:00
jeff
8d604e6b08 - Move smp_topology to subr_smp.c so that it is defined on all architectures. 2004-01-24 19:52:48 +00:00
rwatson
597deaa7b8 Don't grab Giant in crfree(), since prison_free() no longer requires it.
The uidinfo code appears to be MPSAFE, and is referenced without Giant
elsewhere.  While this grab of Giant was only made in fairly rare
circumstances (actually GC'ing on refcount==0), grabbing Giant here
potentially introduces lock order issues with any locks held by the
caller.  So this probably won't help performance much unless you change
credentials a lot in an application, and leave a lot of file descriptors
and cached credentials around.  However, it simplifies locking down
consumers of the credential interfaces.

Bumped into by:	sam
Appeased:	tjr
2004-01-23 21:07:52 +00:00
rwatson
718d82cd61 Defer the vrele() on a jail's root vnode reference from prison_free()
to a new prison_complete() task run by a task queue.  This removes
a requirement for grabbing Giant in crfree().  Embed the 'struct task'
in 'struct prison' so that we don't have to allocate memory from
prison_free() (which means we also defer the FREE()).

With this change, I believe grabbing Giant from crfree() can now be
removed, but need to check the uidinfo code paths.

To avoid header pollution, move the definition of 'struct task'
to _task.h, and recursively include from taskqueue.h and jail.h; much
preferably to all files including jail.h picking up a requirement to
include taskqueue.h.

Bumped into by:	sam
Reviewed by:	bde, tjr
2004-01-23 20:44:26 +00:00
phk
4e1a716219 Write 100 times for tomorrow:
"Always print time_t as %jd, you never know what width it has"
2004-01-22 19:50:06 +00:00
rse
2f1199c0a6 Fix generation of random multicast MAC address.
In case no real/physical IEEE 802 address is available, both the expired
"draft-leach-uuids-guids-01" (section "4. Node IDs when no IEEE 802
network card is available") and RFC 2518 (section "6.4.1 Node Field
Generation Without the IEEE 802 Address") recommend (quoted from RFC
2518):

  "The ideal solution is to obtain a 47 bit cryptographic quality random
  number, and use it as the low 47 bits of the node ID, with the _most_
  significant bit of the first octet of the node ID set to 1. This bit
  is the unicast/multicast bit, which will never be set in IEEE 802
  addresses obtained from network cards; hence, there can never be a
  conflict between UUIDs generated by machines with and without network
  cards."

Unfortunately, this incorrectly explains how to implement this and
the FreeBSD UUID generator code inherited this generation bug from
the broken reference code in the standards draft. They should instead
specify the "_least_ significant bit of the first octet of the node ID"
as the multicast bit in a memory and hexadecimal string representation
of a 48-bit IEEE 802 MAC address.

This standards bug arised from a false interpretation, as the multicast
bit is actually the _most_ significant bit in IEEE 802.3 (Ethernet)
_transmission order_ of an IEEE 802 MAC address. The standards authors
forgot that the bitwise order of an _octet_ from a MAC address _memory_
and hexadecimal string representation is still always from left (MSB,
bit 7) to right (LSB, bit 0).

Fortunately, this UUID generation bug could have occurred on systems
without any Ethernet NICs only.
2004-01-22 13:34:11 +00:00
phk
03ff7a46df Add a sysctl (default: off) which enables a log(LOG_INFO...) warning
if the clock is stepped.
2004-01-21 21:05:40 +00:00
rwatson
9929c2e385 Reduce gratuitous includes: don't include jail.h if it's not needed.
Presumably, at some point, you had to include jail.h if you included
proc.h, but that is no longer required.

Result of:	self injury involving adding something to struct prison
2004-01-21 17:10:47 +00:00
ache
be3f79f636 pread/pwrite:
follow lseek spirit - return EINVAL on negative offset for non-VCHR
2004-01-20 01:27:42 +00:00
phk
6a1db39d2c Add linenumber and source filename to panic(9) output.
Ideally a traceback should be printed too, any takers ?
2004-01-19 21:27:11 +00:00
kan
a1caf18323 One more instance of magic number used in place of IO_SEQSHIFT.
Submitted by:	alc
2004-01-19 20:45:43 +00:00
ru
c0ac883f96 Since "m" is not part of the "mp" chain, need to free() it.
Reported by:	Stanford Metacompilation research group
2004-01-18 14:02:53 +00:00
gallatin
13a64c8523 Handle sf_buf_alloc() returning null. This can happen if the
process takes a signal while waiting for an sf_buf to become available.

Reviewed by: alc
2004-01-17 21:16:51 +00:00
des
3e90922798 Restore correct semantics for F_DUPFD fcntl. This should fix the errors
people have been getting with configure scripts.
2004-01-17 00:59:04 +00:00
des
e0a75877b5 WITNESS won't let us hold two filedesc locks at the same time, so juggle
fdp and newfdp around a bit.
2004-01-16 21:54:56 +00:00
rwatson
c294b6d48a KASSERT() that initproc->p_pid is 1. Very bad things happen if init's
pid isn't 1, and it can actually occur if kthread_create() is called
before SUB_SI_CREATE_INIT without RFHIGHPID.

Discussed with:	jhb
2004-01-16 20:29:23 +00:00
des
5af038f21a Remove two KASSERTs which were overly paranoid. 2004-01-16 08:45:56 +00:00
des
9cd3244809 Take care to drop locks when calling malloc() 2004-01-15 18:50:11 +00:00
des
f270054cd3 New file descriptor allocation code, derived from similar code introduced
in OpenBSD by Niels Provos.  The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed.  It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().

Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).
2004-01-15 10:15:04 +00:00
truckman
b028f9e268 If a device attach routine fails during boot and calls bus_teardown_intr(),
ithread_remove_handler() may fail to remove the interrupt handler if
it decides to let the ithread do the removal.  The problem is that during
boot "cold" is set, which causes msleep() to return immediately.  This
will cause ithread_remove_handler() to fail to wait for the ithread
to do the removal from the handler TAILQ before freeing the handler
back to the heap.  Bad things will happen when some other user of the
TAILQ, such as ithread_add_handler() or the actual ithread attempts to use
the freed handler.  Fix the problem by forcing ithread_remove_handler()
to do the actual removal itself if the "cold" flag is set.

Reviewed by:	jhb
2004-01-13 22:55:46 +00:00
des
b1b5dd1c63 Back out 1.160, which was committed by mistake. 2004-01-11 20:08:57 +00:00
des
4efce4e230 Back out 1.166, which was committed by mistake. 2004-01-11 20:07:15 +00:00
des
97917ae33d Mechanical whitespace cleanup + other minor style nits. 2004-01-11 19:56:42 +00:00
des
d94b79e11f Mechanical whitespace cleanup. 2004-01-11 19:54:45 +00:00
des
e77f1737d6 Mechanical whitespace cleanup; parenthesize return values; other minor
style nits.  The #ifdefs in this file give me a headache...
2004-01-11 19:52:10 +00:00
des
6ad55c20f7 Mechanical whitespace cleanup; parenthesize return values; other minor
style nits.
2004-01-11 19:48:19 +00:00
des
8de0243528 Mechanical whitespace cleanup + minor style nits. 2004-01-11 19:43:14 +00:00
des
69ce6641b6 Mechanical whitespace cleanup. 2004-01-11 19:39:14 +00:00
alc
5752042760 Remove long dead code, specifically, code related to munmapfd().
(See also vm/vm_mmap.c revision 1.173.)
2004-01-11 06:59:21 +00:00
rwatson
9190052ce9 When not creating a core dump due to resource limits specifying
a maximum dump size of 0, return a size-related error, rather
than returning success.  Otherwise, waitpid() will incorrectly
return a status indicating that a core dump was created.  Note
that the specific error doesn't actually matter, since it's lost.

MFC after:	2 weeks
PR:		60367
Submitted by:	Valentin Nechayev <netch@netch.kiev.ua>
2004-01-11 02:28:06 +00:00
schweikh
e566b473a5 s/Muliple/Multiple
Removed whitespace at EOL and EOF.
2004-01-10 18:34:01 +00:00
des
1d50ccc7f9 More unparenthesized return values. 2004-01-10 17:14:53 +00:00
des
65c30ff68d Style: parenthesize return values. 2004-01-10 13:03:43 +00:00
truckman
417693c813 Add a somewhat redundant check on the len arguement to getsockaddr() to
avoid relying on the minimum memory allocation size to avoid problems.
The check is somewhat redundant because the consumers of the returned
structure will check that sa_len is a protocol-specific larger size.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	nectar
MFC after:	30 days
2004-01-10 08:28:54 +00:00
cognet
ced43cb5e7 Prevent a race condition between fork1() and whatever changes the pgrp by
setting the new process' p_pgrp again before inserting it in the p_pglist.
Without it we can get the new process to be inserted in a different p_pglist
than the one p2->p_pgrp points to, and this is not something we want to happen.
This is not a fix, merely a bandaid, but it will work until someone finds a
better way to do it.

Discussed with: 	jhb (a long time ago)
2004-01-09 23:42:36 +00:00
rwatson
e73ac85912 Improve the expressiveness of ttyinfo (^T) when dealing with threads
in slightly less usual states:

  If the thread is on a run queue, display "running" if the thread is
  actually running, otherwise, "runnable".

  If the thread is sleeping, and it's on a sleep queue, display the
  name of the queue, otherwise "unknown" -- previously, in this situation
  we would display "iowait".

  If the thread is waiting on a lock, display *lockname.

  If the thread is suspended, display "suspended" -- previously, in
  this situation we would display "iowait".

  If the thread is waiting for an interrupt, display "intrwait" --
  previously, in this situation we would display "iowait".

  If the thread is in a state not handled by the above, display
  "unknown" -- previously, we would print "iowait".

Among other things, this avoids displaying "iowait" when the foreground
process turns out to be suspended waiting for a debugger to properly
attach.
2004-01-08 22:49:23 +00:00
rwatson
befa7a41a2 Drop the sigacts mutex around calls to stopevent() to avoid sleeping
holding the mutex.  Because the sigacts pointer can't change while
the process is "live" (proc locking (x)), we know our pointer is still
valid.

In communication with:	truckman
Reviewed by:		jhb
2004-01-08 22:44:54 +00:00
kan
6ecda71663 Add pid to the info printed in lockmgr_printinfo. This makes VFS
diagnostic messages slightly more useful.
2004-01-06 04:34:13 +00:00
kan
443a35f2bf More style fixes.
Obtained from:	bde
2004-01-05 23:40:46 +00:00
jhb
d619fa8207 - Allow mtx_trylock() to recurse on a recursive mutex. Attempts to recurse
on a non-recursive mutex will fail but will not trigger any assertions.
- Add an assertion to mtx_lock() that one never recurses on a non-recursive
  mutex.  This is mostly useful for the non-WITNESS case.

Requested by:	deischen, julian, others (1)
2004-01-05 23:09:51 +00:00
kan
2ffbc2464e style(9):
Add empty line before first code line in functions with no local
variables.
Properly terminate comment sentences.
Indent lines which are longer that 80 characters.
Move v_addpollinfo closer to the rest of poll-related functions.
Move DEBUG_VFS_LOCKS ifdefed block to the end of file.

Obtained from:	bde (partly)
2004-01-05 19:04:29 +00:00
kan
2872921967 Cosmetics: strip '\n' from a string passed to Debugger(). 2004-01-04 03:42:20 +00:00
davidxu
f39653dda8 Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr
2004-01-03 02:02:26 +00:00
njl
8fe7e8ee2d Move the kernel power change printf under bootverbose since the
power_profile script now duplicates the message via syslog.
2004-01-02 18:24:13 +00:00
sam
ca1ce6605d m_tag fixups in preparation for heavier use:
o promote several m_tag_* routines to inline
o add an m_tag_setup inline to set the fixed fields in a packet tag
o add an m_tag_free method pointer to each mtag to support, for example,
  allocating tags from zones
o have m_tag_find check if the tag list is not empty before calling
  m_tag_locate to search

Reviewed by:	brooks, silence from others
2004-01-02 17:27:39 +00:00
dwmalone
1e69eeaeee Plug a leak of open files that happens when you exec a suid program
with one of std{in,out,err} open. This helps with the file descriptor
leaks reported on -current. This should probably be merged into 5.2.

Reviewed by:	ru
Tested by:	Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>
2003-12-28 19:27:14 +00:00
bde
7d91626477 v_vxproc was a bogus name for a thread (pointer). 2003-12-28 09:12:56 +00:00
silby
a7d8091ae5 Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait
2003-12-28 08:57:09 +00:00
bde
ca9bf1d586 Fixed some style bugs (mainly, try to always use explicit comparisons with
NULL when checking for null pointers).
2003-12-28 04:37:59 +00:00
bde
9ff26a0443 Fixed some disordering in revs.1.194 and 1,196. Moved the exceve() syscall
function back to near the beginning of the file.  Rev.1.194 moved it into
the middle of auxiliary functions following kern_execve().  Moved the
__mac_execve() syscall function up together with execve().  It was new in
rev1.1.196 and perfectly misplaced after execve().
2003-12-28 04:18:13 +00:00
silby
f71dce4e63 Fix the maxpipekva warning message so that it points to the correct
sysctl, and shorten the message.

Noticed by:	bde
2003-12-28 01:19:58 +00:00
alc
7f5f3bd3db Remove GIANT_REQUIRED from exec_unmap_first_page(). 2003-12-27 19:40:03 +00:00
silby
5c5418dd6e Track current and peak sfbuf usage, export the values via sysctl. 2003-12-27 07:52:47 +00:00
jhb
17274fb2ed Create a separate kthread that executes sched_cpu() once a second. Because
sched_cpu() locks an sx lock (allproc_lock) which can sleep if it fails to
acquire the lock, it is not safe to execute this in a callout handler from
softclock().
2003-12-26 17:07:29 +00:00
alfred
984d4cf4c5 Put restrict back in, the compilation failure was my fault when I
did a bad merge from the PR.

Thanks to Bruce Evans for explaining.
2003-12-26 05:58:16 +00:00
alfred
278c6c3367 Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr,
copyin and copyout.
2003-12-26 05:54:35 +00:00
dwmalone
92fec55360 In socket(2) we only need Giant around the call to socreate, so just
grab it there.
2003-12-25 23:44:38 +00:00
dwmalone
ebdd5b9754 Don't TAILQ_INIT kq_head twice, once is enough. 2003-12-25 23:42:36 +00:00
silby
0884486ba2 Fix another 0 / NULL mixup. 2003-12-25 01:17:27 +00:00
alfred
b44ae63caf We're not ready for restrict qualifiers here. 2003-12-24 19:09:45 +00:00
alfred
c4b798a73d Add restrict qualifiers.
PR: 44394
Submitted by: Craig Rodrigues <rodrige@attbi.com>
2003-12-24 18:47:43 +00:00
rwatson
fc37f21a15 Document that when we are addressing an open()/close() race, the reason
we call vn_close() manually rather than letting fdrop() take care of it
is that we haven't yet hooked up the various 'struct file' fields.
2003-12-24 17:13:01 +00:00
alfred
a121c1dfef Introduce mp_maxcpus which can be used by libkvm utils to find out
how many CPUs the system was compiled for.
Export the variable via a sysctl node 'kern.smp.maxcpus' as well.
2003-12-23 13:54:16 +00:00
peter
b58a2a1deb Regen - this should be essentially a NOP, except for rcsid changes. 2003-12-23 03:52:14 +00:00
peter
06d2b26b72 Remove namespc column and attempt to un-fold some of the longer lines
that now fit.
2003-12-23 03:51:36 +00:00
peter
e3a23c9582 Remove the namespace column from the syscalls tables. We don't actually
use it, if we ever did.  They have been been VERY poorly maintained for
some time, possibly because they were a NOP.  FWIW, This brings our table
formats back closer to the other *BSD's.
2003-12-23 03:50:43 +00:00
peter
998b79089f Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.
2003-12-23 02:42:39 +00:00
peter
7cc77e03ae Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.
2003-12-23 02:36:43 +00:00
peter
0d11c5c3c4 Don't use NULL (pointer) when we mean 0 (integer) for the number of ticks
in msleep.
2003-12-23 02:28:42 +00:00
jeff
eac1e55acc - Make our transfer decisions based on load and not transferable load. A
cpu could have been bogged down with non-transferable load and still not
   migrated a new thread to an idle cpu.  This required some benchmarking and
   tuning to get right as the comment above it suggests.
2003-12-20 22:35:20 +00:00
jeff
d4f3760df1 - Enable ithread migration on x86. This is done to work around a bug in the
IO APIC on Xeons that prevents round-robin interrupt assignment from
   working.
2003-12-20 20:36:19 +00:00
alc
a7fef684f6 Remove a variable that has been initialized but otherwise unused since
revision 1.315.
2003-12-20 19:46:21 +00:00
jeff
40c79491e2 - In kseq_transfer() return if smp has not been started.
- In sched_add(), do the idle check prior to the transfer check so that we
   don't try to transfer load from an idle cpu.  This fixes panics caused by
   IPIs on UP machines running SMP kernels.

Reported/Debugged by:	seanc
2003-12-20 14:03:14 +00:00
jeff
04d161f363 - Running interactive tasks with the minimum time-slice is fine for vi and
sh, but not so great for mozilla, X, etc.  Add a fixed define for the slice
   size granted to interactive KSEs.
2003-12-20 12:54:35 +00:00
tjr
50207b49b7 Reduce the overhead of semop() by using the kernel stack instead of
malloc'd memory to store the operations array if it is small enough
to fit.
2003-12-19 13:07:17 +00:00
jhb
ecc9efa2f1 Various style fixes.
Submitted by:	bde (mostly, if not all)
2003-12-17 21:13:04 +00:00
jeff
5443bd4c65 - In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT.
Submitted by:	Stephan Uphoff <ups@stups.com>
2003-12-16 17:08:27 +00:00
jeff
aa712bc6e4 - When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by
reassigning their v_ops field to specfs, detaching from the mountpoint, etc.
   However, this is not sufficient.  If we vclean() the vnode the pages owned
   by the vnode are lost, potentially while buffers reference them.  Implement
   parts of vclean() seperately in vgonechrl() so that the pages and bufs
   associated with a device vnode are not destroyed while in use.
2003-12-16 17:05:05 +00:00
bms
1b8b89ab32 style(9) pass and type fixups.
Submitted by:	bde
2003-12-16 14:13:47 +00:00
bms
3eb53d90ef Push m_apply() and m_getptr() up into the colleciton of standard mbuf
routines, and purge them from opencrypto.

Reviewed by:	sam
Obtained from:	NetBSD
Sponsored by:	spc.org
2003-12-15 21:49:41 +00:00
jeff
fe983d4260 - Assign the ke_cpu field in kseq_notify() so that all of our callers do not
have to do it.
 - Set the ke_runq to NULL in sched_add() before calling kseq_notify().
   Otherwise we may panic in sched_add() if INVARIANTS is on.
2003-12-14 02:06:29 +00:00
rwatson
012b8f6c02 Although sometimes to the uninitiated, it may seem like goup, KSEGOUP
is actually spelt KSEGROUP.  Go figure.

Reported by:	samy@kerneled.com
2003-12-12 21:25:56 +00:00
jeff
80e1439e63 - Now that we have kseq groups, balance them seperately.
- The new sched_balance_groups() function does intra-group balancing while
   sched_balance() balances the available groups.
 - Pick a random time between 0 ticks and hz * 2 ticks to restart each
   balancing process.  Each balancer has its own timeout.
 - Pick a random place in the list of groups to start the search for lowest
   and highest group loads.  This prevents us from prefering a group based on
   numeric position.
 - Use a nasty hack to stop us from preferring cpu 0.  The problem is that
   softclock always runs on cpu 0, so it always has a little extra load.  We
   ignore this load in the balancer for now.  In the future softclock should
   run on a random cpu and these hacks can go away.
2003-12-12 07:33:51 +00:00
jeff
6edc4a1eb1 - Don't let the pctcpu rate limiter throttle us if we have recorded over
SCHED_CPU_TICKS ticks.  This was allowing processes to display
   (1/SCHED_CPU_TIME * 100) % more cpu than they had used.
2003-12-11 04:23:39 +00:00
jeff
da98a74234 - In sched_switch(), if a thread has been assigned, don't touch the runqueues
or load.  These things have already been taken care of in sched_bind()
   which should be the only place that we're switching in an assigned thread.
2003-12-11 04:00:49 +00:00
jeff
7c857e9275 - Add support for CPU groups to ule. All SMT cores on the same physical
cpu are added to a group.
 - Don't place a cpu into the kseq_idle bitmask until all cpus in that group
   have idled.
 - Prefer idle groups over idle group members in the new kseq_transfer()
   function.  In this way we will prefer to balance load across full cores
   rather than add further load a partial core.
 - Before a cpu goes idle, check the other group members for threads.  Since
   SMT cpus may freely share threads, this is cheap.
 - SMT cores may be individually pinned and bound to now.  This contrasts the
   old mechanism where binding or pinning would have allowed a thread to run
   on any available cpu.
 - Remove some unnecessary logic from sched_switch().  Priority propagation
   should be properly taken care of in sched_prio() now.
2003-12-11 03:57:10 +00:00
peter
e24b9cafc1 Regen 2003-12-10 22:18:54 +00:00
peter
4c2b7999cf Update file locations for syscall tables to copy to. 2003-12-10 22:08:37 +00:00
marcel
b6631c500b Write the thread pointer (val) in the kse mailbox (loc) before we
set the new context in kse_switchin(2). This allows us to return
an error to the calling context when the suword() fails.
2003-12-10 01:59:23 +00:00
jhb
d8b6cc614a Adjust an assertion for the TDF_TSNOBLOCK race handling in
turnstile_unpend().  A racing thread that does not have TDI_LOCK set may
either be running on another CPU or it may be sitting on a run queue if it
was preempted during the very small window in turnstile_wait() between
unlocking the turnstile chain lock and locking sched_lock.
2003-12-09 21:14:31 +00:00
jhb
f110a9ab64 Assert that the we never give a thread a NULL turnstile when waking it up. 2003-12-09 21:09:54 +00:00
jhb
66cc89fadf Revert the previous race fix and replace it with a more general fix. The
case of a turnstile having no threads is just one instance of the more
general case where the thread we are examining has been partially awakened
already in that it has been removed from the turnstile's blocked list but
still has TDI_LOCK set.  We detect that case by checking to see if the
thread has already had a turnstile reassigned to it.
2003-12-09 21:09:04 +00:00
davidxu
69ce33ca6e Lock and unlock sched_lock when walking through thread list, current we
insert kse upcall thread into thread list at mi_switch time, process lock
is not enough.
2003-12-07 23:47:15 +00:00
truckman
e9a439edcd Pass MTX_DEF as the last argument to mtx_init() instead of 0. This
is not a functional change.  The code happened to work properly only
because MTX_DEF is defined as 0.
2003-12-07 21:53:41 +00:00
phk
239117c33f Make the DIAGNOSTIC code which complains about long {call|time}out(9)
functions less noisy:  We printf if a new function took longer than
the previous record holder, or of the previous record holder took
more than twice as long as the current record.
2003-12-07 20:03:28 +00:00
marcel
f3326a4c71 Regen due to kse_switchin(2). 2003-12-07 19:36:16 +00:00
marcel
2ba380839b Add kse_switchin(2). This syscall can be used by KSE implementations
to have the kernel switch to a new thread, instead of doing it in
userland. It is in fact needed on ia64 where syscall restarts do not
return to userland first. It's completely handled inside the kernel.
As such, any context created by the kernel as part of an upcall and
caused by some syscall needs to be restored by the kernel.
2003-12-07 19:34:29 +00:00
peter
e28e232dff rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64).
Be sure to shift (long)1 << 33 and higher, not (int)1.  Otherwise bad
things happen(TM).  This is why beast.freebsd.org paniced with ULE.

Reviewed by:  jeff
2003-12-07 09:57:51 +00:00
scottl
2b68e67c6d Re-arrange and consolidate some random debugging stuff 2003-12-07 05:04:49 +00:00
alc
4408614be4 - Giant is no longer required by vm_thread_new(). 2003-12-07 04:16:49 +00:00
rwatson
08335c63bf Rename mac_create_cred() MAC Framework entry point to mac_copy_cred(),
and the mpo_create_cred() MAC policy entry point to
mpo_copy_cred_label().  This is more consistent with similar entry
points for creation and label copying, as mac_create_cred() was
called from crdup() as opposed to during process creation.  For
a number of policies, this removes the requirement for special
handling when copying credential labels, and improves consistency.

Approved by:	re (scottl)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-12-06 21:48:03 +00:00
jhb
4b61439e79 Fix all users of mp_maxid to use the same semantics, namely:
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by:	re (scottl)
Tested on:	i386, amd64, alpha
2003-12-03 14:57:26 +00:00
jhb
907202ec1f Export a few SMP related symbols in UP kernels as well. This is needed to
aid other kernel code, especially code which can be in a module such as
the acpi_cpu(4) driver, to work properly with both SMP and UP kernels.
The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and
the smp_rendezvous() function.  This also means that CPU_ABSENT() is now
always implemented the same on all kernels.

Approved by:	re (scottl)
2003-12-03 14:55:31 +00:00
dg
da88330aaa Fixed a bug in sendfile(2) where the sent data would be corrupted due
to sendfile(2) being erroneously automatically restarted after a signal
is delivered. Fixed by converting ERESTART to EINTR prior to exiting.

Updated manual page to indicate the potential EINTR error, its cause
and consequences.

Approved by: re@freebsd.org
2003-12-01 22:12:50 +00:00
iedowse
bc2791c3fa In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the
forced unmount case. Otherwise, a file system that is referenced
only by process fd_cdir/fd_rdir references to the file system root
vnode will be successfully unmounted without the MNT_FORCE flag.

The previous behaviour was not compatible with the unmount semantics
required by amd(8), so file systems could be unexpectedly unmounted
while there were still references to the file system root directory.

Reported by:	Erez Zadok <ezk@cs.sunysb.edu>
Approved by:	re (scottl)
2003-11-30 23:30:09 +00:00
jeff
e35dcab926 - Don't forget to unlock the vnode interlock in the LK_NOWAIT case.
Submitted by:	Stephan Uphoff <ups@stups.com>
Approved by:	re (rwatson)
2003-11-30 22:09:58 +00:00
kan
c31eef63dc Do not attempt to destroy NULL vfs options list.
Approved by: re (scottl)
Reported by: Christian Laursen <xi atborderworlds dot dk>
2003-11-23 17:13:48 +00:00
jhb
bbe7d290ea - Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
  cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
  actually present and sets mp_ncpus and all_cpus.  Splitting these up
  allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
  setting mp_maxid to MAXCPU in cpu_mp_setmaxid().  This could allow the
  CPU probing code to live in a module, for example, since modules
  sysinit's in modules cannot be invoked prior to SI_SUB_KLD.  This is
  needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
  its contents in a few places.  Also, add a smp_cpu_enabled() function
  to avoid duplicating some code.  There is room for further code
  reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
  to before this change.  i386 now sets mp_maxid to MAXCPU.

Tested on:	alpha, amd64, i386, ia64, sparc64
Approved by:	re (scottl)
2003-11-21 22:23:26 +00:00
markm
6a2f4748c4 Fix a major faux pas of mine. I was causing 2 very bad things to
happen in interrupt context; 1) sleep locks, and 2) malloc/free
calls.

1) is fixed by using spin locks instead.

2) is fixed by preallocating a FIFO (implemented with a STAILQ)
   and using elements from this FIFO instead. This turns out
   to be rather fast.

OK'ed by:	re (scottl)
Thanks to:	peter, jhb, rwatson, jake
Apologies to:	*
2003-11-20 15:35:48 +00:00
markm
832b97971f Hackfix to patch around a kernel panic I introduced. Real fix to
follow. In the meanwhile, we are not harvesting interrupt entropy.

Approved by:	re (jhb)
2003-11-18 14:35:43 +00:00
rwatson
9c969b771a Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
rwatson
cc012e0835 Add a sysctl, security.bsd.see_other_gids, similar in semantics
to see_other_uids but with the logical conversion.  This is based
on (but not identical to) the patch submitted by Samy Al Bahra.

Submitted by:	Samy Al Bahra <samy@kerneled.com>
2003-11-17 20:20:53 +00:00
peter
9dedda25aa Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
  (in particular, bootstrap)
- This is still a WIP.  It seems that there are some highly bogus bioses
  on nVidia nForce3-150 boards.  I can't stress how broken these boards
  are.  I have a workaround in mind, but right now the Asus SK8N is broken.
  The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE.  SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
  atpic' in addition, because they somehow managed to forget to connect the
  8254 timer to the apic, even though its in the same silicon!  ARGH!
  This directly violates the ACPI spec.
2003-11-17 08:58:16 +00:00