Commit Graph

7366 Commits

Author SHA1 Message Date
phk
e0c89dae13 There is no need to explicitly call the stop function. In all likelyhood
->l_close() did it and ttyclose certainly will.
2004-06-01 11:57:15 +00:00
rwatson
5a32935851 Add a global mutex, accept_filter_mtx, to protect the global list of
accept filters and prevent read-modify-write races.
2004-06-01 04:08:48 +00:00
rwatson
bddadcf71a The SS_COMP and SS_INCOMP flags in the so_state field indicate whether
the socket is on an accept queue of a listen socket.  This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different for the locking on the remainder of the
flags in so_state.
2004-06-01 02:42:56 +00:00
truckman
d503c79cad Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call.  The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation.  The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
2004-06-01 01:18:51 +00:00
bmilekic
f7574a2276 Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer to how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
rwatson
13656d723e Assert Giant in vn_start_write() and vn_finished_write(). 2004-05-31 20:56:10 +00:00
rwatson
afc098b3e1 Assert Giant in vrele(). 2004-05-31 19:06:01 +00:00
phk
30a7ac8468 Add missing #include <sys/module.h> 2004-05-30 20:34:58 +00:00
phk
d6f7d2bde6 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
tjr
2bc3263ac9 Enable MI bits for gcc -ftest-coverage -fprofile-arcs on amd64. 2004-05-29 01:18:14 +00:00
pjd
19d2b54248 Sysctl hw.bus.devctl_disable shouldn't be writtable from inside a jail.
Approved by:	imp
2004-05-26 16:36:32 +00:00
tmm
7b769ce88f Retire cpu_sched_exit(); it is not used any more. 2004-05-26 12:09:39 +00:00
des
7eb92d1257 As previously threatened, give each device its own sysctl context and
subtree (under the new dev top-level node).  This should greatly simplify
drivers which need per-device sysctl variables (such as ndis).
2004-05-25 12:06:26 +00:00
gad
d284b07886 Implement the new KERN_PROC_RGID option, and also implement the
KERN_PROC_SESSION option which had been previously defined but
never implemented.

PR:		bin/65803  (a very tiny piece of the PR)`
Submitted by:	Cyrille Lefevre
2004-05-22 23:11:44 +00:00
davidxu
e7578c3795 Clear KSE thread flags after KSE thread mode is ended. The side effect
of not clearing the flags for execv() syscall will result that a new
program runs in KSE thread mode without enabling it.

Submitted by: tjr
Modified by: davidxu
2004-05-21 14:50:23 +00:00
bde
9ec48f4cab Fixed some style bugs in tdsigwakeup(). 2004-05-21 10:02:24 +00:00
jhb
91621895aa In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if
a thread is on a sleep queue and should have it's sleep aborted.

Reported by:	Thierry Herbelot thierry at herbelot dot com
2004-05-20 20:17:28 +00:00
bde
b3597241df Fixed printf format errors which helped break GUPROF for arches with
64-bit function pointers.
2004-05-20 16:48:17 +00:00
bde
38ad669603 Initialize the history counter type field in struct gmonparam as
threatened in rev.1.10 of usr.sbin/kgmon/kgmon.c more than 2 years ago.
kgmon has been recovering from the missing initialization for too
long, but the fixup there is ifdefed for i386's and shouldn't be
needed for other arches.
2004-05-20 16:42:39 +00:00
bde
36151be4c9 Moved i386 asms to an i386 header. The asms are for calibration of
high resolution kernel profiling (options GUPROF.  "U" in GUPROF stands
for microseconds resolution, but the resolution is now smaller than 1
nanosecond on multi-GHz machines and the accuracy is heading towards
1 nanosecond too).  Arches that support GUPROF must now provide certain
macros for the calibration.  GUPROF is now only supported for i386's,
so the absence of the new macros for other arches doesn't break anything
that wasn't already broken.  amd64's have uncommitted support for
GUPROF, and sparc64's have support that seems to be complete except
here (there was an #error for non-i386 cases; now there are undefined
macros).

Changed the asms a little:
- declare them as __volatile.  They must not be moved, and exporting a
  label across asms is technically incorrect, so try harder to stop gcc
  moving them.
- don't put the non-clobbered register "bx" in the clobber list.  The
  clobber lists are still more conservative than necessary.
- drop the non-support for gcc-1.  It just gave a better error message,
  and this is not useful since compiling with gcc-1 would cause thousands
  of worse error messages.
- drop the support for aout.
2004-05-20 16:12:19 +00:00
pjd
7cbfe4913b Fix sysctl name: security.jail.getfsstate_getfsstatroot_only ->
security.jail.getfsstatroot_only.

Approved by:	rwatson
2004-05-20 05:28:44 +00:00
bde
eef4a49645 Include <sys/gmon.h> instead of <machine/profile.h> for the declaration
of kmupetext().  The declaration is misplaced in <machine/profile.h>
since it is not MD and not related to the lowest level of profiling.
It will be moved, but getting it via <sys/gmon.h> already works.
2004-05-19 14:36:38 +00:00
ps
b36520446e syncache broke rev 1.23 which was done to fix the "thundering herd"
problem in Apache.  Fix it.

Reviewed by:	peter
2004-05-19 00:22:10 +00:00
peter
235a3d8f64 If a symbol has section+offset definitions provided, always use instead
of doing a name lookup for global symbols.  This fixes the snd_pcm module.
2004-05-18 05:15:43 +00:00
peter
a6091b3b08 Remove leftover padding variables.
Convert some silent 'ignore programmer error' cases into panics
Remove 'align' field from section table (no longer needed)
2004-05-18 05:14:19 +00:00
peter
867065a3a4 Since we go to the trouble of compiling the kobj ops table for each class,
and cannot handle it going away, add an explicit reference to the kobj
class inside each linker class.  Without this, a class with no modules
loaded will sit with an idle refcount of 0.  Loading and unloading
a module with it causes a 0->1->0 transition which frees the ops table
and causes subsequent loads using that class to explode.  Normally, the
"kernel" module will remain forever loaded and prevent this happening, but
if you have more than one linker class active, only one owns the "kernel".

This finishes making modules work for kldload(8) on amd64.
2004-05-17 21:24:39 +00:00
peter
b7f2f12793 Clean up the code some more. Unify the text/data (progbits) and bss
(nobits) tables to simplify some code.  Try and shorten some of the very
wide lines.  Somewhere along the way, I think I fixed the memory
corruption that caused panics after going multiuser.
2004-05-17 21:20:23 +00:00
peter
961476df3c Oops, use the generic ELF_ST_BIND() macro instead of ELF64_ST_BIND.
Submitted by:  marks
2004-05-17 00:51:34 +00:00
peter
ea4215c521 Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons.  First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).
2004-05-16 20:00:28 +00:00
bde
802b835b3d Fixed some common printf format errors. Don't assume that "struct foo *"
is "void *" (it isn't) or that the default promotion of pid_t is int.
Instead, assume that casting "struct foo *" to "void *" and printing the
result with %p is useful, and that all pid_t's are representable as longs.

Fixed some minor style bugs (mainly spelling errors in comments).
2004-05-14 20:51:42 +00:00
jhb
4e9e9bbec8 Split sleepq_wakeup_thread() into two functions. sleepq_remove_thread()
removes a specific thread from a sleep queue.  sleepq_resume_thread()
resumes scheduling of a thread that has been previously removed from a
sleep queue.
- sleepq_catch_signals() just removes a thread from the queue it was just
  added to when a pending signal is found.
- sleepq_signal() and sleepq_broadcast() remove threads from a queue,
  drop the queue lock, and then resume all the previously removed threads.
  This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it
  makes it a little better as we no longer call setrunnble() with a sleep
  queue lock held meaning if setrunnable() tries to wakeup the swapper we
  don't try to lock two sleep queue chains at the same time.
2004-05-13 20:00:43 +00:00
tjr
e167ef630d Eliminate a memory leak in kern_symlink() that could occur if
vn_start_write() failed.
2004-05-11 10:42:02 +00:00
julian
790c941069 Remove misplaced duplicate comment and slightly reformat the
version that was in the right place.
2004-05-09 22:29:14 +00:00
sam
c89a947571 set m_len to reflect mbuf contents on return from m_dup1; fixes an obscure
m_pullup case that contributed to breaking ipcomp in tunnel mode for kame

Submitted by:	itojun
Obtained from:	kame
2004-05-09 05:57:58 +00:00
julian
98d03d5d06 Fix rtprio() to do sensible things when called from threaded processes.
It's not quite correct from a posix Point Of view, but it is a lot better
than what was there before. This will be revisited later
when we decide what form our priority extensions will take. Posix doesn't
specify  how a system scope thread can change its priority so you need to
add non-standard extensions to be able to do it..
For now make this slightly non standard to allow it to be done.

Submitted by:	Dan Eischen originally, changed by myself.
2004-05-08 08:56:05 +00:00
alc
b0b9413238 Avoid pointless zeroing of the bogus page in vfs_bio_clrbuf().
Suggested by:	tegge@	(from October of last year)
2004-05-08 06:46:40 +00:00
rwatson
7128a3b5cc Unconditionally lock Giant in do_sendfile(), rather than locking it
conditional on debug.mpsafenet.  We can try pushing down Giant here
later, but we don't want to enter VFS without holding Giant.

Bumped into by:	kris
2004-05-08 02:24:21 +00:00
cognet
6897229d63 Compare t_brkc against (char)_POSIX_VDISABLE, not against -1.
Discussed with:	bde
2004-05-07 15:35:38 +00:00
njl
e9ec8dbd49 Move the CPU newbus attachment to i386 legacy. The acpi_cpu device will
become just "cpu" and provide attachments in the !legacy case.

Tested by:	des
2004-05-06 15:54:02 +00:00
alc
b57e5e03fd Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation.  This flag's principal use is shortly after
allocation.  For such cases, clearing the flag is pointless.  The only
unusual use of PG_ZERO is in vfs_bio_clrbuf().  However, allocbuf() never
requests a prezeroed page.  So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by:	tegge@
2004-05-06 05:03:23 +00:00
rwatson
9ec8ab1c20 Add /* !MAC */ to final #endif. 2004-05-03 22:54:46 +00:00
rwatson
16bb0e59b9 Bump copyright date for NETA to 2004. 2004-05-03 20:53:27 +00:00
rwatson
a857ce2f0a Add MAC_STATIC, a kernel option that disables internal MAC Framework
synchronization protecting against dynamic load and unload of MAC
policies, and instead simply blocks load and unload.  In a static
configuration, this allows you to avoid the synchronization costs
associated with introducing dynamicism.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-03 20:53:05 +00:00
cperciva
fe4e3a8b16 Fix a race condition which could result in profprocs being decremented
more than once if stopprofclock is called multiple times on the same
process.
2004-05-03 00:48:11 +00:00
peter
929c81a05b Checkpoint commit for an alternative WIP kernel module loader that isn't
as dependent on binutils features/quirks as the current one.  This one
loads plain .o files without having to mess with shared object mode.

This happens to be essential on amd64, because binutils hasn't implemented
all the quirks/features that we need for producing the hack non-PIC shared
objects.  As it turned out, .o format isn't all that inconvenient after
all.  It looks like the ability to use the same .o files for linking
directly into a static kernel or loading as a module might be worth it.

It is still very much a work-in-progress, but it is almost usable.  Other
changes are still needed in order to use it though, these have not been
committed yet.  There is still a memory corruption/overrun bug somewhere.
For example, test modules load and work, but the machine explodes a few
minutes later in vm_forkproc() or the like.  Notable missing things
include kldxref support, and loader(8) support.  I wanted to figure out
a working baseline set of code first.
2004-04-30 16:32:40 +00:00
deischen
122d328ccb Keep track of threads waiting in kse_release() to avoid a race
condition where kse_wakeup() doesn't yet see them in (interruptible)
sleep queues.  Also add an upcall check to sleepqueue_catch_signals()
suggested by jhb.

This commit should fix recent mysql hangs.

Reviewed by:	jhb, davidxu
Mysql'd by:	Robin P. Blanchard <robin.blanchard at gactr uga edu>
2004-04-28 20:36:53 +00:00
das
9df402daa4 If the buffer supplied to kenv(KENV_DUMP, ...) isn't big enough,
return the number of bytes needed instead of 0.  The manpage claims
that we do this anyway.
2004-04-28 01:27:33 +00:00
bmilekic
6bbcc9da29 Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR.  The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely.  This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
2004-04-26 19:46:52 +00:00
pjd
cf34985420 Always use nd.ni_vp->v_mount as an argument for VFS_QUOTACTL(), just like
in RELENG_4.

Pointed out by:	Alex Lyashkov <umka@sevinter.net>
2004-04-26 15:44:42 +00:00
hmp
fdb8f55130 The paper "Hashed Timers and Hierarchical Wheels: Data Structures for the
Efficient Implementation of a Timer Facility" was co-author'ed by T. Lauk,
not A. Lauk.

Adjust nearby whitespace.
2004-04-25 04:10:17 +00:00
alc
c8457a17a5 Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes
kmem_alloc_wait()) for mapping the image header.  On all machines with a
direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
2004-04-23 03:01:40 +00:00
obrien
cbc25a780d There was a thread on "unusually high load averages" when running under
sched_ule, in January 2004.  Looking at this, "pagezero" is (one of) the
culprit(s).  We had no provision for processes with P_NOLOAD set.  With
pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as
another-normal-process, thus the "expected-plus-one" load reported in
the above thread.

Submitted by:	Nikos Ntarmos <ntarmos@ceid.upatras.gr>
2004-04-22 21:37:46 +00:00
pjd
f8b184242a Look out! vn_start_write() is able to return 0 and NULL 'mp'.
Submitted by:	Alex Lyashkov <shadow@psoft.net>
2004-04-22 15:40:27 +00:00
bde
ff09fc7686 Include <sys/mutex.h> and its prerequisite <sys/lock.h> instesd of depending
on namespace pollution in <sys/vnode.h>.

Sorted includes.
2004-04-21 12:10:30 +00:00
cperciva
9e96265f37 1. Remove callout_stop binary compatibility.
2. Document that this means that kernel modules must be rebuilt.
3. While I'm here, fix my sorting error in callout.h

Requested by:	many [1], scottl [2], bde [3]
2004-04-20 15:49:31 +00:00
mtm
023ebd18f8 If you're trying to find out if a thread is valid and in
the same process as the current thread it makes absolutely
no sense to lock the parent process through the pointer in
said thread.

Submitted by:	pho (with minor correction)
Pointy Hat To:	mtm
2004-04-19 14:20:01 +00:00
luigi
4d396f17cf constify the last argument of m_copyback. 2004-04-18 13:01:28 +00:00
bde
62ea68b046 Fixed some style bugs in previous commit (mainly an insertion sort error
for declarations, and poorly worded messages).

Fixed some nearby style bugs (unsorted declarations).
2004-04-17 02:46:05 +00:00
jhb
24e7821d24 - Enable (unmask) interrupt sources earlier in the ithread loop.
Specifically, we used to enable the source after locking sched_lock
  and just before we had already decided to do a context switch.
  This meant that an ithread could never process more than one interrupt
  per context switch.  Enabling earlier in the loop before sched_lock is
  acquired allows an ithread to handle multiple interrupts per context
  switch if interrupts fire very rapidly.  For the case of heavy interrupt
  load this can reduce the number of context switches (and thus overhead)
  as well as reduce interrupt latency.
- Now that we can handle multiple interrupts per context switch, add simple
  interrupt storm protection to threaded interrupts.  If X number of
  consecutive interrupts are triggered before the itherad voluntarily
  yields to another thread, then the interrupt thread will sleep with the
  associated interrupt source disabled (masked) for 1/10th of a second.
  The default value of X is 500, but it can be tweaked via the tunable/
  sysctl hw.intr_storm_threshold.  If an interrupt storm is detected, then
  a message is output to the kernel console on the first occurrence per
  interrupt thread.  Interrupt storm protection can be disabled completely
  by setting this value to 0.  There is no scientific reasoning for the
  1/10th of a second or 500 interrupts values, so they may require tweaking
  at some point in the future.

Tested by:	rwatson (an earlier version w/o the storm protection)
Tested by:	mux (reportedly made a machine with two PCI interrupts
		storming usable rather than hard locked)
Reviewed by:	imp
2004-04-16 20:25:40 +00:00
rwatson
800e19506e At some point during the history of m_getcl(), MAC support began to
unconditionally initialize the mbuf header even if cluster allocation
failed, which could result in a NULL pointer dereference in low-memory
conditions.

PR:		kern/65548
Submitted by:	Stephan Uphoff <ups@tree.com>
2004-04-16 14:35:11 +00:00
ru
2b44d13e3e Ensure that the poll_burst <= poll_burst_max constraint really holds.
Reviewed by:	luigi
2004-04-15 07:38:44 +00:00
imp
b449361466 Fix off by one error, twice.
Submitted by: Carlos Velasco (first one), jhb (second one)
2004-04-12 23:02:21 +00:00
cperciva
7eb8531271 stop() no longer needs sched_lock held; in fact, holding sched_lock causes
a LOR against sleepq.  Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.

Reported by:	Scott Sipe <cscotts@mindspring.com>
Reviewed by:	rwatson, jhb
2004-04-12 15:56:05 +00:00
mux
79217d1505 Put deprecated sysctl code inside BURN_BRIDGES. 2004-04-11 21:09:22 +00:00
alc
643a21e287 Use vm_page_hold() rather than vm_page_wire() for short-duration page
wiring.  The reason being that vm_page_hold() is cheaper.
2004-04-11 19:57:11 +00:00
mux
74cb325f5e Remove a comment that complains about the lack of %qd, to justify
truncating a rlim_t to a long.  We have %qd since some time now.
However, the correct format to use here is %jd and a cast to
intmax_t, so do this.
2004-04-10 11:08:16 +00:00
peadar
fd75a2f931 Plug minor memory leak of module_t structures when unloading a file
from the kernel.

Reviewed By: Doug Rabson (dfr@)
2004-04-09 15:27:38 +00:00
cognet
acc91f284d Spell "switches" a more conventional way. 2004-04-09 14:31:29 +00:00
rwatson
435aae62de Compare pointers with NULL rather than using pointers are booleans in
if/for statements.  Assign pointers to NULL rather than typecast 0.
Compare pointers with NULL rather than 0.
2004-04-09 13:23:51 +00:00
silby
d35a5d60b9 Fix a regression in my change which sends headers along with data; a
side effect of that change caused headers to not be sent if a 0 byte
file was passed to sendfile.  This change fixes that behavior, allowing
sendfile to send out the headers even with a 0 byte file again.

Noticed by:	Dirk Engling
2004-04-08 07:14:34 +00:00
marcel
9584da2d1f Do not assume that the initial thread (i.e. the thread with the ID
equal to the process ID) is still present when we dump a core. It
already may have been destroyed. In that case we would end up
dereferencing a NULL pointer, so specifically test for that as well.

Reported & tested by: Dan Nelson <dnelson@allantgroup.com>
2004-04-08 06:37:00 +00:00
cperciva
9c466edcbc Add whitespace before comment blocks. (reported by njl)
Remove spurious whitespace, add indent protection, fix punctuation,
remove initialization of static variables to zero, put wakeup_ctr
and wakeup_needed in the correct order. (reported by bde)

This doesn't fix all the style bugs I introduced, but the remaining
style bugs make it easier for me to understand what I did here.
2004-04-08 02:03:49 +00:00
imp
b49b7fe799 Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
cperciva
b174697b69 Fix filt_timer* races: Finish initializing a knote before we pass it to
a callout, and use the new callout_drain API to make sure that a callout
has finished before we deallocate memory it is using.

PR:		kern/64121
Discussed with:	gallatin
2004-04-07 05:59:57 +00:00
cperciva
e0793884f3 Introduce a callout_drain() function. This acts in the same manner as
callout_stop(), except that if the callout being stopped is currently
in progress, it blocks attempts to reset the callout and waits until the
callout is completed before it returns.

This makes it possible to clean up callout-using code safely, e.g.,
without potentially freeing memory which is still being used by a callout.

Reviewed by:	mux, gallatin, rwatson, jhb
2004-04-06 23:08:49 +00:00
jhb
241908535b Associate a simple count of waiters with each condition variable. The
count is protected by the mutex that protects the condition, so the count
does not require any extra locking or atomic operations.  It serves as an
optimization to avoid calling into the sleepqueue code at all if there are
no waiters.

Note that the count can get temporarily out of sync when threads sleeping
on a condition variable time out or are aborted.  However, it doesn't hurt
to call the sleepqueue code for either a signal or a broadcast when there
are no waiters, and the count is never out of sync in the opposite
direction unless we have more than INT_MAX sleeping threads.
2004-04-06 19:17:46 +00:00
jhb
7cf9a1d044 Add a new kernel option MUTEX_WAKE_ALL that changes the mutex unlock code
to awaken all waiters when a contested mutex is released instead of just
the highest priority waiter.  If the various threads are awakened in
sequence then each thread may acquire and release the lock in question
without contention resulting in fewer expensive unlock and lock
operations.  This old behavior of waking just the highest priority is
still used if this option is specified.  Making the algorithm conditional
on a kernel option will allows us to benchmark both cases later and
determine which one should be used by default.

Requested by:	tanimura-san
2004-04-06 19:12:24 +00:00
jhb
8ab84688c3 Rename turnstile_wakeup() to turnstile_broadcast() to make the naming
more consistent with other APIs. sleepq and cv's use signal/broadcast, and
msleep uses wakeup_one/wakeup.  Prior to this turnstiles were using a
signal/wakeup mixture.
2004-04-06 19:07:21 +00:00
bde
122259ad35 Removed some less than useful comments:
- don't say what a small subset of the options includes are for.
- don't mark up functions which use all their args with /* ARGSUSED */.
  The markup should have been removed when the unused retval parameter
  was removed.
- don't comment on what routine suser() checks do.  Removed nearby
  excessive vertical whitespace.
2004-04-06 10:05:02 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
dfr
b750882696 Try not to crash instantly when signalling a libthr program to death. 2004-04-05 15:06:01 +00:00
dfr
674d9507bb Regen. 2004-04-05 10:17:23 +00:00
dfr
c76397ed09 Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks. 2004-04-05 10:15:53 +00:00
rwatson
40e717b764 Detatch incorrect spellings of detach. 2004-04-04 19:15:45 +00:00
jeff
3f884ddf6b - Use the proper constant in sched_interact_update(). Previously,
SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was needed.  This was
   causing the interactivity scaler to lose history at a more dramatic rate
   than intended.
2004-04-04 19:12:56 +00:00
marcel
f16d24b1ae Create NT_PRSTATUS and NT_FPREGSET notes for each and every thread
in the process. This is required for proper debugging of corefiles
created by 1:1 or M:N threaded processes. Add an XXX comment where
we should actually call a function that dumps MD specific notes.
An example of a MD specific note is the NT_PRXFPREG note for SSE
registers.

Since BFD creates non-annotated pseudo-sections for the first PRSTATUS
and FPREGSET notes (non-annotated in the sense that the name of the
section does not contain the pid/tid), make sure those sections describe
the initial thread of the process (i.e. the thread which tid equals the
pid). This is not strictly necessary, but makes sure that tools that use
the non-annotated section names will not change behaviour due to this
change.

The practical upshot of this all is that one can see the threads in
the debugger when looking at a corefile. For 1:1 threading this means
that *all* threads are visible.
2004-04-03 20:25:41 +00:00
marcel
1d37410c51 Assign thread IDs to kernel threads. The purpose of the thread ID (tid)
is twofold:
1. When a 1:1 or M:N threaded process dumps core, we need to put the
   register state of each of its kernel threads in the core file.
   This can only be done by differentiating the pid field in the
   respective note. For this we need the tid.
2. When thread support is present for remote debugging the kernel
   with gdb(1), threads need to be identified by an integer due to
   limitations in the remote protocol. This requires having a tid.

To minimize the impact of having thread IDs, threads that are created
as part of a fork (i.e. the initial thread in a process) will inherit
the process ID (i.e. tid=pid). Subsequent threads will have IDs larger
than PID_MAX to avoid interference with the pid allocation algorithm.
The assignment of tids is handled by thread_new_tid().

The thread ID allocation algorithm has been written with 3 assumptions
in mind:
1. IDs need to be created as fast a possible,
2. Reuse of IDs may happen instantaneously,
3. Someone else will write a better algorithm.
2004-04-03 15:59:13 +00:00
alc
1ec4d75266 In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not.  Add a new parameter so that the caller can specify which is
the case.

Reported by:	dillon
2004-04-03 09:16:27 +00:00
kris
90df1ff5bf Add missing comment terminator. 2004-04-02 04:57:40 +00:00
julian
89d6572b1f The comment complained about not having a thread_unlink()
and did the work itself, but thread_unink() has existed for a while... use it.
2004-04-02 01:01:34 +00:00
jhb
03d6afada4 Finish fixing up Alpha to work with an MP safe ptrace():
- ptrace_single_step() is no longer called with the proc lock held, so
  don't try to unlock it and then relock it.
- Push Giant down into proc_rwmem() instead of forcing all the consumers
  (including Alpha breakpoint support) to explicitly wrap calls to
  proc_rwmem() with Giant.

Tested by:	kensmith
2004-04-01 20:56:44 +00:00
scottl
7996fbe9e3 Don't print out 'GIANT-LOCKED' for INTR_FAST drivers. 2004-04-01 07:18:42 +00:00
pjd
cda068b39a Remove sysctl kern.ps_argsopen, it is not very useful, one should use
security.bsd.see_other_uids instead.

Discussed with:	phk, rwatson
2004-04-01 00:10:45 +00:00
pjd
a8489e86d9 Remove ps_argsopen check. It is was bogus in the past and was corrected
not quite well by me - if kern.ps_argsopen was set to 0, users weren't
permitted to see arguments of even own processes.
But kern.ps_argsopen is going away, so just remove this check and leave
security checks for p_cansee() function.
2004-04-01 00:08:20 +00:00
julian
7a48fb22ac Remove unused variable. 2004-03-31 08:20:44 +00:00
rwatson
6f9005a98d In sofree(), avoid nested declaration and initialization in
declaration.  Observe that initialization in declaration is
frequently incompatible with locking, not just a bad idea
due to style(9).

Submitted by:	bde
2004-03-31 03:48:35 +00:00
rwatson
8eeaad5c11 Export uipc_connect2() from uipc_usrreq.c instead of unp_connect2(),
and consume that interface in portalfs and fifofs instead.  In the
new world order, unp_connect2() assumes that the unpcb mutex is
held, whereas uipc_connect2() validates that the passed sockets are
UNIX domain sockets, then grabs the mutex.

NB: the portalfs and fifofs code gets down and dirty with UNIX domain
sockets.  Maybe this is a bad thing.
2004-03-31 01:41:30 +00:00
alc
707630ec9c White space and wording changes to init_param3().
Mostly submitted by:	bde
2004-03-30 08:00:11 +00:00
rwatson
b5754b5316 Prefer NULL to 0 when testing and assigning pointer values. 2004-03-30 02:16:25 +00:00
peter
7957bc47f6 Shorten some XXXKSE commentry 2004-03-29 22:46:54 +00:00
peter
5e91995e52 Kill some XXXKSE's. vnlru/syncer are single threaded. 2004-03-29 22:45:33 +00:00
peter
1f224a3d83 Clean up the stub fake vnode locking implemenations. The main reason this
stuff was here (NFS) was fixed by Alfred in November.  The only remaining
consumer of the stub functions was umapfs, which is horribly horribly
broken.  It has missed out on about the last 5 years worth of maintenence
that was done on nullfs (from which umapfs is derived).  It needs major
work to bring it up to date with the vnode locking protocol.  umapfs really
needs to find a caretaker to bring it into the 21st century.

Functions GC'ed:
vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.
2004-03-29 22:41:21 +00:00
rwatson
b39ce8f898 Use a common return path for filt_soread() and filt_sowrite() to
simplify the impact of locking on these functions.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
2004-03-29 18:06:15 +00:00
rwatson
4edcdd352b In sofree(), moving caching of 'head' from 'so->so_head' to later in
the function once it has been determined to be non-NULL to simplify
locking on an earlier return.
2004-03-29 17:57:43 +00:00
rwatson
f31d099747 If debug.mpsafenet, initialize UNIX domain socket timeouts as MPSAFE;
otherwise, assert Giant in the callouts.
2004-03-29 17:00:05 +00:00
rwatson
7360350f89 Conditionally acquire Giant when entering the sockets layer via the
socket-specific system calls based on debug.mpsafenet, rather than
acquiring Giant unconditionally.
2004-03-29 02:21:56 +00:00
rwatson
3b7dc3c3f7 Conditionally acquire Giant when entering the socket layer via file
descriptor operations based on debug.mpsafenet, rather than acquiring
Giant unconditionally.
2004-03-29 01:55:32 +00:00
rwatson
52c072609b When validating that the length sum in recvit(), we fail to release
Giant on an error.  Add a Giant acquisition.

Reviewed by:	sam, bms
2004-03-29 01:37:06 +00:00
rwatson
0feec33757 Conditionally assert Giant in fputsock() based on the value of
debug.mpsafenet.
2004-03-29 00:33:02 +00:00
alc
521fa57364 Revise the direct or optimized case to use uiomove_fromphys() by the reader
instead of ephemeral mappings using pmap_qenter() by the writer.  The
writer is still, however, responsible for wiring the pages, just not
mapping them.  Consequently, the allocation of KVA for the direct case is
unnecessary.  Remove it and the sysctls limiting it, i.e.,
kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired.  The number
of temporarily wired pages is still, however, limited by
kern.ipc.maxpipekva.

Note: On platforms lacking a direct virtual-to-physical mapping,
uiomove_fromphys() uses sf_bufs to cache ephemeral mappings.  Thus,
the number of available sf_bufs can influence the performance of pipes
on platforms such i386.  Surprisingly, I saw the greatest gain from this
change on such a machine: lmbench's pipe bandwidth result increased from
~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.
2004-03-27 19:50:23 +00:00
marcel
fb520fa860 Change the type of the various CPU masks to cpumask_t. Note that as
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.
2004-03-27 18:21:24 +00:00
mtm
02e9e2319a Regen for libthr thread synchronization syscalls. 2004-03-27 14:34:17 +00:00
mtm
873aa62c96 Use the proc lock to sleep on a libthr umtx. 2004-03-27 14:32:03 +00:00
mtm
adb111ed69 Separate thread synchronization from signals in libthr. Instead
use msleep() and wakeup_one().

Discussed with: jhb, peter, tjr
2004-03-27 14:30:43 +00:00
pjd
deaccb5ae7 - Add a description for vfs.usermount sysctl.
- Add the vfs_equalopts() function for mount options comparsion.
  Now it looks much more clear.
- Style fixed.

In co-operation with:	bde
2004-03-27 08:39:28 +00:00
pjd
b04fb12275 - Loudly disallow MNT_SUIDDIR mount flag for unprivileged users mounts.
- Style fixed.

Submitted by:	bde
2004-03-27 08:09:00 +00:00
pjd
b05f0288da We probably shouldn't allow users to mount file systems with MNT_SUIDDIR.
There should be not shell access when SUIDDIR is compiled in, but
better be sure.

Reviewed by:	rwatson
2004-03-26 21:12:14 +00:00
alc
cab68d38d2 Use uiomove_fromphys() instead of pmap_qenter() and pmap_qremove() in
proc_rwmem().
2004-03-24 23:35:04 +00:00
imp
5d9dc609bb Conform to local file sytle and prefer (a && (b & flag)). 2004-03-24 16:49:37 +00:00
obrien
e182cf9fee Change the !MPSAFE boot string to something that doesn't potentially
scare users that the kernel won't run on MP systems.
2004-03-23 01:58:09 +00:00
alfred
fbfae479f4 Emit a traceback when witness_trace is set and witness_warn() is
called and triggers (typically caused by sleeping with a non-sleepable
lock).

Reviewed by: jhb
2004-03-23 00:32:27 +00:00
obrien
71b6e14bc8 Rather than display which interrupts are MPSAFE, display those that aren't.
This way we can take stock of the work to be done.  boot -v will note those
interrupts that are MPSAFE.
2004-03-22 22:36:11 +00:00
ps
e230d44ca7 Remove some netbsd debug code that crept into rev 1.116 2004-03-22 10:17:40 +00:00
obrien
7b9a2bdb17 Give a more reasonable CPU time to the threads which are using scheduler
activation (i.e., applications are using libpthread).  This is because
SCHED_ULE sometimes puts P_SA processes into ksq_next unnecessarily.
Which doesn't give fair amount of CPU time to processes which are
using scheduler-activation-based threads when other (semi-)CPU-intensive,
non-P_SA processes are running.

Further work will no doubt be done by jeffr at a later date.

Submitted by:	Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Reviewed by:	rwatson, freebsd-current@
2004-03-21 18:53:29 +00:00
julian
5e0a5420a9 Massively up the (artificial) limit on system scope threads
in a process from 50 to 500

Also up the number of process scope threads allowed to be in the kernel
at one time from 150 to 1500 (per process)
2004-03-21 09:22:38 +00:00
green
aae79f39c1 Add the missing Giant when doing anything with VFS -- in this case,
releasing the ktrace vnode.
2004-03-18 18:15:58 +00:00
nectar
97b3d4b119 Verify more bits of the ELF header: the program header table
entry size and the ELF version.  Also, avoid a potential integer
overflow when determining whether the ELF header fits entirely
within the first page.

Reviewed by:	jdp

A panic when attempting to execute an ELF binary with a bogus program
header table entry size was

Reported by:	Christer Öberg <christer.oberg@texonet.com>
2004-03-18 16:33:05 +00:00
alc
d48f3c1617 Revise socow_iodone() in light of recent sf_buf changes. Specifically,
use sf_buf_free() instead of sf_buf_mext() to consolidate all actions
that require the page queues lock in one critical section.  While I'm
here remove unnecessary splvm() and splx() calls.
2004-03-17 23:25:04 +00:00
jhb
275240297d - Replace wait1() with a kern_wait() function that accepts the pid,
options, status pointer and rusage pointer as arguments.  It is up to
  the caller to copyout the status and rusage to userland if needed.  This
  lets us axe the 'compat' argument and hide all that functionality in
  owait(), by the way.  This also cleans up some locking in kern_wait()
  since it no longer has to drop locks around copyout() since all the
  copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
  use kern_wait() rather than wait1() or wait4().  This removes a bit
  more stackgap usage.

Tested on:	i386
Compiled on:	i386, alpha, amd64
2004-03-17 20:00:00 +00:00
pjd
11852bf574 Fix information leakage.
Without this fix it is possible to cheat policies like:
- sysctl security.bsd.see_other_[gu]ids=0,
- mac_seeotheruids(4),
- jail(2)
and get full processes list with their arguments.

This problem exists from revision 1.62 of kern_proc.c when it was
introduced.

Reviewed by:	nectar, rwatson.
2004-03-17 13:19:43 +00:00
cperciva
b9e38dc622 Adjust the number of processes waiting on a semaphore properly if we're
woken up in the middle of sleeping.

PR:		misc/64347
Reviewed by:	tjr
MFC after:	7 days
2004-03-17 09:37:13 +00:00
alc
a2e820d27b Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext().  Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical.  Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page.  That is now the purpose of
sf_buf_mext().  Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose emphemeral map cache.
2004-03-16 19:04:28 +00:00
jhb
71c3a1c44c Remove a bogus assertion and readd it in a more correct location. A thread
might be enqueued on a sleep queue but not be asleep when the timeout fires
if it is blocked on a lock trying to check for pending signals before going
to sleep.  In the case of fixing up the TDF_TIMEOUT race, however, the
thread must be marked asleep.

Reported by:	kan (the bogus one)
2004-03-16 18:56:22 +00:00
grehan
84ca086b14 Add powerpc to temporary fix. The new cpu device claims all
'generic' OpenFirmware nexus nodes, since it uses bus_generic_probe.
Maybe the cpu device probe should be MD.
2004-03-16 13:34:50 +00:00
dwmalone
116755fef7 Nudge Giant as far as I can into kern_open(). Mark open() as MPSAFE.
Use kern_open() to implement creat() rather than taking the long route
through open(). Mark creat as MPSAFE.

While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the
difference it will make.
2004-03-16 10:46:42 +00:00
dwmalone
bfd66a78ad Get ready to mark open, creat and nosys as MPSAFE. 2004-03-16 10:41:23 +00:00
tjr
d37f036c18 Make vfs_nmount() public. The Linux emulator needs this in order to mount
linprocfs filesystems.
2004-03-16 08:59:37 +00:00
truckman
3c4fad0869 Rename the wiredlen member of struct sysctl_req to validlen and always
set it to avoid the need for a bunch of code that tests whether or
not the lock member is set to REQ_WIRED in order to determine which
length member should be used.

Fix another bug in the oldlen return value code.

Fix a potential wired memory leak if a sysctl handler uses
sysctl_wire_old_buffer() and returns an EAGAIN error to trigger
a retry.
2004-03-16 06:53:03 +00:00
truckman
2b0bda9870 Don't bother calling vslock() and vsunlock() if oldlen is zero.
If vslock() returns ENOMEM, sysctl_wire_old_buffer() should set
wiredlen to zero and return zero (success) so that the handler will
operate according to sysctl(3):
     The size of the buffer is given by the location specified by
     oldlenp before the call, and that location gives the amount
     of data copied after a successful call and after a call that
     returns with the error code ENOMEM.
The handler will return an ENOMEM error because the zero length
buffer will overflow.
2004-03-16 01:28:45 +00:00
jhb
1ed25bd5c1 Regen for ptrace being safe again. 2004-03-15 18:50:06 +00:00
jhb
d8445e0c8d Drop the proc lock around calls to the MD functions ptrace_single_step(),
ptrace_set_pc(), and cpu_ptrace() so that those functions are free to
acquire Giant, sleep, etc.  We already do a PHOLD/PRELE around them so
that it is safe to sleep inside of these routines if necessary.  This
allows ptrace() to be marked MP safe again as it no longer triggers lock
order reversals on Alpha.

Tested by:	wilko
2004-03-15 18:48:28 +00:00
pjd
aae1ea0f99 Remove sysctl security.jail.list_allowed.
This functionality was a misfeature, sysctl was added and turned off by
default just to check if nobody complains.

Reviewed by:	rwatson
2004-03-15 12:10:34 +00:00
truckman
b7a6af3cc9 Revert to the original vslock() and vsunlock() API with the following
exceptions:
	Retain the recently added vslock() error return.

	The type of the len argument should be size_t, not u_int.

Suggested by:	bde
2004-03-15 06:42:40 +00:00
phk
7a8abdd585 Annual NTP kernel code spring-cleaning:
Use int64_t rather than long long for the fixpoint type.

Don't discard fractional nanosecond frequency correction.
2004-03-14 15:23:05 +00:00
peter
7a96caafd4 Set default HZ to 1024 for amd64. The comment in kern/tty.c doesn't
apply here because we have 64 bit longs and don't suffer the hz > 169
overflows.
2004-03-14 05:49:31 +00:00
peter
bd5efd4600 Make the process_exit eventhandler run without Giant. Add Giant hooks
in the two consumers that need it.. processes using AIO and netncp.
Update docs.  Say that process_exec is called with Giant, but not to
depend on it.  All our consumers can handle it without Giant.
2004-03-14 02:06:28 +00:00
peter
963c36c195 Move the process_fork event out from under Giant. This one is easy,
since there are no consumers in the tree.  Document this.
2004-03-14 01:48:32 +00:00
peter
c59ea86c9f Regen for mpsafe kse_create() 2004-03-13 22:32:17 +00:00
peter
1cb95fd2b7 Push Giant down a little further:
- no longer serialize on Giant for thread_single*() and family in fork,
  exit and exec
- thread_wait() is mpsafe, assert no Giant
- reduce scope of Giant in exit to not cover thread_wait and just do
  vm_waitproc().
- assert that thread_single() family are not called with Giant
- remove the DROP/PICKUP_GIANT macros from thread_single() family
- assert that thread_suspend_check() s not called with Giant
- remove manual drop_giant hack in thread_suspend_check since we know it
  isn't held.
- remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family
- mark kse_create() mpsafe
2004-03-13 22:31:39 +00:00
rwatson
6a1193a64c Add annotations to mtx_lock(&Giant) in kern_select() and poll() that
we always grab Giant, even if we're actually only polling objects that
don't require giant.  Once socket locking is merged, there will be
strong motivation to fix this.
2004-03-13 05:58:57 +00:00
bde
58d250bc29 Align the offset in vn_rdwr_inchunks() so that at most the first and
the last chunk are misaligned relative to a MAXBSIZE byte boundary.
vn_rdwr_inchunks() is used mainly for elf core dumps, and elf sections
are usually perfectly misaligned relative to MAXBSIZE, and chunking
prevents the file system from doing much realigning.

This gives a surprisingly large speedup for core dumps -- from 50 to
13 seconds for a 512MB core dump here.  The pessimization was mostly
from an interaction of the misalignment with IO_DIRECT.  It increased
the number of i/o's for each chunk by a factor of 5 (3 writes and 2
read-before-writes instead of 1 write).
2004-03-13 02:56:27 +00:00
trhodes
dfcfecd6e4 These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation if gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.

Additional changes:
 - in_cksum.[ch]:
   * use a generic C version instead of the assembly version in the !gcc
     case (ASM code breaks with the optimizations icc does)
     -> no bad checksums with an icc compiled kernel
     Help from:		andre, grehan, das
     Stolen from: 	alpha version via ppc version
     The entire checksum code should IMHO be replaced with the DragonFly
     version (because it isn't guaranteed future revisions of gcc will
     include similar optimizations) as in:
        ---snip---
          Revision  Changes    Path
          1.12      +1 -0      src/sys/conf/files.i386
          1.4       +142 -558  src/sys/i386/i386/in_cksum.c
          1.5       +33 -69    src/sys/i386/include/in_cksum.h
          1.5       +2 -0      src/sys/netinet/igmp.c
          1.6       +0 -1      src/sys/netinet/in.h
          1.6       +2 -0      src/sys/netinet/ip_icmp.c

          1.4       +3 -4      src/contrib/ipfilter/ip_compat.h
          1.3       +1 -2      src/sbin/natd/icmp.c
          1.4       +0 -1      src/sbin/natd/natd.c
          1.48      +1 -0      src/sys/conf/files
          1.2       +0 -1      src/sys/conf/files.amd64
          1.13      +0 -1      src/sys/conf/files.i386
          1.5       +0 -1      src/sys/conf/files.pc98
          1.7       +1 -1      src/sys/contrib/ipfilter/netinet/fil.c
          1.10      +2 -3      src/sys/contrib/ipfilter/netinet/ip_compat.h
          1.10      +1 -1      src/sys/contrib/ipfilter/netinet/ip_fil.c
          1.7       +1 -1      src/sys/dev/netif/txp/if_txp.c
          1.7       +1 -1      src/sys/net/ip_mroute/ip_mroute.c
          1.7       +1 -2      src/sys/net/ipfw/ip_fw2.c
          1.6       +1 -2      src/sys/netinet/igmp.c
          1.4       +158 -116  src/sys/netinet/in_cksum.c
          1.6       +1 -1      src/sys/netinet/ip_gre.c
          1.7       +1 -2      src/sys/netinet/ip_icmp.c
          1.10      +1 -1      src/sys/netinet/ip_input.c
          1.10      +1 -2      src/sys/netinet/ip_output.c
          1.13      +1 -2      src/sys/netinet/tcp_input.c
          1.9       +1 -2      src/sys/netinet/tcp_output.c
          1.10      +1 -1      src/sys/netinet/tcp_subr.c
          1.10      +1 -1      src/sys/netinet/tcp_syncache.c
          1.9       +1 -2      src/sys/netinet/udp_usrreq.c

          1.5       +1 -2      src/sys/netinet6/ipsec.c
          1.5       +1 -2      src/sys/netproto/ipsec/ipsec.c
          1.5       +1 -1      src/sys/netproto/ipsec/ipsec_input.c
          1.4       +1 -2      src/sys/netproto/ipsec/ipsec_output.c

          and finally remove
            sys/i386/i386        in_cksum.c
            sys/i386/include     in_cksum.h
        ---snip---
 - endian.h:
   * DTRT in C++ mode
 - quad.h:
   * we don't use gcc v1 anymore, remove support for it
   Suggested by:	bde (long ago)
 - assym.h:
   * avoid zero-length arrays (remove dependency on a gcc specific
     feature)
     This change changes the contents of the object file, but as it's
     only used to generate some values for a header, and the generator
     knows how to handle this, there's no impact in the gcc case.
   Explained by:	bde
   Submitted by:	Marius Strobl <marius@alchemy.franken.de>
 - aicasm.c:
   * minor change to teach it about the way icc spells "-nostdinc"
   Not approved by:	gibbs (no reply to my mail)
 - bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by:	-arch
Submitted by:	netchild
2004-03-12 21:45:33 +00:00
ru
50a11e8dfd Do what the execve(2) manpage says and enforce what a Strictly
Conforming POSIX application should do by disallowing the argv
argument to be NULL.

PR:		kern/33738
Submitted by:	Marc Olzheim, Serge van den Boom
OK'ed by:	nectar
2004-03-12 21:06:20 +00:00
kensmith
2cb0d83a6c This is a temporary fix to solve a regression issue on sparc64 that
is caused by the way sparc64 registers its CPUs.  Nate will work on
a real fix shortly.

Approved by:	njl
2004-03-12 20:35:21 +00:00
jhb
c754b5af47 - Remove old sleep queues.
- Remove sleepqueue argument from sleepq_set_timeout() since it is not
  used.
2004-03-12 19:06:18 +00:00
jhb
6103cfbeb5 Fixup a comment. 2004-03-12 19:05:46 +00:00
des
a77c8e8035 Replace a manual check of a VMIO candidate with vn_canvmio(). This
silences an annoying warning in getblk() when VMIO'ing on a directory
vnode, which can happen when vfs.vmiodirenable is 1.

Bring the warning message in line with reality at the same time.

Submitted by:	hmp
2004-03-12 12:02:12 +00:00
phk
5c532f7fd4 When I was a kid my work table was one cluttered mess an cleaning it up
were a rather overwhelming task.  I soon learned that if you don't know
where you're going to store something, at least try to pile it next to
something slightly related in the hope that a pattern emerges.

Apply the same principle to the ffs/snapshot/softupdates code which have
leaked into specfs:  Add yet a buf-quasi-method and call it from the
only two places I can see it can make a difference and implement the
magic in ffs_softdep.c where it belongs.

It's not pretty, but at least it's one less layer violated.
2004-03-11 18:50:33 +00:00
phk
2a5e157787 Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
2004-03-11 18:02:36 +00:00
phk
9ba3cede82 Remove unused mnt_reservedvnlist field. 2004-03-11 16:59:57 +00:00
phk
eeb7579130 Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.
2004-03-11 16:33:11 +00:00
phk
7ad97e57ad Correctly account for extra bits in unit numbers when looking for
next free unit.
2004-03-11 14:11:02 +00:00
phk
fdd216910f Add clone_setup() function rather than rely on lazy initialization.
Requested by:	rwatson
2004-03-11 12:58:55 +00:00
jmg
cf1b8bdb72 make sure we had the filedesc lock when calling fdinit when RFCFDG is set
on call to rfork.

Submitted by:	Brian Buchanan
Semi-Reviewed by: rwatson
2004-03-10 00:27:36 +00:00
njl
89565a7301 Hook CPUs up to newbus. CPUs will ultimately be a bus driver so that
multiple CPU-specific drivers can attach.  This is a work in progress
so children aren't supported yet.

Help from:	jhb
2004-03-09 03:37:21 +00:00
rwatson
8ff4e76430 Mark loadaverage callout as CALLOUT_MPSAFE.
Reviewed by:	jhb
2004-03-08 22:01:19 +00:00
pjd
b2d1dcd936 Add two new sysctls:
- security.bsd.hardlink_check_uid, when set, means, that unprivileged
		users are not permitted to create hard links to files not
		owned by them,
	- security.bsd.hardlink_check_gid, when set, means, that unprivileged
		users are not permitted to create hard links to files owned
		by group they don't belong to.

OK'ed by:	rwatson
2004-03-08 20:37:25 +00:00
peter
836666b0d7 Move a vref call outside of proc locks and Giant. By virtue of the fact
that we (p1) are currently running, we hold a reference on p_textvp which
means the vnode cannot go away.  p2 cannot run yet (and hence cannot exit)
so this should be safe to do at this point.  As a bonus, it removes a
block of under-Giant code that was there to support the vref.
2004-03-08 00:32:34 +00:00
alc
6b2e0639b2 Remove GIANT_REQUIRED from vunmapbuf(). 2004-03-07 00:37:18 +00:00
alc
94f567f9bb Giant is not required for vm_thread_new_altkstack(). 2004-03-07 00:06:32 +00:00
kan
e795b7939d Always call vn_finished_write after vn_start_write was called. All
occurences of 'goto done' after vn_start_write invocation were cleaning
up incompletely.
2004-03-06 04:09:54 +00:00
peter
8ac8c686e1 Add a missing part of jhb's previous commit. It looks like he had a
patch chunk rejected that he missed.  This would manifest as a lock
assertion panic at boot (Giant not locked in kern_fork.c).

Obtained from:  jhb
2004-03-06 00:44:59 +00:00
jhb
2642ed4029 kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by:	many
2004-03-05 22:42:17 +00:00
jhb
4e1bd1e348 - Push down Giant in exit() and wait().
- Push Giant down a bit in coredump() and call coredump() with the proc
  lock already held rather than unlocking it only to turn around and
  relock it.

Requested by:	peter
2004-03-05 22:39:53 +00:00
jhb
445ca63264 Lock Giant around the single threading code in exec() to satisfy an
assertion in the single threading code.
2004-03-05 22:38:26 +00:00
jhb
af72c48e5f - Grab a share lock of the proctree lock while looking for a pid due to the
process group and session dereferences.  Also, check that p_pgrp and
  p_sesssion are NULL before dereferencing them.
- Push down Giant in fork1().

Requested by:	peter
2004-03-05 22:37:32 +00:00
truckman
367b608998 Undo the merger of mlock()/vslock and munlock()/vsunlock() and the
introduction of kern_mlock() and kern_munlock() in
        src/sys/kern/kern_sysctl.c      1.150
        src/sys/vm/vm_extern.h          1.69
        src/sys/vm/vm_glue.c            1.190
        src/sys/vm/vm_mmap.c            1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.

Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.

Combine the best parts of each of the original sets of implementations
with further code cleanup.  Make the mclock() and vslock()
implementations as similar as possible.

Retain the RLIMIT_MEMLOCK check in mlock().  Move the most strigent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.

Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.

Tested by:	Cy Schubert <Cy.Schubert AT komquats.com>
2004-03-05 22:03:11 +00:00
rwatson
702f89fc5d The roundrobin callout from sched_4bsd is MPSAFE, so set up the
callout as MPSAFE to avoid grabbing Giant.

Reviewed by:	jhb
2004-03-05 19:27:04 +00:00
rwatson
e2aad13d33 Put "failed to set signal flags properly for ast()" check under
DIAGNOSTIC instead of INVARIANTS.  INVARIANTS is intended for tests
that don't substantially change code flow or behavior (passive), but
this test required locking both the proc lock and scheduler lock
in order to execute.  It also appears to be a very advisory diagnostic
as opposed to an invariant violation.

Following discussion with:	bde
2004-03-05 17:35:28 +00:00
phk
e215fa23b4 Just because the timecounter reads the same value on two samples
after each other doesn't mean that nothing happened.
2004-03-04 14:14:23 +00:00
bde
9c7937d9cf Fixed some style bugs (mainly English usage errors in comments). 2004-03-04 09:56:29 +00:00
bde
a3f844af36 Fixed some style bugs (mainly misplaced comments, and totally disordered
declarations in acct_process()).
2004-03-04 09:47:09 +00:00
rwatson
48d4fe5ea4 Remove unneeded label 'done2' from socket(). We now grab Giant
only around socreate(), and don't need it for file descriptor
accesses.

Submitted by:	sam
2004-03-04 01:57:48 +00:00
des
e6b61c95ad Use different dummy wait channels to avoid panic in msleep().
Reviewed by:	jhb
2004-03-03 23:03:18 +00:00
jhb
286e504b8f Always assert that the passed in lock is the same as the saved lock in the
sleep queue now that the one abnormal case has been fixed.
2004-03-02 15:02:08 +00:00
jhb
93c4123deb Correct handling of PDROP in msleep() to just skip the mtx_lock() rather
than clear the lock pointer so that sleepq_add() still gets the correct
lock pointer and doesn't bogusly trip an assertion.
2004-03-02 14:58:33 +00:00
jhb
86e3aa5b6c Check for TDF_SINTR before calling sleepq_abort() as there is a narrow
race in between sleepq_add() and sleepq_catch_signals() in that setting
td_wchan and TDF_SINTR is not atomic to sched_lock but only to the sleepq
lock.  This band-aid will stop assertion failures, but there is perhaps a
larger problem with the sleepq_add/sleepq_catch_signals race that I am not
sure how to solve.  For the signals case the race is harmless because we
always call cursig() after setting TDF_SINTR.  However, KSE doesn't do
anything in sleepq_catch_signals() to check that this race was lost, so I
am unsure if this race is harmful for this specific abort.
2004-03-01 23:07:58 +00:00
rwatson
b0b5f961bd Rename dup_sockaddr() to sodupsockaddr() for consistency with other
functions in kern_socket.c.

Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".

Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.

Submitted by:	sam
2004-03-01 03:14:23 +00:00
scottl
48b0575f79 Convert the other use of flags to mflags in soalloc(). 2004-03-01 01:14:28 +00:00
rwatson
94d29f7426 Modify soalloc() API so that it accepts a malloc flags argument rather
than a "waitok" argument.  Callers now passing M_WAITOK or M_NOWAIT
rather than 0 or 1.  This simplifies the soalloc() logic, and also
makes the waiting behavior of soalloc() more clear in the calling
context.

Submitted by:	sam
2004-02-29 17:54:05 +00:00
phk
fdda2333fa Loudly announce WITNESS and DIAGNOSTIC options and warn about reduced
performance.
2004-02-29 16:56:54 +00:00
phk
c45bc64148 Make sure to disable the watchdog if we cannot honour the timeout. 2004-02-28 22:01:19 +00:00
phk
1ea4f5b08e Rename the WATCHDOG option to SW_WATCHDOG and make it use the
generic watchdoc(9) interface.

Make watchdogd(8) perform as watchdog(8) as well, and make it
possible to specify a check command to run, timeout and sleep
periods.

Update watchdog(4) to talk about the generic interface and add
new watchdog(8) page.
2004-02-28 20:56:35 +00:00
jhb
d25301c858 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
jhb
d76d631711 Drop sched_lock around the wakeup of the parent process after setting
the process state to zombie when a process exits to avoid a lock order
reversal with the sleepqueue locks.  This appears to be the only place
that we call wakeup() with sched_lock held.
2004-02-27 18:39:09 +00:00
jhb
d07a9130c6 Add an implementation of a generic sleep queue abstraction that is used
to queue threads sleeping on a wait channel similar to how turnstiles are
used to queue threads waiting for a lock.  This subsystem will be used as
the backend for sleep/wakeup and condition variables initially.  Eventually
it will also be used to replace the ithread-specific iwait thread
inhibitor.

Sleep queues are also not locked by sched_lock, so this splits sched_lock
up a bit further increasing concurrency within the scheduler.  Sleep queues
also natively support timeouts on sleeps and interruptible sleeps allowing
for the reduction of a lot of duplicated code between the sleep/wakeup and
condition variable implementations.  For more details on the sleep queue
implementation, check the comments in sys/sleepqueue.h and
kern/subr_sleepqueue.c.
2004-02-27 18:33:09 +00:00
des
51a4ce2dff Add sysctl_move_oid() which reparents an existing OID. 2004-02-27 17:13:23 +00:00
jhb
b23d8371fa Clarify and tweak some comments. 2004-02-27 16:14:27 +00:00
jhb
b7ab1db7c3 Fix _sx_assert() to panic() rather than printf() when an assertion fails
and ignore assertions if we have already paniced.
2004-02-27 16:13:44 +00:00
jhb
b9e0b0f9af Replace the ktrace queue's semaphore with a condition variable instead as
it is slightly more efficient since we already have a mutex to protect the
queue.  Ktrace originally used a semaphore more as a proof of concept.
2004-02-26 19:30:22 +00:00