6353 Commits

phk
bc9e7b689e Remove return after panic.
Found by:       FlexeLint
2003-05-31 20:18:23 +00:00
phk
11faebdb1a Remove needless return
Found by:       FlexeLint
2003-05-31 20:16:44 +00:00
phk
108e10f649 Add a couple of XXX comments where the intent is not clear.
Found by:       FlexeLint
2003-05-31 20:13:58 +00:00
phk
145891d899 Remove unused variable(s).
Remove break after goto

Found by:       FlexeLint
2003-05-31 20:11:33 +00:00
phk
d876d5ad8d Remove return after panic.
Found by:       FlexeLint
2003-05-31 20:09:42 +00:00
phk
ddccb1d287 Remove unused variable and now unbalanced call to splbio();
Found by:       FlexeLint
2003-05-31 20:09:01 +00:00
marcel
9bba923f2e Fix ia32 compat on ia64. Recent ia64 MD changes altered the garbage on
the stack in a way incompatible with elf32_map_insert(), which used
data_buf without initializing it when a partial mapping resulted in a
misaligned image (typical when the page size implied by the image is
not the same as the page size in use by the kernel). Since data_buf
is passed by reference to vm_map_find(), the compiler cannot warn
about it.

While here, move all local variables to the top of the function.
2003-05-31 19:55:05 +00:00
phk
3953441927 "break" rather than fall through to a break in the default clause.
Found by:       FlexeLint
2003-05-31 16:53:16 +00:00
phk
383c23d209 Introduce {be,le}_uuid_{enc,dec}() functions for explicitly encoding
and decoding UUID's in big endian and little endian binary format.
2003-05-31 16:47:07 +00:00
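
(A minimal sketch of what the little-endian encoder can look like,
assuming the struct uuid field names from sys/uuid.h and the
le32enc()/le16enc() helpers from <sys/endian.h>; the big-endian
variant is analogous with be32enc()/be16enc().)

    void
    le_uuid_enc(void *buf, const struct uuid *uuid)
    {
            uint8_t *p = buf;
            int i;

            /* Encode multi-byte fields explicitly, byte by byte. */
            le32enc(p, uuid->time_low);
            le16enc(p + 4, uuid->time_mid);
            le16enc(p + 6, uuid->time_hi_and_version);
            p[8] = uuid->clock_seq_hi_and_reserved;
            p[9] = uuid->clock_seq_low;
            for (i = 0; i < 6; i++)
                    p[10 + i] = uuid->node[i];
    }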
phk
0129a20107 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
peter
6c537c22b4 Add __amd64__ to the ifdefs that introduce the "pcicfg" spinlock to
witness.

Approved by:  re (safe amd64 support)
2003-05-31 06:42:37 +00:00
mux
2618a96d5e When loading a module that contains a sysctl which is already compiled
into the kernel, the sysctl_register() call would fail, as expected.
However, when unloading this module again, the kernel would then panic
in sysctl_unregister().  Print an error message instead.

Submitted by:	Nicolai Petri <nicolai@catpipe.net>
Reviewed by:	imp
Approved by:	re@ (jhb)
2003-05-29 21:19:18 +00:00
dwmalone
2220e22eed Add an INVARIANTS-only check to make sure Giant is held if mbuf
allocation is attempted with M_TRYWAIT.

Reviewed by:	bmilekic
Approved by:	re (scottl)
2003-05-29 18:38:24 +00:00
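
(A sketch of the kind of assertion described, assuming the
allocator's wait-flag parameter is named "how"; not the literal
committed code.)

    #ifdef INVARIANTS
            /* M_TRYWAIT may sleep, and sleeping here still needs Giant. */
            if (how == M_TRYWAIT)
                    mtx_assert(&Giant, MA_OWNED);
    #endif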
dwmalone
a02706ac93 Grab giant in sendit rather than kern_sendit because sockargs may
allocate mbufs with M_TRYWAIT, which may require Giant.

Reviewed by:	bmilekic
Approved by:	re (scottl)
2003-05-29 18:36:26 +00:00
iedowse
0253303b1a In cluster_wbuild(), initialise b_iocmd to BIO_WRITE before calling
buf_start() to avoid triggering a panic in softdep_disk_io_initiation()
if b_iocmd happened to be BIO_READ. The later initialisation of
b_iocmd in cluster_wbuild() could probably be moved to before the
buf_start() call, but this patch keeps the change as simple as
possible.

This is reported to fix occasional "softdep_disk_io_initiation: read"
panics, especially on NFS servers.

Reported by:	Nick Hilliard <nick@netability.ie>
Tested by:	Nick Hilliard <nick@netability.ie>
Approved by:	re (rwatson)
2003-05-28 13:22:10 +00:00
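
(In outline, the fix simply orders the initialisation ahead of the
buf_start() call; "bp" as the buffer pointer is assumed.)

    bp->b_iocmd = BIO_WRITE; /* softdep_disk_io_initiation() rejects reads */
    buf_start(bp);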
peter
54155a49a5 Copy the va_list in sbuf_vprintf() before passing it to vsnprintf(),
because we could fail due to a small buffer and loop and rerun.  If this
happens, then the vsnprintf() will have already taken the arguments off
the va_list.  For i386 and others, this doesn't matter because the
va_list type is passed as a copy.  But on powerpc and amd64, this is
fatal because the va_list is a reference to an external structure that
keeps the vararg state due to the more complicated argument passing system.
On amd64, arguments can be passed as follows:
First 6 int/pointer type arguments go in registers, the rest go on
  the memory stack.
Float and double are similar, except using SSE registers.
long doubles (80-bit precision) are similar except they use the x87 stack.
Where the 'next argument' comes from depends on how many have been
processed so far and what type it is.  For amd64, gcc keeps this state
somewhere that is referenced by the va_list.

I found a description that showed the va_copy was required here:
http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?va_end+9
The single unix spec doesn't mention va_copy() at all.

Anyway, the problem was that the sysctl kern.geom.conf* nodes would panic
due to walking off the end of the va_arg lists in vsnprintf.  A better fix
would be to have sbuf_vprintf() use a single pass and call kvprintf()
with a callback function that stored the results and grew the buffer
as needed.

Approved by:	re (scottl)
2003-05-25 19:03:08 +00:00
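
(A sketch of the retry loop with the va_copy() fix applied; the
sbuf_tail()/sbuf_room()/sbuf_grow() helpers are invented stand-ins
for the sbuf internals.)

    int
    sbuf_vprintf(struct sbuf *s, const char *fmt, va_list ap)
    {
            va_list ap_copy;
            int len;

            do {
                    va_copy(ap_copy, ap); /* vsnprintf() consumes list state */
                    len = vsnprintf(sbuf_tail(s), sbuf_room(s), fmt, ap_copy);
                    va_end(ap_copy);
            } while (len >= sbuf_room(s) && sbuf_grow(s, len) == 0);
            return (len < sbuf_room(s) ? 0 : -1);
    }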
jeff
4c8aa154ff - Create a new lock, umtx_lock, for use instead of the proc lock for
   protecting the umtx queues.  We can't use the proc lock because we need
   to hold the lock across calls to casuptr, which can fault.

Approved by:	re
2003-05-25 18:18:32 +00:00
jeff
a4b79b551b - Reset the free ent to NULL if we have consumed the last free entry.  This
   fixes a problem where we would overwrite old data if we ran out of free
   entries.

Submitted by:	sam
Approved by:	re (scottl)
2003-05-25 08:48:42 +00:00
alc
53638c7027 Make the maximum number of vnodes a function of both the physical memory
size and the kernel's heap size, specifically, vm_kmem_size.  This
function allows a maximum of 40% of the vm_kmem_size to be used for
vnodes and vm objects.  This is a conservative bound based upon recent
problem reports.  (In other words, a slight increase in this percentage
may be safe.)

Finally, machines with less than ~3GB of RAM should be unaffected
by this change, i.e., the maximum number of vnodes should remain
the same.  If necessary, machines with 3GB or more of RAM can increase
the maximum number of vnodes by increasing vm_kmem_size.

Desired by:	scottl
Tested by:	jake
Approved by:	re (rwatson,scottl)
2003-05-23 19:54:02 +00:00
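
(The 40% figure corresponds to 2/5 of vm_kmem_size divided by the
per-vnode cost; a sketch, with variable names taken loosely and not
guaranteed to match the committed code.)

    /* Bound desiredvnodes by physical memory and by kernel heap size. */
    desiredvnodes = min(maxproc + cnt.v_page_count / 4,
        2 * vm_kmem_size / (5 * (sizeof(struct vm_object) +
        sizeof(struct vnode))));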
julian
117dadd4fc When we are spilling threads out of the run queue during panic, make sure we
keep the thread state variable consistent with its real state.
i.e. Don't say it's on the run queue when it isn't.

Also clarify the associated comment.

Turns a double panic back to a single panic :-/

Approved by:	re@ (jhb)
2003-05-21 18:53:25 +00:00
marcel
5d3af2c5ab Revamp of the syscall path, exception and context handling. The
prime objectives are:
o  Implement a syscall path based on the epc instruction (see
   sys/ia64/ia64/syscall.s).
o  Revisit the places where we need to save and restore registers
   and define those contexts in terms of the register sets (see
   sys/ia64/include/_regset.h).

Secondary objectives:
o  Remove the requirement to use contigmalloc for kernel stacks.
o  Better handling of the high FP registers for SMP systems.
o  Switch to the new cpu_switch() and cpu_throw() semantics.
o  Add a good unwinder to reconstruct contexts for the rare
   cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o  The EPC syscall doesn't preserve registers it does not need
   to preserve and places the arguments differently on the stack.
   This affects libc and truss.
o  The address of the kernel page directory (kptdir) had to
   be unstaticized for use by the nested TLB fault handler.
   The name has been changed to ia64_kptdir to avoid conflicts.
   The renaming affects libkvm.
o  The trapframe only contains the special registers and the
   scratch registers. For syscalls using the EPC syscall path
   no scratch registers are saved. This affects all places where
   the trapframe is accessed. Most notably the unaligned access
   handler, the signal delivery code and the debugger.
o  Context switching only partly saves the special registers
   and the preserved registers. This affects cpu_switch() and
   triggered the move to the new semantics, which additionally
   affects cpu_throw().
o  The high FP registers are either in the PCB or on some
   CPU.  Context switching for them is done lazily. This affects
   trap().
o  The mcontext has room for all registers, but not all of them
   have to be defined in all cases. This mostly affects signal
   delivery code now. The *context syscalls are as of yet still
   unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticeable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be perceived as an improvement, if it's noticed at all.

Approved by: re@ (jhb)
2003-05-16 21:26:42 +00:00
truckman
80040f21a3 Detect that a vnode has been reclaimed while vflush() was waiting to lock
the vnode and restart the loop.  Vflush() is vulnerable since it does not
hold a reference to the vnode and it holds no other locks while waiting
for the vnode lock.  The vnode will no longer be on the list when the
loop is restarted.

Approved by:	re (rwatson)
2003-05-16 19:46:51 +00:00
obrien
384dc4a2a3 Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page.  As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upon detaching from it.  The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that originally caused the
traced process to be stopped.  Since PT_KILL is nothing more than
PT_CONTINUE with SIGKILL, it is broken too.  In the PT_KILL case, this
leads to an unkillable process.

PR:		44011
Submitted by:	Mark Kettenis <kettenis@chello.nl>
Approved by:	re(jhb)
2003-05-16 01:34:23 +00:00
rwatson
1db54a2d45 VOP_PATHCONF() requires a vnode lock; this patch adds locking to
fpathconf(). The lock is held for direct calls to VOP_PATHCONF() in
pathconf() already.

Approved by:	re (jhb)
Pointed out by:	DEBUG_VFS_LOCKS
2003-05-15 21:13:08 +00:00
bmilekic
f48bcc48de Make the mb_alloc low-watermark sysctl-tunable read-only and make
netstat(1) not display it for now because its effects are not yet
completely implemented and we're about to cut 5.2-RELEASE.
This is temporary.

Approved by: re (scottl, rwatson)
2003-05-15 19:05:28 +00:00
ps
00084d3dc9 p_sigignore moved into struct sigacts.  Move one use that was missed.
Approved by:	re (scottl)
2003-05-14 00:03:55 +00:00
jhb
89a4eb17de - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
jhb
824931292d In setitimer(2), if the it_value of the new itimer value is clear, then
don't add the current time to it, but leave it as clear so that when the
timer is disabled, the it_value is always clear.

Reviewed by:	bde
Approved by:	re (rwatson)
2003-05-13 19:21:46 +00:00
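
(A sketch of the changed logic using the timevalisset()/timevaladd()
helpers; "aitv" as the new itimer value and "ctv" as the current
time are assumed names.)

    /* Convert only a set it_value to absolute time; a clear it_value
     * stays clear, so a disabled timer always reads back as clear. */
    if (timevalisset(&aitv.it_value))
            timevaladd(&aitv.it_value, &ctv);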
alc
0422418ef4 Optimize the use of splay in gbincore(). During a "make buildworld" the
desired buffer is found at one of the roots more than 60% of the time.
Thus, checking both roots before performing either splay eliminates
unnecessary splays on the first tree splayed.

Approved by:	re (jhb)
2003-05-13 04:36:02 +00:00
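
(A sketch of the idea: test the root of both the clean and the dirty
tree before splaying either; field and helper names are hypothetical.)

    struct buf *bp;

    if ((bp = vp->v_cleanblkroot) != NULL && bp->b_lblkno == lblkno)
            return (bp);    /* hit at a root, no splay needed */
    if ((bp = vp->v_dirtyblkroot) != NULL && bp->b_lblkno == lblkno)
            return (bp);
    return (buf_splay_lookup(vp, lblkno));  /* hypothetical slow path */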
phk
d38d01f72d Bail out if there were not two loadable sections. Add XXX comment about
one other issue.

Approved by:	re/rwatson.
2003-05-12 15:08:10 +00:00
rwatson
c21d149f29 Remove bogus locking from DDB's "show lockedvnods" command: using
synchronization primitives from inside DDB is generally a bad idea,
and in this case it frequently results in panics due to DDB commands
being executed from the sio fast interrupt context on a serial
console.  Replace the locking with a note that a lack of locking
means that DDB may see inconsistent views of the mount and vnode
lists, which could also result in a panic.  More often than not,
though, this avoids a panic rather than causing one.

Discussed with ages ago:	bde
Approved by:			re (scottl)
2003-05-12 14:37:47 +00:00
phk
587d476cf9 Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC
Approved by:	re/rwatson
2003-05-12 05:09:56 +00:00
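
(In outline, a NULL check guards the DIAGNOSTIC-only junk-fill;
the pointer name and fill pattern are assumptions.)

    #ifdef DIAGNOSTIC
            /* memset(NULL, ...) is undefined; skip the junk-fill then. */
            if (addr != NULL)
                    memset(addr, 0x70, size);
    #endif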
bmilekic
988354fc06 Make m_freem() just use m_free() instead of duplicating the code. The
reason for the duplication was that m_freem() was meant to eventually
be optimized to hold the lock of the cache being freed to as long as
possible across frees but the difficulty of implementing said
optimization right now is too high, given that in some cases (see MAC
and non-cluster external buffers), we need to call into other subsystems,
something not permissible when the cache lock is held.

This change minimizes code duplication while keeping at least the
atomic mbuf+cluster free optimization.

Suggested by: luigi
2003-05-10 18:08:23 +00:00
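
(Conceptually, m_freem() now reduces to a loop over m_free(), which
frees one mbuf and returns the next one in the chain; a sketch.)

    void
    m_freem(struct mbuf *mb)
    {
            while (mb != NULL)
                    mb = m_free(mb);
    }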
jhb
9efb8e111e Remove Giant from kern_sigsuspend() and osigsuspend() as these should now
be MP safe.

Approved by:	re (scottl)
2003-05-09 19:11:32 +00:00
rwatson
1a81aef457 Rename MAC_MAX_POLICIES to MAC_MAX_SLOTS, since the variables and
constants in question refer to the number of label slots, not the
maximum number of policies that may be loaded.  This should reduce
confusion regarding an element in the MAC sysctl MIB, as well as
make it more clear what the effect of changing the compile-time
constants is.

Approved by:	re (jhb)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-08 19:49:42 +00:00
rwatson
cdd2c23d20 Clean up locking for the MAC Framework:
(1) Accept that we're now going to use mutexes, so don't attempt
    to avoid treating them as mutexes.  This cleans up locking
    accessor function names some.

(2) Rename variables to _mtx, _cv, _count, simplifying the naming.

(3) Add a new form of the _busy() primitive that conditionally
    makes the list busy: if there are entries on the list, bump
    the busy count.  If there are no entries, don't bump the busy
    count.  Return a boolean indicating whether or not the busy
    count was bumped.

(4) Break mac_policy_list into two lists: one with the same name
    holding dynamic policies, and a new list, mac_static_policy_list,
    which holds policies loaded before mac_late and without the
    unload flag set.  The static list may be accessed without
    holding the busy count, since it can't change at run-time.

(5) In general, prefer making the list busy conditionally, meaning
    we pay only one mutex lock per entry point if all modules are
    on the static list, rather than two (since we don't have to
    lower the busy count when we're done with the framework).  For
    systems running just Biba or MLS, this will halve the mutex
    accesses in the network stack, and may offer a substantial
performance benefit.

(6) Lay the groundwork for a dynamic-free kernel option which
    eliminates all locking associated with dynamically loaded or
    unloaded policies, for pre-configured systems requiring
    maximum performance but less run-time flexibility.

These changes have been running for a few weeks on MAC development
branch systems.

Approved by:	re (jhb)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-07 17:49:24 +00:00
alc
d18bfec38d Lock the vm_object when performing vm_pager_deallocate().
2003-05-06 02:45:28 +00:00
jhb
ad3e75f51e Tweak the clearing of TDF_DEADLKTREAT so that we only bother grabbing the
lock and clearing the flag if it was clear when uiomove() was called.
2003-05-05 21:27:29 +00:00
jhb
65572963c9 Mostly sort the includes.
2003-05-05 21:26:25 +00:00
jhb
099389efb0 Lock the proc lock around calls to tdsignal() in the sigwait() family of
syscalls.
2003-05-05 21:18:10 +00:00
jhb
755cc1e549 Make issignal() private to kern_sig.c since it is only called from cursig()
and cursig() is now a function rather than a macro.
2003-05-05 21:16:28 +00:00
jhb
828797f029 Remove TD_ON_RUNQ() from a check to make sure Giant is not held when
calling mi_switch().  The kernel would panic on an earlier KASSERT() in
mi_switch() if TD_ON_RUNQ() was true.
2003-05-05 21:12:36 +00:00
dwmalone
86e87d5e2d Split sendit into two parts. The first part, still called sendit,
does the copyin stuff and then calls the second part, kern_sendit, to do
the hard work. Don't bother holding Giant during the copyin phase.

The intent of this is to allow the Linux emulator to implement send*
syscalls without using the stackgap.
2003-05-05 20:33:38 +00:00
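
(A sketch of the split with a hypothetical copyin helper;
kern_sendit() takes arguments already in kernel space, so an
emulator can call it without the stackgap.)

    int
    sendit(struct thread *td, int s, struct msghdr *mp, int flags)
    {
            struct mbuf *control;
            int error;

            /* Copyin phase: no Giant needed here. */
            error = sendit_copyin(mp, &control);    /* hypothetical */
            if (error != 0)
                    return (error);
            return (kern_sendit(td, s, mp, flags, control));
    }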
mbr
98d5255d63 Change the semantics of sysv shm emulation to take an additional
argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}.
The BSD semantics didn't allow the use of a shared segment after it
was marked for removal through IPC_RMID.

The patch involves the following functions:
  - shmat
  - shmctl
  - shm_find_segment_by_shmid
  - shm_find_segment_by_shmidx
  - linux_shmat
  - linux_shmctl

Submitted by:	Orlando Bassotto <orlando.bassotto@ieo-research.it>
Reviewed by:	marcel
2003-05-05 09:22:58 +00:00
phk
091a67c527 Add two KASSERTS which trigger if free(9) would drag the "memuse" statistic
for a malloc bucket under zero.  This typically happens if you malloc(9)
from one bucket and free to another.
2003-05-05 08:32:53 +00:00
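
(A sketch of the sort of assertion meant, assuming the struct
malloc_type fields ks_memuse and ks_shortdesc.)

    /* In free(9), before the statistic is decremented. */
    KASSERT(mtp->ks_memuse >= size,
        ("free(9): memuse underflow for %s (malloc/free bucket mismatch?)",
        mtp->ks_shortdesc));
    mtp->ks_memuse -= size;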
phk
d8ee9fd2c6 Use le32dec() instead of le32toh() because we are not guaranteed to have
a word-aligned input.
2003-05-05 07:22:35 +00:00
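
(le32dec(), from <sys/endian.h>, decodes byte by byte and therefore
has no alignment requirement; roughly:)

    static __inline uint32_t
    le32dec(const void *pp)
    {
            const uint8_t *p = pp;

            /* No 32-bit load, hence no alignment constraint. */
            return ((uint32_t)p[3] << 24 | p[2] << 16 | p[1] << 8 | p[0]);
    }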
alc
410b675ed9 - Revert kern/vfs_subr.c revision 1.444.  The vm_object's size isn't
   trustworthy for vnode-backed objects.
 - Restore the old behavior of vm_object_page_remove() when the end
   of the given range is zero.  Add a comment to vm_object_page_remove()
   regarding this behavior.

Reported by:	iedowse
2003-05-03 08:09:24 +00:00
alc
d282523893 Lock access to the vm_object's flags in vop_stdcreatevobject(). 2003-05-02 19:33:21 +00:00
julian
910b47d3e7 Fix typo in last commit 2003-05-02 06:18:55 +00:00
silby
f449396167 Add the M_FREELIST flag, which is used to detect when a
double free of an mbuf occurs and cause an immediate panic, rather
than allowing free list corruption to occur.

This code is trapped under INVARIANTS, so it should not cause any
change in default performance.

Reviewed by:	a bunch of people on -net
MFC after:	1 week
2003-05-02 03:43:40 +00:00
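
(A sketch of how such a flag catches a double free under INVARIANTS;
where exactly it lands in m_free() and the allocator is glossed over.)

    #ifdef INVARIANTS
            if (mb->m_flags & M_FREELIST)
                    panic("m_free: mbuf %p already on free list", mb);
            mb->m_flags |= M_FREELIST;  /* cleared again at allocation */
    #endif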