Commit Graph

9638 Commits

Author SHA1 Message Date
David Xu
81273e0632 Make sure we get new m_owner value if we can not unlock it in
uncontested case. Reorder statements in do_unlock_umutex.
2006-09-02 02:41:33 +00:00
Wayne Salamon
ae1078d657 Audit the argv and env vectors passed in on exec:
Add the argument auditing functions for argv and env.
  Add kernel-specific versions of the tokenizer functions for the
  arg and env represented as a char array.
  Implement the AUDIT_ARGV and AUDIT_ARGE audit policy commands to
  enable/disable argv/env auditing.
  Call the argument auditing from the exec system calls.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
2006-09-01 11:45:40 +00:00
David Xu
8a156460bf Reorder some statments. Fix typo and remove stale comments. 2006-08-30 23:59:45 +00:00
David Xu
a324b5ecd3 Update comments about interrupted mutex locking. 2006-08-28 07:09:27 +00:00
David Xu
cd42ca3c27 Regenerate. 2006-08-28 04:28:25 +00:00
David Xu
d10183d94d This is initial version of POSIX priority mutex support, a new userland
mutex structure is added as following:
struct umutex {
        __lwpid_t       m_owner;
        uint32_t        m_flags;
        uint32_t        m_ceilings[2];
        uint32_t        m_spare[4];
};
The m_owner represents owner thread, it is a thread id, in non-contested
case, userland can simply use atomic_cmpset_int to lock the mutex, if the
mutex is contested, high order bit will be set, and userland should do locking
and unlocking via kernel syscall. Flag UMUTEX_PRIO_INHERIT represents
pthread's PTHREAD_PRIO_INHERIT mutex, which when contention happens, kernel
should do priority propagating. Flag UMUTEX_PRIO_PROTECT indicates it is
pthread's PTHREAD_PRIO_PROTECT mutex, userland should initialize m_owner
to contested state UMUTEX_CONTESTED, then atomic_cmpset_int will be failure
and kernel syscall should be invoked to do locking, this becauses
for such a mutex, kernel should always boost the thread's priority before
it can lock the mutex, m_ceilings is used by PTHREAD_PRIO_PROTECT mutex,
the first element is used to boost thread's priority when it locked the mutex,
second element is used when the mutex is unlocked, the PTHREAD_PRIO_PROTECT
mutex's link list is kept in userland, the m_ceiling[1] is managed by thread
library so kernel needn't allocate memory to keep the link list, when such
a mutex is unlocked, kernel reset m_owner to UMUTEX_CONTESTED.
Flag USYNC_PROCESS_SHARED indicate if the synchronization object is process
shared, if the flag is not set, it saves a vm_map_lookup() call.

The umtx chain is still used as a sleep queue, when a thread is blocked on
PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority
propagating, it is dynamically allocated and reference count is used,
it is not optimized but works well in my tests, while the umtx chain has
its own locking protocol, the priority propagating protocol are all protected
by sched_lock because priority propagating function is called with sched_lock
held from scheduler.

No visible performance degradation is found which these changes. Some parameter
names in _umtx_op syscall are renamed.
2006-08-28 04:24:51 +00:00
Marius Strobl
aed760ef8a Fix another bug introduced with rev. 1.204; in vfs_donmount() if
the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)'
call fails, 'errmsg' is left uninitialized, making the later tests
against NULL meaningless, and the uses bogus. Thus initialize
'errmsg' to NULL beforehand. [1]
While at it, remove the superfluous assignment of 0 to 'errmsg_len'
if the above mentioned call fails as it's already initialized to 0.

Submitted by:	Michael Plass [1]
2006-08-26 16:28:19 +00:00
Suleiman Souhlal
bec31a8fee The "taskqueue_fast" spinlocks were renamed to "fast_taskqueue" in
subr_taskqueue.c:r1.32

Reported by:	rdivacky
2006-08-26 11:21:25 +00:00
Pawel Jakub Dawidek
bebabf24bb Fix comment. 2006-08-25 15:13:49 +00:00
David Xu
fd4a6d10a4 Same as previous change, the user provided priority should be reversed
too.
2006-08-25 10:05:30 +00:00
David Xu
4386313871 Initialize kg_base_user_pri. 2006-08-25 06:29:16 +00:00
David Xu
3db720fdce Add user priority loaning code to support priority propagation for
1:1 threading's POSIX priority mutexes, the code is no-op unless
priority-aware umtx code is committed.
2006-08-25 06:12:53 +00:00
Marius Strobl
3a30d178fe Fix a bug introduced with rev. 1.204; in vfs_donmount() use
copyout(9) instead of copystr(9) for copying the errmsg from
kernel- to user-space. This fixes a panic on sparc64 when
using the nmount(2)-converted mountd(8).
While at it, use bcopy(3) instead of strncpy(3) in the kernel-
to kernel-space case for consistency with vfs_buildopts() and
between kernel- to user-space and kernel- to kernel-space case.
2006-08-24 18:52:28 +00:00
David Xu
de08f4ee5c POSIX requires that higher numerical values for the priority represent
higher priorities, so we should reverse the passed value here.
2006-08-23 07:22:25 +00:00
Colin Percival
23a28f3a0d Fix a signedness bug.
MFC after:	3 days
Security:	Local DoS
2006-08-20 10:29:08 +00:00
George V. Neville-Neil
daa5817e92 Fix a kernel panic based on receiving an ICMPv6 Packet too Big message.
PR:		99779
Submitted by:	Jinmei Tatuya
Reviewed by:	clement, rwatson
MFC after:	1 week
2006-08-18 14:05:13 +00:00
Peter Wemm
bad9a7a5f9 Grab two syscall numbers. One is used to emulate functionality that linux
has in its procfs (do a readlink of /proc/self/fd/<nn> to find the pathname
that corresponds to a given file descriptor).  Valgrind-3.x needs this
functionality.  This is a placeholder only at this time.
2006-08-16 22:32:50 +00:00
Colin Percival
e2d70dbae1 Swap the names "sem_exithook" and "sem_exechook" in the previous commit to
match up with reality and the prototype definitions.

Register the sem_exechook as the "process_exec" event handler, not
sem_exithook.

Submitted by:	rdivacky
Sponsored by:	SoC 2006
2006-08-16 08:25:40 +00:00
John Baldwin
462a7add8e Add a new 'show sleepchain' ddb command similar to 'show lockchain' except
that it operates on lockmgr and sx locks.  This can be useful for tracking
down vnode deadlocks in VFS for example.  Note that this command is a bit
more fragile than 'show lockchain' as we have to poke around at the
wait channel of a thread to see if it points to either a struct lock or
a condition variable inside of a struct sx.  If td_wchan points to
something unmapped, then this command will terminate early due to a fault,
but no harm will be done.
2006-08-15 18:29:01 +00:00
John Baldwin
0fa2168b19 - When spinning on a spin lock, if the debugger is active or we are in a
panic, go ahead and do the longer DELAY(1) spin wait.
- If we panic due to spinning too long, print out a few more details
  including the pointer to the mutex in question and the tid of the owning
  thread.
2006-08-15 18:26:12 +00:00
John Baldwin
f8f1f7fb85 Regen to propogate <prefix>_AUE_<mumble> changes as well as the earlier
systrace changes.
2006-08-15 17:37:01 +00:00
John Baldwin
52a79796c4 Add a new set of macros <prefix>_AUE_<syscallname> to sysproto.h that
map to the audit event associated with a specific system call.  For
example, SYS_AUE___semctl would be set to AUE_SEMCTL in sys/sysproto.h.
2006-08-15 17:09:32 +00:00
John Baldwin
589201fd4e - Use NOSTD rather than NOIMPL for nfssvc() to match other syscalls
provided via klds.
- Correct audit identifier for nfssvc().
2006-08-15 16:45:41 +00:00
John Baldwin
77e662683b Rename 'show lockchain' to 'show locktree' and 'show threadchain' to
'show lockchain'.  The churn is because I'm about to add a new
'show sleepchain' similar to 'show lockchain' for sleep locks (lockmgr and
sx) and 'show threadchain' was a bit ambiguous as both commands show
a chain of thread dependencies, 'lockchain' is for non-sleepable locks
(mtx and rw) and 'sleepchain' is for sleepable locks.
2006-08-15 16:44:18 +00:00
John Baldwin
be6847d729 Add a 'show lockmgr' command that dumps the relevant details of a lockmgr
lock.
2006-08-15 16:42:16 +00:00
Alexander Leidinger
993182e57c - Change process_exec function handlers prototype to include struct
image_params arg.
- Change struct image_params to include struct sysentvec pointer and
  initialize it.
- Change all consumers of process_exit/process_exec eventhandlers to
  new prototypes (includes splitting up into distinct exec/exit functions).
- Add eventhandler to userret.

Sponsored by:		Google SoC 2006
Submitted by:		rdivacky
Parts suggested by:	jhb (on hackers@)
2006-08-15 12:10:57 +00:00
Robert Watson
b7e2f3ec76 Minor white space tweaks. 2006-08-13 23:16:59 +00:00
Alan Cox
5d1445cdf2 Reduce the scope of the page queues lock in vm_pgmoveco() now that
vm_page_sleep_if_busy() no longer requires the page queue lock to be held.

Correctly spell "TRUE".
2006-08-12 19:47:49 +00:00
Robert Watson
79ad81c06d Before performing a sodealloc() when pru_attach() fails, assert that
the socket refcount remains 1, and then drop to 0 before freeing the
socket.

PR:		101763
Reported by:	Gleb Kozyrev <gkozyrev at ukr dot net>
2006-08-11 23:03:10 +00:00
Pawel Jakub Dawidek
04d9e255df getnewvnode() can be called with NULL mp.
Found by:	Coverity Prevent (tm)
Coverity ID:	1521
Confirmed by:	phk
2006-08-10 08:56:03 +00:00
Alan Cox
5786be7cc7 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
Pawel Jakub Dawidek
13c85d339d Add a bandaid to avoid a deadlock in a situation, when we are trying to suspend
a file system, but need to obtain a vnode. We may not be able to do it, because
all vnodes could be already in use and other processes cannot release them,
because they are waiting in "suspfs" state.

In such situation, we allow to allocate a vnode anyway.

This is a temporary fix - there is no backpressure to free vnodes allocated in
those circumstances.

MFC after:	1 week
Reviewed by:	tegge
2006-08-09 12:47:30 +00:00
Alan Cox
ab83ac429d Reduce the scope of the page queues lock in vfs_busy_pages() now that
vm_page_sleep_if_busy() no longer requires the caller to hold the page
queues lock.
2006-08-08 06:00:49 +00:00
Robert Watson
e4445a031f Move definition of UNIX domain socket protosw and domain entries from
uipc_proto.c to uipc_usrreq.c, making localdomain static.  Remove
uipc_proto.c as it's no longer used.  With this change, UNIX domain
sockets are entirely encapsulated in uipc_usrreq.c.
2006-08-07 12:02:43 +00:00
Robert Watson
ccdebe46bd Improve commenting of vaccess(), making sure to be clear that the ifdef
capabilities code is there for reference and never actually used.  Slight
style tweak.
2006-08-06 10:43:35 +00:00
Robert Watson
52b384621e Don't set pru_sosend, pru_soreceive, pru_sopoll to default values, as they
are already set to default values.
2006-08-06 10:39:21 +00:00
Alan Cox
7c4b7ecc4c Reduce the scope of the page queues lock in kern_sendfile() now that
vm_page_sleep_if_busy() no longer requires the caller to hold the page
queues lock.
2006-08-06 01:00:09 +00:00
Robert Watson
5111b5e180 Remove register, use ANSI function headers. 2006-08-05 21:40:59 +00:00
Robert Watson
12de451046 We now spell "inode" as "vnode" in the VFS layer, so update comment
for new world order.

MFC after:	3 days
Pointed out by:	mckusick
2006-08-05 21:08:47 +00:00
John Birrell
a4bc5ae534 Add support for the generated file systrace_args.c. 2006-08-05 19:25:14 +00:00
Yaroslav Tykhiy
776fc0e90e Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR:		misc/101245
Submitted by:	Darren Pilgrim <darren pilgrim bitfreak org>
Tested by:	md5(1)
MFC after:	1 week
2006-08-04 07:56:35 +00:00
Alan Cox
10c09f3f61 The page queues lock is no longer required by vm_page_io_start(). Reduce
the scope of the page queues lock in kern_sendfile() accordingly.
2006-08-04 05:53:20 +00:00
John Birrell
2826f17433 Report the correct function name in a DPRINTF. 2006-08-03 21:19:13 +00:00
John Birrell
b9279e66e4 Regen.
Note the addition of the extra file now generated.
2006-08-03 05:32:43 +00:00
John Birrell
1533c33fd4 Generate another file called systrace_args.c. This will be compiled
into systrace and is used to map the syscall arguments into the 64-bit
parameter array.
2006-08-03 05:29:09 +00:00
Robert Watson
9126410f4b Move destroying kqueue state from above pru_detach to below it in
sofree(), as a number of protocols expect to be able to call
soisdisconnected() during detach.  That may not be a good assumption,
but until I'm sure if it's a good assumption or not, allow it.
2006-08-02 18:37:44 +00:00
Robert Watson
92716fe04e Change two XXX's to two notes: the fact that SOCK_LOCK(so) ==
SOCKBUF_LOCK(&so->so_rcv) is encoded, which is worth noting, but not a
bug.
2006-08-02 16:23:52 +00:00
John Baldwin
9802d04ce0 Fix some bugs in the previous revision (1.419). Don't perform extra
vfs_rel() on the mountpoint if the MAC checks fail in kern_statfs() and
kern_fstatfs().  Similarly, don't perform an extra vfs_rel() if we get
a doomed vnode in kern_fstatfs(), and handle the case of mp being NULL
(for some doomed vnodes) by conditionalizing the vfs_rel() in
kern_fstatfs() on mp != NULL.

CID:		1517
Found by:	Coverity Prevent (tm) (kern_fstatfs())
Pointy hat to:	jhb
2006-08-02 15:27:48 +00:00
Robert Watson
f8b20fb6d6 Remove now unneeded ENOTCONN clause from SOCK_DGRAM side of uipc_send():
we have to check it regardless of the target address, so don't check it
twice.
2006-08-02 14:30:58 +00:00
Robert Watson
050ac26521 Remove 'register'.
Use ANSI C prototypes/function headers.
More deterministically line wrap comments.
2006-08-02 13:01:58 +00:00
David Xu
64511d2abc Don't include sys/thr.h and umtx.h in sys/sysproto.h, it is unnecessary. 2006-08-02 08:09:24 +00:00
David Xu
aff5bcb1b2 INT_MAX is defined in file sys/limits.h, include the file now. 2006-08-02 07:34:51 +00:00
Robert Watson
c0e1415d51 Move updated of 'numopensockets' from bottom of sodealloc() to the top,
eliminating a second set of identical mutex operations at the bottom.
This allows brief exceeding of the max sockets limit, but only by
sockets in the last stages of being torn down.
2006-08-02 00:45:27 +00:00
John Baldwin
03e161fdb1 Make system call modules a bit more robust:
- If we fail to register the system call during MOD_LOAD, then note that
  so that we don't try to deregister it or invoke the chained event handler
  during the subsequent MOD_UNLOAD event.  Doing the deregister when the
  register failed could result in trashing system call entries.
- Add a SI_SUB_SYSCALLS just before starting up init and use that to
  register syscall modules instead of SI_SUB_DRIVERS.  Registering system
  calls as late as possible increases the chances that any other module
  event handlers or SYSINITs in a module are executed to initialize the
  data in a kld before a syscall dependent on that data is able to be
  invoked.

MFC after:	3 days
2006-08-01 16:32:20 +00:00
John Baldwin
38affe135a Don't lock each of the processes while looking for a pid. The allproc and
proctree locks that we already hold provide sufficient protection.
2006-08-01 15:30:56 +00:00
Robert Watson
eaa6dfbcc2 Reimplement socket buffer tear-down in sofree(): as the socket is no
longer referenced by other threads (hence our freeing it), we don't need
to set the can't send and can't receive flags, wake up the consumers,
perform two levels of locking, etc.  Implement a fast-path teardown,
sbdestroy(), which flushes and releases each socket buffer.  A manual
dom_dispose of the receive buffer is still required explicitly to GC
any in-flight file descriptors, etc, before flushing the buffer.

This results in a 9% UP performance improvement and 16% SMP performance
improvement on a tight loop of socket();close(); in micro-benchmarking,
but will likely also affect CPU-bound macro-benchmark performance.
2006-08-01 10:30:26 +00:00
Robert Watson
b5ff091431 Close a race that occurs when using sendto() to connect and send on a
UNIX domain socket at the same time as the remote host is closing the
new connections as quickly as they open.  Since the connect() and
send() paths are non-atomic with respect to another, it is possible
for the second thread's close() call to disconnect the two sockets
as connect() returns, leading to the consumer (which plans to send())
with a NULL kernel pointer to its proposed peer.  As a result, after
acquiring the UNIX domain socket subsystem lock, we need to revalidate
the connection pointers even though connect() has technically succeed,
and reurn an error to say that there's no connection on which to
perform the send.

We might want to rethink the specific errno number, perhaps ECONNRESET
would be better.

PR:		100940
Reported by:	Young Hyun <youngh at caida dot org>
MFC after:	2 weeks
MFC note:	Some adaptation will be required
2006-07-31 23:00:05 +00:00
John Baldwin
53c9158f24 Trim an obsolete comment. ktrgenio() stopped doing crazy gymnastics when
ktrace was redone to be mostly synchronous again.
2006-07-31 15:31:43 +00:00
John Baldwin
91ce2694d1 Regen for MPSAFE flag removal. 2006-07-28 19:08:37 +00:00
John Baldwin
af5bf12239 Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
  syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
2006-07-28 19:05:28 +00:00
John Baldwin
e0b4add8d8 Various fixes to comments in the syscall master files including removing
cruft from the audit import and adding mention of COMPAT4 to freebsd32.
2006-07-28 18:55:18 +00:00
John Baldwin
764e4d54e9 Adjust td_locks for non-spin mutexes, rwlocks, and sx locks so that it is
a count of all non-spin locks, not just lockmgr locks.  This can give us a
much cheaper way to see if we have any locks held (such as when returning
to userland via userret()) without requiring WITNESS.

MFC after:	1 week
2006-07-27 21:45:55 +00:00
John Baldwin
ea175645b4 Hold the reference on the mountpoint slightly longer in kern_statfs() and
kern_fstatfs() so that it is still held when prison_enforce_statfs() is
called (since that function likes to poke and prod at the mountpoint
structure).

MFC after:	3 days
2006-07-27 20:00:27 +00:00
John Baldwin
186abbd727 Write a magic value into mtx_lock when destroying a mutex that will force
all other mtx_lock() operations to block.  Previously, when the mutex was
destroyed, it would still have a valid value in mtx_lock(): either the
unowned cookie, which would allow a subsequent mtx_lock() to succeed, or a
pointer to the thread who destroyed the mutex if the mutex was locked when
it was destroyed.

MFC after:	3 days
2006-07-27 19:58:18 +00:00
John Baldwin
f30e89ced3 Fix a file descriptor race I reintroduced when I split accept1() up into
kern_accept() and accept1().  If another thread closed the new file
descriptor and the first thread later got an error trying to copyout the
socket address, then it would attempt to close the wrong file object.  To
fix, add a struct file ** argument to kern_accept().  If it is non-NULL,
then on success kern_accept() will store a pointer to the new file object
there and not release any of the references.  It is up to the calling code
to drop the references appropriately (including a call to fdclose() in case
of error to safely handle the aforementioned race).  While I'm at it, go
ahead and fix the svr4 streams code to not leak the accept fd if it gets an
error trying to copyout the streams structures.
2006-07-27 19:54:41 +00:00
Robert Watson
0075d85869 Remove call to soisdisconnected() in uipc_detach(), since it will already
have been invoked by uipc_close() or uipc_abort(), and the socket is in a
state of being torn down by the time we get to this point, so kqueue
state frobbed by soisdisconnected() is not available, so frobbing it will
result in a panic.

Reported by:	Munehiro Matsuda <haro at h4 dot dion dot ne dot jp>
2006-07-26 19:16:34 +00:00
Robert Watson
f14cce87dc Remove non-socket buffer routines from uipc_sockbuf.c, and socket buffer
specific routines from uipc_socket2.c following repo-copy.  We might
rethink the location of one or two at some point, but the division was
relatively clean.  uipc_sockbuf.c is now the home of routines that
manipulate socket buffers.
2006-07-24 16:21:31 +00:00
Robert Watson
b0668f7151 soreceive_generic(), and sopoll_generic(). Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used univerally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
2006-07-24 15:20:08 +00:00
Robert Watson
ca948c5e93 Remove duplicate 'or'.
Submitted by:	ru
2006-07-23 21:01:09 +00:00
Robert Watson
809c2b789c Update various uipc_socket.c comments, and reformat others. 2006-07-23 20:36:04 +00:00
Robert Watson
f23929fbc5 Add additional comments to the top of the UNIX domain socket implementation
providing some high level pointers regarding the implementation.
2006-07-23 20:06:45 +00:00
Robert Watson
4b19d603c4 Remove old kern.malloc sysctl, which generated a text representation of
the kernel malloc(9) state for vmstat -m.  libmemstat is now used to
generate a machine-readable version which is converged by vmstat -m
into a human-readable version.

Not for MFC.
2006-07-23 19:55:41 +00:00
Robert Watson
0ce3f16dbb Expand comments for malloc(9) to better describe the design and
statistics / memory types model.
2006-07-23 19:51:39 +00:00
Robert Watson
fb6d736d14 Update and reformat comments for POSIX.1e ACL utility routines. 2006-07-23 19:35:10 +00:00
Robert Watson
4f1f0ef523 Add two new unpcb flags, UNP_BINDING and UNP_CONNECTING, which will be
used to mark UNIX domain sockets as being in the process of binding or
connecting.  Use these to prevent simultaneous bind or connect
operations by multiple threads or processes on the same socket at the
same time, which closes race conditions present in the UNIX domain
socket implementation since inception.
2006-07-23 12:01:14 +00:00
Robert Watson
dd47f5ca9c Merge unp_bind() into uipc_bind(), as it is called only from uipc_bind(). 2006-07-23 11:02:12 +00:00
Robert Watson
6d32873c29 Since unp_attach() and unp_detach() are now called only from uipc_attach()
and uipc_detach(), merge them into their calling functions.
2006-07-23 10:25:28 +00:00
Robert Watson
7e711c3aae Move various UNIX socket global variables and sysctls from the middle of
the file to the top.
2006-07-23 10:19:04 +00:00
Robert Watson
f3f49bbbe8 In uipc_send() and uipc_rcvd(), store unp->unp_conn pointer in unp2
while working with the second unpcb to make the code more clear.
2006-07-22 18:41:42 +00:00
Robert Watson
1c381b19ff Re-wrap and other minor formatting and punctuation fixes for UNIX domain
socket comments.
2006-07-22 17:24:55 +00:00
John Baldwin
b04aff773e Add a comment to explain what fdclose() does and what it's purpose is
since the subtlety eluded me when I looked at it last week.
2006-07-21 20:24:00 +00:00
Robert Watson
a152f8a361 Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket.  pru_abort is now a
notification of close also, and no longer detaches.  pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket.  This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.

This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree().  With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by:	gnn
2006-07-21 17:11:15 +00:00
Alan Cox
af51d7bf57 Eliminate OBJ_WRITEABLE. It hasn't been used in a long time. 2006-07-21 06:40:29 +00:00
John Baldwin
9079458ad2 Add a mutex to protect the list of interrupt config hooks. We do assume
that the only remove hook operation that can occur while processing the
hooks is to remove the currently executing hook.  This should be safe as
the existing code has assumed this already for a long time now.

Reviewed by:	scottl
MFC after:	1 week
2006-07-19 18:53:56 +00:00
John Baldwin
2f198e899a Call change_dir() instead of duplicating the code in fchdir(). 2006-07-19 18:30:33 +00:00
John Baldwin
b33887ea31 Don't free the sockaddr in kern_bind() and kern_connect() as not all
callers pass a sockaddr allocated via malloc() from M_SONAME anymore.
Instead, free it in the callers when necessary.
2006-07-19 18:28:52 +00:00
Stefan Farfeleder
c6e0a843cf Separate functions with a newline. 2006-07-17 21:00:42 +00:00
Poul-Henning Kamp
9c499ad92f Remove the NDEVFSINO and NDEVFSOVERFLOW options which no longer exists in
DEVFS.

Remove the opt_devfs.h file now that it is empty.
2006-07-17 09:07:02 +00:00
Robert Watson
5cd1a27145 Change comment on soabort() to more accurately describe how/when
soabort() is used.  Remove trailing white space.
2006-07-16 23:09:39 +00:00
Alan Cox
27ea29536c Enable debug.mpsafevfs by default on arm. Since every architecture except
powerpc has debug.mpsafevfs enabled by default, it is shorter to enumerate
the architectures on which debug.mpsafevfs is off.

Tested by: cognet@
2006-07-15 06:44:27 +00:00
Jung-uk Kim
8120ddb4c4 Let native elf class be registered earlier. 2006-07-14 22:39:18 +00:00
Pawel Jakub Dawidek
338ae5268b Remove duplicated #include. 2006-07-14 17:55:36 +00:00
David Xu
24af5900eb Backout the feature which can change thread's scheduling option, I really
don't want to mix process and thread scheduling options together in these
functions, now the thread scheduling option is implemented in new thr
syscalls.
2006-07-13 06:41:26 +00:00
David Xu
ba493ceb6b regenerate. 2006-07-13 06:32:55 +00:00
David Xu
60088160c9 Add syscalls thr_setscheduler, thr_getscheduler, and thr_setschedparam,
these syscalls are designed to set thread's scheduling parameters and
policy, because each syscall contains a size parameter, it is possible
to support future scheduling option, e.g SCHED_SPORADIC, this option
needs other fields in structure sched_param, current they are not
avaiblable.
2006-07-13 06:26:43 +00:00
John Baldwin
fed7988436 Honor db_pager_quit in 'show threadchain', 'show allchains', and
'show lockchain'.  This is especially helpful for the first 2 as a
threadchain could get stuck in an infinite loop during a mutex deadlock.
2006-07-12 21:25:24 +00:00
John Baldwin
19e9205a23 Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer).  So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt.  Also, now that it's easy to do so,
enable paging by default for all ddb commands.  Any command that wishes to
honor the quit flag can do so by checking db_pager_quit.  Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
  terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
2006-07-12 21:22:44 +00:00
Konstantin Belousov
3097d55a39 Use proper format specifier for pointers in debug printfs (turned off
by default).

Approved by:	pjd (mentor)
MFC after:	2 weeks
2006-07-12 11:41:53 +00:00
David Xu
a94d3e1f8a Use newkg to check if SCHED_OTHER is already inherited. 2006-07-12 07:02:28 +00:00
David Xu
c3ab507fcd Return priority range 0..PRI_MAX_TIMESHARE-PRI_MIN_TIMESHARE for
SCHED_OTHER, the same range as rtprio() is using. In old code,
it returns nice range -20 .. 20, nice should be treated as process
weight, it is really managed by getpriority() and setpriority()
syscalls, they are different.
2006-07-12 05:54:17 +00:00
Robert Watson
5908c617bb Several protocol switch functions (pru_abort, pru_detach, pru_sosetlabel)
return void, so don't implement no-op versions of these functions.
Instead, consistently check if those switch pointers are NULL before
invoking them.
2006-07-11 23:18:28 +00:00
Robert Watson
f949ae9b31 When pru_attach() fails, call sodealloc() on the socket rather than
using sorele() and the full tear-down path.  Since protocol state
allocation failed, this is not required (and is arguably undesirable).
This matches the behavior of sonewconn() under the same circumstances.
2006-07-11 21:56:58 +00:00
Robert Watson
337cc6b60e Reduce periods of simultaneous acquisition of various socket buffer
locks and the unplock during uipc_rcvd() and uipc_send() by caching
certain values from one structure while its locks are held, and
applying them to a second structure while its locks are held.  If
done carefully, this should be correct, and will reduce the amount
of work done with the global unp lock held.

Tested by:	kris (earlier version)
2006-07-11 21:49:54 +00:00
John Baldwin
90aff9de2d Regen. 2006-07-11 20:55:23 +00:00
John Baldwin
be5747d5b5 - Add conditional VFS Giant locking to getdents_common() (linux ABIs),
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
  and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
  linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
  svr4_sys_getdents64() MPSAFE.
2006-07-11 20:52:08 +00:00
David Xu
2dca4ca723 Don't forget to check invalid policy! 2006-07-11 08:19:57 +00:00
David Xu
006faeb831 Oops, remove debugger line. 2006-07-11 06:15:46 +00:00
David Xu
65343c788c Extended the POSIX scheduler APIs to accept lwpid as well, we've already
done this in ptrace syscall, when a pid is large than PID_MAX, the syscall
will search a thread in current process. It permits 1:1 thread library to
get and set a thread's scheduler attributes.
2006-07-11 06:11:34 +00:00
David Xu
2f26f4c66c For SCHED_OTHER, we always inherit current thread's interactive priority
unless current thread is realtime thread, in such case, we set a new zero
priority for it, notice we don't have per-thread nice, the priority
passed by userland is ignored here.
2006-07-11 06:01:14 +00:00
David Xu
a0712c99d0 Add POSIX scheduler parameters support to thr_new syscall, this permits
privileged process to create realtime thread.
2006-07-11 05:34:35 +00:00
David Xu
adc9c950af Create thread in separated ksegrp, so they always get correct user level
priority.
2006-07-10 23:14:07 +00:00
John Baldwin
c870740e09 - Split out kern_accept(), kern_getpeername(), and kern_getsockname() for
use by ABI emulators.
- Alter the interface of kern_recvit() somewhat.  Specifically, go ahead
  and hard code UIO_USERSPACE in the uio as that's what all the callers
  specify.  In place, add a new uioseg to indicate what type of pointer
  is in mp->msg_name.  Previously it was always a userland address, but
  ABI emulators may pass in kernel-side sockaddrs.  Also, remove the
  namelenp field and instead require the two places that used it to
  explicitly copy mp->msg_namelen out to userland.
- Use the patched kern_recvit() to replace svr4_recvit() and the stock
  kern_sendit() to replace svr4_sendit().
- Use kern_bind() instead of stackgap use in ti_bind().
- Use kern_getpeername() and kern_getsockname() instead of stackgap in
  svr4_stream_ti_ioctl().
- Use kern_connect() instead of stackgap in svr4_do_putmsg().
- Use kern_getpeername() and kern_accept() instead of stackgap in
  svr4_do_getmsg().
- Retire the stackgap from SVR4 compat as it is no longer used.
2006-07-10 21:38:17 +00:00
John Baldwin
0f8e0c3dd4 Explicitly use STAILQ_REMOVE_HEAD() when we know we are removing the head
element to avoid confusing Coverity.  It's now also easier for humans to
parse as well.

Found by:	Coverity Prevent(tm)
CID:		1201
2006-07-10 19:28:57 +00:00
John Baldwin
0bf8969c60 Fix two more instances of using a linker_file_t object in TAILQ() macros
after free'ing it.

Found by:	Coverity Prevent(tm)
CID:		1435
2006-07-10 19:13:45 +00:00
John Baldwin
6b5b470aea Don't try to reuse the linker_file structure after we've freed it when
throwing out the kld's loaded by the loader that didn't successfully link.

Found by:	Coverity Prevent(tm)
CID:		1435
2006-07-10 19:06:01 +00:00
Scott Long
e3546a7549 Use a sleep mutex instead of an sx lock for the kernel environment. This
allows greater flexibility for drivers that want to query the environment.

Reviewed by: jhb, mux
2006-07-09 21:42:58 +00:00
John Baldwin
d9f4623307 - Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumes
that the 'data' pointer is already setup to point to a valid KVM buffer
  or contains the copied-in data from userland as appropriate (ioctl(2)
  still does this).  kern_ioctl() takes care of looking up a file pointer,
  implementing FIONCLEX and FIOCLEX, and calling fi_ioctl().
- Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap
  and mark xenix_rdchk() MPSAFE.
2006-07-08 20:12:14 +00:00
John Baldwin
c1cccebe8b Add a kern_close() so that the ABIs can close a file descriptor w/o having
to populate a close_args struct and change some of the places that do.
2006-07-08 20:03:39 +00:00
John Baldwin
b1ee5b654d Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This
mostly consists of pushing a few copyin's and copyout's up into
__semctl() as all the other callers were already doing the UIO_SYSSPACE
case.  This also changes kern_semctl() to set the return value in a passed
in pointer to a register_t rather than td->td_retval[0] directly so that
callers can only set td->td_retval[0] if all the various copyout's succeed.

As a result of these changes, kern_semctl() no longer does copyin/copyout
(except for GETALL/SETALL) so simplify the locking to acquire the semakptr
mutex before the MAC check and hold it all the way until the end of the
big switch statement.  The GETALL/SETALL cases have to temporarily drop it
while they do copyin/malloc and copyout.  Also, simplify the SETALL case to
remove handling for a non-existent race condition.
2006-07-08 19:51:38 +00:00
Warner Losh
db2bc1bb82 Create bus_enumerate_hinted_children. This routine will allow drivers
to use the hinted child system.  Bus drivers that use this need to
implmenet the bus_hinted_child method, where they actually add the
child to their bus, as they see fit.  The bus is repsonsible for
getting the attribtues for the child, adding it in the right order,
etc.  ISA hinting will be updated to use this method.

MFC After: 3 days
2006-07-08 17:06:15 +00:00
Robert Watson
e4256d1e8d Move POSIX.1e-specific utility routines from kern_acl.c to
subr_acl_posix1e.c, leaving kern_acl.c containing only ACL system
calls and utility routines common across ACL types.

Add subr_acl_posix1e.c to the build.

Obtained from:	TrustedBSD Project
2006-07-06 23:37:39 +00:00
John Baldwin
398c993b2a - Explicitly acquire Giant around SYSINIT's and SYSUNINIT's since they are
not all known to be MPSAFE yet.
- Actually remove Giant from the kernel linker by taking it out of the
  KLD_LOCK() and KLD_UNLOCK() macros.

Pointy hat to:	jhb (2)
2006-07-06 21:39:39 +00:00
John Baldwin
3cb83e714d Add kern_setgroups() and kern_getgroups() and use them to implement
ibcs2_[gs]etgroups() rather than using the stackgap.  This also makes
ibcs2_[gs]etgroups() MPSAFE.  Also, it cleans up one bit of weirdness in
the old setgroups() where it allocated an entire credential just so it had
a place to copy the group list into.  Now setgroups just allocates a
NGROUPS_MAX array on the stack that it copies into and then passes to
kern_setgroups().
2006-07-06 21:32:20 +00:00
Wayne Salamon
65ee602e0c Audit the remaining parameters to the extattr system calls. Generate
the audit records for those calls.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
2006-07-06 19:33:38 +00:00
Robert Watson
6435cdafa3 Remove now unneeded opt_mac.h and mac.h includes. 2006-07-06 13:25:51 +00:00
Wayne Salamon
761aed363f Regen the system calls files, picking up the extended attr events, and some
mount-related changes done previously.

Approved by: rwatson (mentor)
2006-07-05 19:24:14 +00:00
Konstantin Belousov
c8d3bc1fa3 Back out my rev. 1.674. The better fix (rev. 1.637) is already in tree.
Approved by:	kan (mentor)
2006-07-05 16:33:25 +00:00
Wayne Salamon
bbe5d0318d Add audit events for the extended attribute system calls.
Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
2006-07-05 15:46:02 +00:00
Maxim Konovalov
1f36c876a1 o Fix grammar in the comment, indent macros. No functional changes. 2006-07-02 20:53:52 +00:00
Maxim Konovalov
75d960eb2e o Remove rev. 1.57 leftover, not reached code. 2006-07-02 20:49:46 +00:00
Maxim Konovalov
e2668f5563 o Fix typo in the comment.
PR:		kern/99632
Submitted by:	clsung
2006-06-30 08:10:55 +00:00
David E. O'Brien
2e4db89cfc Fix building with GCC 4.2: define data types before referring to them. 2006-06-29 19:37:31 +00:00
John Baldwin
fe95c76276 Fix semctl(2) breakage from the previous commit. Previously __semctl() had
a local 'semid' variable which was the array index and used uap->semid
as the original IPC id.  During the kern_semctl() conversion those two
variables were collapsed into a single 'semid' variable breaking the
places that needed the original IPC ID.  To fix, add a new 'semidx'
variable to hold the array index and leave 'semid' unmolested as the IPC
id.  While I'm here, explicitly document that the (undocumented, at least
in semctl(2)) SEM_STAT command curiously expects an array index in the
'semid' parameter rather than an IPC id.

Submitted by:	maxim
2006-06-29 13:58:36 +00:00
David Xu
5151eeb194 Fix a bug when accumulating run time, if a thread calls yield() syscall,
its run time may be lost.
2006-06-29 12:29:20 +00:00
David Xu
d29a8ce69b Fix system load count (noticed by dephij). Remove incorrect
comment.
2006-06-29 09:49:00 +00:00
David Xu
0922ef0c42 Remove unused function declaration.
Add else statement in sched_calc_pri.
Fix a bug when checking interrupt thread in sched_add.
2006-06-29 05:59:36 +00:00
David Xu
d60003a2e4 Remove load balancer code, since it has serious priority inversion problem
which really hurts performance on FreeBSD.
2006-06-29 05:36:34 +00:00
John Baldwin
49d409a108 - Add a kern_semctl() helper function for __semctl(). It accepts a pointer
to a copied-in copy of the 'union semun' and a uioseg to indicate which
  memory space the 'buf' pointer of the union points to.  This is then used
  in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap.
- Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
2006-06-27 18:28:50 +00:00
John Baldwin
597d608f86 - Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away.  mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
  (or rather, to handle concurrent unmount races) from going away.
  umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.
2006-06-27 14:46:31 +00:00
Pawel Jakub Dawidek
0bd645ae0c Compress direct cr_ruid comparsion and jailed() call to suser_cred(9).
Reviewed by:	rwatson
2006-06-27 11:32:08 +00:00
Pawel Jakub Dawidek
8838c27693 Use suser_cred(9) instead of checking cr_uid directly.
Reviewed by:	rwatson
2006-06-27 11:29:38 +00:00
Pawel Jakub Dawidek
2905ade228 - Use suser_cred(9) instead of checking cr_ruid directly.
- For privileged processes safe two mutex operations.

We may want to consider if this is good idea to use SUSER_ALLOWJAIL here,
but for now I didn't wanted to change the original behaviour.

Reviewed by:	rwatson
2006-06-27 11:28:50 +00:00
Sergey Babkin
d81175c738 Backed out the change by request from rwatson.
PR:		kern/14584
2006-06-26 22:03:22 +00:00
John Baldwin
c94ce032df Address a problem I missed in removing Giant from the kernel linker. Not
all of the module event handlers are MP safe yet, so always acquire Giant
for now when invoking module event handlers.  Eventually we can add an
MPSAFE flag or some such and add appropriate locking to all module event
handlers.
2006-06-26 18:34:45 +00:00
John Baldwin
322fb40cbf Remove duplicate security checks already performed in kern_kldload(). 2006-06-26 18:33:32 +00:00
Robert Watson
e83b30bdcb Trim basically unused 'unp' in uipc_connect(). 2006-06-26 16:18:22 +00:00
Sergey Babkin
7a799f1ef0 The common UID/GID space implementation. It has been discussed on -arch
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.

PR:		kern/14584
Submitted by:	babkin
2006-06-25 18:37:44 +00:00
Ian Dowse
450ec4ed45 If linker_release_module() fails then we still hold a reference on
the linker_file, so record this by restoring the linker_file pointer
in fp->file.
2006-06-25 12:36:21 +00:00
Pawel Jakub Dawidek
92c0849935 Simplify the code and remove two mutex operations.
MFC after:	2 weeks
2006-06-24 22:55:43 +00:00
John Baldwin
70f3778827 Replace the kld_mtx mutex with a kld_sx sx lock and expand it's scope to
protect all linker-related data structures including the contents of
linker file objects and the any linker class data as well. Considering how
rarely the linker is used I just went with the simple solution of
single-threading the whole thing rather than expending a lot of effor on
something more fine-grained and complex.  Giant is still explicitly
acquired while registering and deregistering sysctl's as well as in the
elf linker class while calling kmupetext().  The rest of the linker runs
without Giant unless it has to acquire Giant while loading files from a
non-MPSAFE filesystem.
2006-06-21 20:42:08 +00:00
John Baldwin
cbda6f950b - Push down Giant in kldfind() and kldsym().
- Remove several goto's by either using direct return's or else clauses.
2006-06-21 20:15:36 +00:00
John Baldwin
d36e739a0c Whoops, revert accidental commit. 2006-06-21 17:48:59 +00:00
John Baldwin
9dd44bd79e Fix two comments and a style fix. 2006-06-21 17:48:03 +00:00
John Baldwin
0df2972736 Various whitespace fixes. 2006-06-21 17:47:45 +00:00
John Baldwin
62d615d508 Conditionally acquire Giant around VFS operations. 2006-06-20 21:31:38 +00:00
John Baldwin
aeeb017bd6 - Push Giant down into linker_reference_module().
- Add a new function linker_release_module() as a more intuitive complement
  to linker_reference_module() that wraps linker_file_unload().
  linker_release_module() can either take the module name and version info
  passed to linker_reference_module() or it can accept the linker file
  object returned by linker_reference_module().
2006-06-20 20:54:13 +00:00
John Baldwin
f462ce3edd Make linker_find_file_by_name() and linker_find_file_by_id() static to
simplify linker locking.  The only external consumers now use
linker_file_foreach().
2006-06-20 20:41:15 +00:00
John Baldwin
932151064a - Add a new linker_file_foreach() function that walks the list of linker
file objects calling a user-specified predicate function on each object.
  The iteration terminates either when the entire list has been iterated
  over or the predicate function returns a non-zero value.
  linker_file_foreach() returns the value returned by the last invocation
  of the predicate function.  It also accepts a void * context pointer that
  is passed to the predicate function as well.  Using an iterator function
  avoids exposing linker internals to the rest of the kernel making locking
  simpler.
- Use linker_file_foreach() instead of walking the list of linker files
  manually to lookup ndis files in ndis(4).
- Use linker_file_foreach() to implement linker_hwpmc_list_objects().
2006-06-20 20:37:17 +00:00
John Baldwin
aaf3170501 Make linker_file_add_dependency() and linker_load_module() static since
only the linker uses them.
2006-06-20 20:18:42 +00:00
John Baldwin
e767366f99 Don't check if malloc(M_WAITOK) returns NULL. 2006-06-20 20:11:00 +00:00
John Baldwin
e5bb3a01d7 Use 'else' to remove another goto. 2006-06-20 19:49:28 +00:00
John Baldwin
73a2437a83 - Remove some useless variable initializations.
- Make some conditional free()'s where the condition was always true
  unconditional.
2006-06-20 19:32:10 +00:00
George V. Neville-Neil
fb11be62a2 Properly cast the values of valsize (the size of the value passed in)
in setsockopt so that they can be compared correctly against negative
values.  Passing in a negative value had a rather negative effect
on our socket code, making it impossible to open new sockets.

PR:		98858
Submitted by:	James.Juran@baesystems.com
MFC after:	1 week
2006-06-20 12:36:40 +00:00
Robert Watson
721150ad8f When retrieving SO_ERROR via getsockopt(), hold the socket lock around
the retrieval and replacement with 0.

MFC after:	1 week
2006-06-18 19:02:49 +00:00
Yaroslav Tykhiy
42ccd54fec Add a funny sysctl: debug.kdb.trap_code .
It is similar to debug.kdb.trap, except for it tries to cause a page fault
via a call to an invalid pointer.  This can highlight differences between
a fault on data access vs. a fault on code call some CPUs might have.

This appeared as a test for a work \
Sponsored by: RiNet (Cronyx Plus LLC)
2006-06-18 12:27:59 +00:00
Robert Watson
cd3a3a269f Remove sbinsertoob(), sbinsertoob_locked(). They violate (and have
basically always violated) invariannts of soreceive(), which assume
that the first mbuf pointer in a receive socket buffer can't change
while the SB_LOCK sleepable lock is held on the socket buffer,
which is precisely what these functions do.  No current protocols
invoke these functions, and removing them will help discourage them
from ever being used.  I should have removed them years ago, but
lost track of it.

MFC after:	1 week
Prodded almost by accident by:	peter
2006-06-17 22:48:34 +00:00
Ed Maste
374875fa56 Add a description for sysctl -d. 2006-06-17 02:58:18 +00:00
Robert Watson
9a44cbf19c Remove unused (and ifdef'd) unp_abort() and unp_drain().
MFC after:	1 month
2006-06-16 22:11:49 +00:00
David Malone
93ef14a74b Add a kern.timecounter.tc sysctl tree that contains the mask,
frequency, quality and current value of each available time counter.

At the moment all of these are read-only, but it might make sense to
make some of these read-write in the future.

MFC after:	3 months
2006-06-16 20:29:05 +00:00
Yaroslav Tykhiy
be70abccba Kill an XXX remark that has been untrue since rev. 1.150 of this file. 2006-06-16 07:36:18 +00:00
Christian S.J. Peron
4f0840f348 Axe Giant from vn_fullpath(9). The vnode -> pathname lookup should be
filesystem agnostic. We are not touching any file system specific functions
in this code path. Since we have a cache lock, there is really no need to
keep Giant around here.

This eliminates Giant acquisitions for any syscall which is auditing pathnames.

Discussed with:	jeff
2006-06-16 05:09:28 +00:00
Maxim Konovalov
059d68dea6 o Expand an exclusive lock scope to prevent a race between two
simultaneous module_register().

Original work done by:	Alex Lyashkov
Reviewed by:		jhb
MFC after:		2 weeks
2006-06-15 08:53:09 +00:00
David Xu
7bb561fbb9 Use scheduler API sched_relinquish() to implement yield() syscall. 2006-06-15 06:41:57 +00:00
David Xu
36ec198bd5 Add scheduler API sched_relinquish(), the API is used to implement
yield() and sched_yield() syscalls. Every scheduler has its own way
to relinquish cpu, the ULE and CORE schedulers have two internal run-
queues, a timesharing thread which calls yield() syscall should be
moved to inactive queue.
2006-06-15 06:37:39 +00:00
David Xu
c2c1ab1858 Clear ke_runq before calling maybe_preempt, this avoids a
KASSERT(ke->ke_runq == NULL) panic when the sched_add is recursively
called by maybe_preempt.

Reported by: Wojciech A. Koszek < dunstan at freebsd dot czest dot pl >
2006-06-14 03:46:03 +00:00
Xin LI
6ad26d8376 Unexpand an instance of TAILQ_EMPTY() 2006-06-14 03:14:26 +00:00
Marcel Moolenaar
e1684acf38 Unbreak 64-bit architectures. The 3rd argument to kern_kldload() is
a pointer to an integer and td->td_retval[0] is of type register_t.
On 64-bit architectures register_t is wider than an integer.
2006-06-14 03:01:06 +00:00
David Xu
2c7cae8042 Fox a typo in sched_is_timeshare. 2006-06-13 23:45:59 +00:00
David Xu
e15abbf251 Pass boolean value to __predict_false. Try to keep KSE slot count
correct for migrating thread, the count is a bit mess.
2006-06-13 23:01:50 +00:00
John Baldwin
edd32c2da2 Use kern_kldload() and kern_kldunload() to load and unload modules when
we intend for the user to be able to unload them later via kldunload(2)
instead of calling linker_load_module() and then directly adjusting the
ref count on the linker file structure.  This makes the resulting
consumer code simpler and cleaner and better hides the linker internals
making it possible to sanely lock the linker.
2006-06-13 21:36:23 +00:00
John Baldwin
b21c9288ce A couple of minor style tweaks. 2006-06-13 21:34:12 +00:00
John Baldwin
d53885879d - Add a kern_kldload() that is most of the previous kldload() and push
Giant down in it.
- Push Giant down in kern_kldunload() and reorganize it slightly to avoid
  using gotos.  Also, expose this function to the rest of the kernel.
2006-06-13 21:28:18 +00:00
John Baldwin
6b3d277ad4 - Push down Giant some in kldstat().
- Use a 'struct kld_file_stat' on the stack to read data under the lock
  and then do one copyout() w/o holding the lock at the end to push the
  data out to userland.
2006-06-13 21:11:12 +00:00
John Baldwin
b904477c68 Unexpand TAILQ_FOREACH() and TAILQ_FOREACH_SAFE(). 2006-06-13 20:49:07 +00:00
John Baldwin
3a600aeabc Remove some more pointless goto's and don't check to see if
malloc(M_WAITOK) returns NULL.
2006-06-13 20:27:23 +00:00
John Baldwin
2fa6cc80d7 Handle the simple case of just dropping a reference near the start of
linker_file_unload() instead of in the middle of a bunch of code for
the case of dropping the last reference to improve readability and sanity.
While I'm here, remove pointless goto's that were just jumping to a
return statement.
2006-06-13 19:45:08 +00:00
Maxim Konovalov
70df31f4de o There are two methods to get a process credentials over the unix
sockets:

1) A sender sends SCM_CREDS message to a reciever, struct cmsgcred;
2) A reciever sets LOCAL_CREDS socket option and gets sender
credentials in control message, struct sockcred.

Both methods use the same control message type SCM_CREDS with the
same control message level SOL_SOCKET, so they are indistinguishable
for the receiver.  A difference in struct cmsgcred and struct sockcred
layouts may lead to unwanted effects.

Now for sockets with LOCAL_CREDS option remove all previous linked
SCM_CREDS control messages and then add a control message with
struct sockcred so the process specifically asked for the peer
credentials by LOCAL_CREDS option always gets struct sockcred.

PR:		kern/90800
Submitted by:	Andrey Simonenko
Regres. tests:	tools/regression/sockets/unix_cmsg/
MFC after:	1 month
2006-06-13 14:33:35 +00:00
David Xu
b41f1452d9 Add scheduler CORE, the work I have done half a year ago, recent,
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different with ULE, it comes from Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use same word
"score" as a priority boost in ULE scheduler.

Briefly, the scheduler has following characteristic:
1. Timesharing process's nice value is seriously respected,
   timeslice and interaction detecting algorithm are based
   on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler
uses 256 priority queues. Unlike ULE which using pull and push, the
scheduelr uses pull method, the main reason is to let relative idle
cpu do the work, but current the whole scheduler is protected by the
big sched_lock, so the benefit is not visible, it really can be worse
than nothing because all other cpu are locked out when we are doing
balancing work, which the 4BSD scheduelr does not have this problem.
The scheduler does not support hyperthreading very well, in fact,
the scheduler does not make the difference between physical CPU and
logical CPU, this should be improved in feature. The scheduler has
priority inversion problem on MP machine, it is not good for
realtime scheduling, it can cause realtime process starving.
As a result, it seems the MySQL super-smack runs better on my
Pentium-D machine when using libthr, despite on UP or SMP kernel.
2006-06-13 13:12:56 +00:00
John Baldwin
5c69ad8374 Use fget() in kqueue_register() instead of doing all the work by hand. 2006-06-12 21:46:23 +00:00
Warner Losh
ccdc8d9bff Add a convenience function rman_init_from_resource for initializing
a rman from a resource.

Also, include _bus.h since the implementation of bus_space isn't
needed here, just the definitions of the types.
2006-06-12 04:06:21 +00:00
Ian Dowse
eb1030c4fd Keep firmware images on the list until they have been unregistered
with firmware_unregister(). Previously when the last driver reference
had been dropped we would clear the list entry under the assumption
that the firmware module was about to be unloaded, but this was not
true if the firmware image had been loaded manually with kldload.

This makes it possible to manually kldload firmware images as a
workaround for drivers such as ipw that attempt to load firmware
while resuming after a suspend.

Reviewed by:	mlaier (an earlier version of the patch)
2006-06-10 17:04:07 +00:00
Robert Watson
b37ffd3189 Move some functions and definitions from uipc_socket2.c to uipc_socket.c:
- Move sonewconn(), which creates new sockets for incoming connections on
  listen sockets, so that all socket allocate code is together in
  uipc_socket.c.

- Move 'maxsockets' and associated sysctls to uipc_socket.c with the
  socket allocation code.

- Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it
  to sysctl.h and remove lots of scattered implementations in various
  IPC modules.

- Sort sodealloc() after soalloc() in uipc_socket.c for dependency order
  reasons.  Statisticize soalloc() and sodealloc() as they are now
  required only in uipc_socket.c, and are internal to the socket
  implementation.

After this change, socket allocation and deallocation is entirely
centralized in one file, and uipc_socket2.c consists entirely of socket
buffer manipulation and default protocol switch functions.

MFC after:	1 month
2006-06-10 14:34:07 +00:00
Robert Watson
e02421f3fb Rearrange code in soalloc() so that it's less indented by returning
early if uma_zalloc() from the socket zone fails.  No functional
change.

MFC after:	1 week
2006-06-08 22:33:18 +00:00
Konstantin Belousov
55aef2632f Fix the LOR that occurs when the MAC compiled into the kernel
and vnode is destroyed.

Reviewed by:	rwatson
LOR:		189
MFC after:	2 weeks
Approved by:	kan (mentor)
2006-06-08 07:55:10 +00:00
David Xu
0ae716e5ee Make ke_rqindex unsigned. 2006-06-06 12:26:17 +00:00
Robert Watson
7ebfc8df78 Audit some arguments to nmount(), mount(), umount().
Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2006-06-05 15:32:07 +00:00
Robert Watson
6e79e6f805 Audit command, uid arguments for quotactl().
Audit the mode argument to mkfifo().
Audit the target path passed to symlink().

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2006-06-05 13:34:23 +00:00
Robert Watson
d3778141bf Audit path passed to the acct() system call.
Obtained from:	TrustedBSD Project
2006-06-05 13:02:34 +00:00
John Baldwin
49b94bfc54 Bah, fix fat finger in last. Invert the ~ on MTX_FLAGMASK as it's
non-intuitive for the ~ to be built into the mask.  All the users now
explicitly ~ the mask.  In addition, add MTX_UNOWNED to the mask even
though it technically isn't a flag.  This should unbreak mtx_owner().

Quickly spotted by:	kris
2006-06-03 21:11:33 +00:00
John Baldwin
3ce3f44293 In the case of reentering the debugger due to an attempt to perform a
context switch while in the debugger, reenter the debugger sooner before
performing any statistics updates.
2006-06-03 20:49:44 +00:00