9239 Commits

Author SHA1 Message Date
rwatson
36f0dbe4c4 Add new fields to process-related data structures:
- td_ar to struct thread, which holds the in-progress audit record during
  a system call.

- p_au to struct proc, which holds per-process audit state, such as the
  audit identifier, audit terminal, and process audit masks.

In the earlier implementation, td_ar was added to the zero'd section of
struct thread.  In order to facilitate merging to RELENG_6, it has been
moved to the end of the data structure, requiring explicit
initalization in the thread constructor.

Much help from:	wsalamon
Obtained from:	TrustedBSD Project
2006-02-02 00:37:05 +00:00
jeff
b2e03708cb - Solve a problem where a vput could be called on an outgoing directory
without Giant held.  Do this by tracking the vfslocked state for
   the directory seperate from the child.  This is only important
   in the case where we cross a mountpoint.

Sponsored by:	Isilon Systems, Inc.
MFC After:	3 days
2006-02-01 09:34:32 +00:00
jeff
3f7fc03f58 - chroot and chdir need to lock giant as appropriate for the outgoing vp
as well as the new vp.

Sponsored by:	Isilon Systems, Inc.
MFC After:	3 days
2006-02-01 09:30:44 +00:00
scottl
84425bc82d Fix another compile problem. If I find any more, this file is going in the
Attic until it is properly fixed.
2006-02-01 04:18:07 +00:00
jeff
47857ecfe1 - Solve a race where we could lose a call to VOP_INACTIVE. If vget() waiting
on a lock held the last usecount ref on a vnode and the lock failed we
   would not call INACTIVE.  Solve this by only holding a holdcnt to prevent
   the vnode from disappearing while we wait on vn_lock.  Other callers
   may now VOP_INACTIVE while we are waiting on the lock, however this race
   is acceptable, while losing INACTIVE is not.

Discussed with:	kan, pjd
Tested by:	kkenn
Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-01 00:30:05 +00:00
jeff
30a231055b - Reorder calls to vrele() after calls to vput() when the vrele is a
directory.  vrele() may lock the passed vnode, which in these cases would
   give an invalid lock order of child -> parent.  These situations are
   deadlock prone although do not typically deadlock because the vrele
   is typically not releasing the last reference to the vnode.  Users of
   vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 	1 week
Sponsored by:	Isilon Systems, Inc.
Tested by:	kkenn
2006-02-01 00:25:26 +00:00
csjp
a75160be2b Allow root to open prison pts devices too.
Pointed out by:	rwatson
2006-01-31 22:19:37 +00:00
csjp
1e1959a627 Allow root in the host environment to open ptys within jailed environments.
This logic change was introduced in revision 1.74:

Correct an oversight in jail() that allowed processes in jail to access
ptys in ways that might be unethical, especially towards processes not in
jail, or in other jails.

It should be fine to allow root in the host environment to do this. This
allows for more effective monitoring of prisons from the host environment.

Discussed with:	rwatson
MFC after:	1 week
2006-01-31 17:17:45 +00:00
pjd
645dd7b662 Add buffer corruption protection (RedZone) for kernel's malloc(9).
It detects both: buffer underflows and buffer overflows bugs at runtime
(on free(9) and realloc(9)) and prints backtraces from where memory was
allocated and from where it was freed.

Tested by:	kris
2006-01-31 11:09:21 +00:00
scottl
064daa7fd1 Regroup order of operations to better reflect what was probably intended.
Submitted by: Peter Jeremy
2006-01-30 19:25:52 +00:00
glebius
19f8b36e66 Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o   Instead of global linked list of all vlan softc use a per-trunk
  hash. The size of hash is dynamically adjusted, depending on
  number of entries. This changes struct ifnet, replacing counter
  of vlans with a pointer to trunk structure. This change is an
  improvement for setups with big number of VLANs, several interfaces
  and several CPUs. It is a small regression for a setup with a single
  VLAN interface.
    An alternative to dynamic hash is a per-trunk static array with
  4096 entries, which is a compile time option - VLAN_ARRAY. In my
  experiments the array is not an improvement, probably because such
  a big trunk structure doesn't fit into CPU cache.
o   Introduce an UMA zone for VLAN tags. Since drivers depend on it,
  the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
  This change is a big improvement for any setup utilizing vlan(4).
o   Use rwlock(9) instead of mutex(9) for locking. We are the first
  ones to do this! :)
o   Some drivers can do hardware VLAN tagging + hardware checksum
  offloading. Add an infrastructure for this. Whenever vlan(4) is
  attached to a parent or parent configuration is changed, the flags
  on vlan(4) interface are updated.

In collaboration with:	yar, thompsa
In collaboration with:	Mihail Balikov <mihail.balikov interbgc.com>
2006-01-30 13:45:15 +00:00
rwatson
3485717ebc Move pts master devices into /dev/pty/ instead of littering /dev with them;
this is more consistent with the placement of slaves in /dev/pts.  The
actual name doesn't matter as it's not part of the exposed API or used by
libc.  In some sense, it would be nice if these device nodes didn't have to
have names in devfs at all.

Suggested by:	Stephen McKay <smckay at internode dot on dot net>
2006-01-30 11:59:19 +00:00
glebius
e8ec4c3a0a - In pipe() return the error returned by pipe_create(), rather then
hardcoded ENFILES, which is incorrect. pipe_create() can fail due
  to ENOMEM.
- Update manual page, describing ENOMEM return code.

Reviewed by:	arch
2006-01-30 08:25:04 +00:00
jeff
cafb7bd7e0 - Add a comment warning about an anomalous condition where we VOP_UNLOCK
and then vrele rather than vput because we would like to VOP_UNLOCK with
   a specific thread.
2006-01-30 08:21:23 +00:00
jeff
822d3c8355 - Lock access to vrele() with VFS_LOCK_GIANT() rather than mtx_lock(&Giant).
Sponsored by:	Isilon Systems, Inc.
2006-01-30 08:19:01 +00:00
scottl
66db92c03a Take a stab at making this compile when WITNESS is not defined. gcc can't
figure out the order of operations at line 519, and neither can I, but this
is my best guess.  Also correct a number of typos and syntax errors.
2006-01-29 20:48:25 +00:00
mlaier
719dd1ebed firmware(9) is a subsystem to load binary data into the kernel via a
specially crafted module.  There are several handrolled sollutions to this
problem in the tree already which will be replaced with this.  They include
iwi(4), ipw(4), ispfw(4) and digi(4).

No objection from:	arch
MFC after:		2 weeks
X-MFC after:		some drivers have been converted
2006-01-29 02:52:42 +00:00
mlaier
00e2fdfae2 Unbreak on archs where %d doesn't print uintptr_t arithmetic. 2006-01-29 02:35:22 +00:00
rwatson
252cbc4973 Rename use_old_pty variable to use_pts, as this more accurately reflects
the sense of the variable.

Suggested by:	dwhite
2006-01-28 23:31:19 +00:00
ssouhlal
0ee4f07a23 Don't try to load KLDs if we're mounting the root. We'd otherwise panic.
Tested by:	kris
MFC after:	3 days
2006-01-28 22:58:39 +00:00
kris
a70f9992d4 Back out r1.653; it turns out that the race (or at least the printf) is
actually not hard to trigger, and it can cause a lot of console spam.

Approved by:	kan
2006-01-28 03:06:35 +00:00
imp
7b10eaebae lock unused when INVARIANTS not defined, so don't declare it then 2006-01-28 00:49:31 +00:00
jhb
bf16c50390 Add a basic reader/writer lock implementation to the kernel. This
implementation is by no means perfect as far as some of the algorithms
that it uses and the fact that it is missing some functionality (try
locks and upgrades/downgrades are not there yet), however it does seem
to work in my local testing.  There is more detail in the comments in the
code, but the short version follows.

A reader/writer lock is very much like a regular mutex: it cannot be held
across a voluntary sleep; it can be acquired in an interrupt thread; if
the lock is held by a writer then the priority of any threads that block
on the lock will be lent to the owner; the simple case lock operations all
are done in a single atomic op.  It also shares some similiarities
with sx locks: it supports reader/writer semantics (multiple readers,
but single writers); readers are allowed to recurse, but writers are not.

We can extend this implementation further by either improving algorithms
or adding new functionality, but this should at least give us a base to
work with now.

Reviewed by:	arch (in theory)
Tested on:	i386 (4 cpu box with a kernel module that used 4 threads
		that randomly chose between read locks and write locks
		that ran w/o panicing for over a day solid.  It usually
		panic'd within a few seconds when there were bugs during
		testing. :)  The kernel module source is available on
		request.)
2006-01-27 23:13:26 +00:00
jhb
d9899f5d16 Whitespace. 2006-01-27 23:06:08 +00:00
jhb
752dede518 - Add support for having both a shared and exclusive queue of threads in
each turnstile.  Also, allow for the owner thread pointer of a turnstile
  to be NULL.  This is needed for the upcoming reader/writer lock
  implementation.
- Add a new ddb command 'show turnstile' that will look up the turnstile
  associated with the given lock argument and display useful information
  like the list of threads blocked on each queue, etc.  If there isn't an
  active turnstile for a lock at the specified address, then the function
  will see if there is an active turnstile at the specified address and
  display info about it if so.
- Adjust the mutex code to handle the turnstile API changes.

Tested on:	i386 (all), alpha, amd64, sparc64 (1 and 3)
2006-01-27 22:42:12 +00:00
jhb
6160bb7d84 Add a new ddb command 'show sleepq'. It takes a wait channel as an
argument and looks for a sleep queue associated with that wait channel.
If it finds one it will display information such as the list of threads
sleeping on that queue.  If it can't find a sleep queue for that wait
channel, then it will see if that address matches any of the active
sleep queues.  If so, it will display information about the sleepq at the
specified address.
2006-01-27 22:24:07 +00:00
jhb
a7d556d07e Add a new sysctl, debug.ktr.clear. If you write a non-zero value to this
sysctl then it will clear the KTR buffer.  Note that if you have active
KTR traces at the same time as a clear operation the behavior is undefined,
though it shouldn't panic.
2006-01-27 22:17:31 +00:00
cognet
b1d0cfd746 Merge a bunch of changes that where done in tty_pty.c after tty_pts.c was
forked from it, but missed from some reason.
2006-01-27 15:13:40 +00:00
pjd
26c98c76f9 Grr. Backout previous change. vn_open_cred() will call NDFREE() on failure. 2006-01-27 11:25:06 +00:00
pjd
bc66ded70e Don't forget to call NDFREE(9) in case of vn_open_cred() failure.
MFC after:	3 days
2006-01-27 11:19:53 +00:00
davidxu
a8cf0ae648 Just like dofilewrite(), call bwillwrite before fo_write. 2006-01-27 08:02:25 +00:00
davidxu
7a2375b9e4 return final error code in aio_return rather than a hardcoded 0. 2006-01-27 04:14:16 +00:00
cognet
3299c77864 Take into account that bits 0x0000ff00 can't be used for minor. 2006-01-27 00:21:48 +00:00
cognet
998c2ee892 Don't attempt to re-create the /dev entry for the slave part if it already
exist when opening the master. This can happen if one open the master, then
open the slave, then close and re-open the master.

Reported by:	Peter Holm
2006-01-26 20:54:49 +00:00
davidxu
e2e1fa02bc in aio_aqueue, store same return code into job->_aiocb_private.error.
in aio_return, unlock proc lock before suword.
2006-01-26 08:37:02 +00:00
cognet
aff5d6bf80 Bring in a sysv-style pts implementation, as found in the rwatson_pts perforce branch. It works the same as its SysV/linux counterpart : You obtain a fd to the master pseudo terminal by opening /dev/ptmx, which craetes a node for the master as /dev/pty[num] and a node for the slave as /dev/pts/[num].
It should play nicely with the existing BSD ptys.
By default, the system will use the BSD ptys, one can set the sysctl
kern.pts.enable to 1 to make it use the new pts system.
The max number of pty that can be allocated on a system can be changed with the
sysctl kern.pts.max. It defaults to 1000, and can be increased, but it is not
recommanded, as any pty with a number > 999 won't be handled by whatever uses
utmp(5).
2006-01-26 01:30:34 +00:00
jhb
39a7d62e79 Axe KTR_ALQ_MASK now that KTR_WITNESS is off unless you hack an #ifdef
in subr_witness.c.  I did add a comment in subr_witness.c noting that
KTR_WITNESS is incompatible with KTR_ALQ.
2006-01-25 14:57:23 +00:00
ups
181ac11e19 Back out changes made in rev. 1.151.
They were bogus.

Cluebat applied by: jhb@
2006-01-25 02:05:47 +00:00
truckman
9f8a407ccb Touch all the pages wired by sysctl_wire_old_buffer() to avoid PTE
modified bit emulation traps on Alpha while holding locks in the
sysctl handler.

A better solution would be to pass a hint to the Alpha pmap code to
tell mark these pages as modified when they as they are being wired,
but that appears to be more difficult to implement.

Suggested by: jhb
MFC after:	3 days
2006-01-25 01:03:34 +00:00
jhb
4730881b84 Whitespace fix. 2006-01-24 22:24:05 +00:00
jhb
35abe809de - Add a new KTR_SUBSYS in place of KTR_SPARE1 to serve as a subsystem
placeholder similar to KTR_DEV.  Explain the use of KTR_DEV and
  KTR_SUBSYS in a comment as well.
- Retire KTR_WITNESS and instead have KTR_WITNESS default to off but use
  KTR_SUBSYS if it is enabled.
2006-01-24 22:23:45 +00:00
davidxu
1f5105d8a7 Add locking annotation and comments about socket, pipe, fifo problem.
Temporarily fix a locking problem for socket I/O.
2006-01-24 07:24:24 +00:00
davidxu
edbd69603d Er, rescure a deleted comment line. 2006-01-24 02:50:42 +00:00
davidxu
acb3d897fb More cleanup for aio code:
1) unregsiter kqueue filter for EVFILT_LIO.
2) free uma_zones.
3) call setsid directly to enter another session rather than
   implementing by itself.

Submitted by: jhb
2006-01-24 02:46:15 +00:00
davidxu
4aec3469f3 Add bracket. 2006-01-23 23:46:30 +00:00
jhb
0c769dbc8b Fix a vnode reference leak in the ktrace code. We always grab a reference
to the vnode at the start of ktr_writerequest() but were missing the
corresponding vrele() after we finished the write operation.

Reported by:	jasone
2006-01-23 21:45:32 +00:00
ups
18ba9270dc Hopefully fix the "calcru: runtime went backwards from ..." problem by
keeping the resource values locked (where needed) while we use them
for calculations.

MFC after:	3 days
2006-01-23 19:15:13 +00:00
andre
b7dae7ac6c In mb_zinit_pack() explicitly ignore the return value of uma_zalloc_arg().
The success of the cluster allocation is checked through a field in the
mbuf structure.  This change is non-functional but properly silences code
inspection tools.

Found by:	Coverity Prevent(tm)
Coverity ID:	CID807
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-01-23 15:49:01 +00:00
davidxu
14cd6d7f49 Verify all supported notification types. 2006-01-23 10:27:15 +00:00
davidxu
47e7cba205 1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue,
so that lio_listio won't exceed the quota.
2) Remove lio_ref_count, it is no longer used.
2006-01-23 02:49:34 +00:00