Commit Graph

9026 Commits

Author SHA1 Message Date
Gleb Smirnoff
75ee267c22 Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o   Instead of global linked list of all vlan softc use a per-trunk
  hash. The size of hash is dynamically adjusted, depending on
  number of entries. This changes struct ifnet, replacing counter
  of vlans with a pointer to trunk structure. This change is an
  improvement for setups with big number of VLANs, several interfaces
  and several CPUs. It is a small regression for a setup with a single
  VLAN interface.
    An alternative to dynamic hash is a per-trunk static array with
  4096 entries, which is a compile time option - VLAN_ARRAY. In my
  experiments the array is not an improvement, probably because such
  a big trunk structure doesn't fit into CPU cache.
o   Introduce an UMA zone for VLAN tags. Since drivers depend on it,
  the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
  This change is a big improvement for any setup utilizing vlan(4).
o   Use rwlock(9) instead of mutex(9) for locking. We are the first
  ones to do this! :)
o   Some drivers can do hardware VLAN tagging + hardware checksum
  offloading. Add an infrastructure for this. Whenever vlan(4) is
  attached to a parent or parent configuration is changed, the flags
  on vlan(4) interface are updated.

In collaboration with:	yar, thompsa
In collaboration with:	Mihail Balikov <mihail.balikov interbgc.com>
2006-01-30 13:45:15 +00:00
Robert Watson
4c0b19957f Move pts master devices into /dev/pty/ instead of littering /dev with them;
this is more consistent with the placement of slaves in /dev/pts.  The
actual name doesn't matter as it's not part of the exposed API or used by
libc.  In some sense, it would be nice if these device nodes didn't have to
have names in devfs at all.

Suggested by:	Stephen McKay <smckay at internode dot on dot net>
2006-01-30 11:59:19 +00:00
Gleb Smirnoff
61fb9bd80c - In pipe() return the error returned by pipe_create(), rather then
hardcoded ENFILES, which is incorrect. pipe_create() can fail due
  to ENOMEM.
- Update manual page, describing ENOMEM return code.

Reviewed by:	arch
2006-01-30 08:25:04 +00:00
Jeff Roberson
608c95d341 - Add a comment warning about an anomalous condition where we VOP_UNLOCK
and then vrele rather than vput because we would like to VOP_UNLOCK with
   a specific thread.
2006-01-30 08:21:23 +00:00
Jeff Roberson
033eb86e52 - Lock access to vrele() with VFS_LOCK_GIANT() rather than mtx_lock(&Giant).
Sponsored by:	Isilon Systems, Inc.
2006-01-30 08:19:01 +00:00
Scott Long
8ad6b7ab7c Take a stab at making this compile when WITNESS is not defined. gcc can't
figure out the order of operations at line 519, and neither can I, but this
is my best guess.  Also correct a number of typos and syntax errors.
2006-01-29 20:48:25 +00:00
Max Laier
6aec1278dc firmware(9) is a subsystem to load binary data into the kernel via a
specially crafted module.  There are several handrolled sollutions to this
problem in the tree already which will be replaced with this.  They include
iwi(4), ipw(4), ispfw(4) and digi(4).

No objection from:	arch
MFC after:		2 weeks
X-MFC after:		some drivers have been converted
2006-01-29 02:52:42 +00:00
Max Laier
69e99c5d4c Unbreak on archs where %d doesn't print uintptr_t arithmetic. 2006-01-29 02:35:22 +00:00
Robert Watson
5276d7471f Rename use_old_pty variable to use_pts, as this more accurately reflects
the sense of the variable.

Suggested by:	dwhite
2006-01-28 23:31:19 +00:00
Suleiman Souhlal
c270875f7c Don't try to load KLDs if we're mounting the root. We'd otherwise panic.
Tested by:	kris
MFC after:	3 days
2006-01-28 22:58:39 +00:00
Kris Kennaway
d5e5528afe Back out r1.653; it turns out that the race (or at least the printf) is
actually not hard to trigger, and it can cause a lot of console spam.

Approved by:	kan
2006-01-28 03:06:35 +00:00
Warner Losh
6229621e2c lock unused when INVARIANTS not defined, so don't declare it then 2006-01-28 00:49:31 +00:00
John Baldwin
3f08bd8bce Add a basic reader/writer lock implementation to the kernel. This
implementation is by no means perfect as far as some of the algorithms
that it uses and the fact that it is missing some functionality (try
locks and upgrades/downgrades are not there yet), however it does seem
to work in my local testing.  There is more detail in the comments in the
code, but the short version follows.

A reader/writer lock is very much like a regular mutex: it cannot be held
across a voluntary sleep; it can be acquired in an interrupt thread; if
the lock is held by a writer then the priority of any threads that block
on the lock will be lent to the owner; the simple case lock operations all
are done in a single atomic op.  It also shares some similiarities
with sx locks: it supports reader/writer semantics (multiple readers,
but single writers); readers are allowed to recurse, but writers are not.

We can extend this implementation further by either improving algorithms
or adding new functionality, but this should at least give us a base to
work with now.

Reviewed by:	arch (in theory)
Tested on:	i386 (4 cpu box with a kernel module that used 4 threads
		that randomly chose between read locks and write locks
		that ran w/o panicing for over a day solid.  It usually
		panic'd within a few seconds when there were bugs during
		testing. :)  The kernel module source is available on
		request.)
2006-01-27 23:13:26 +00:00
John Baldwin
135161049e Whitespace. 2006-01-27 23:06:08 +00:00
John Baldwin
7aa4f6852a - Add support for having both a shared and exclusive queue of threads in
each turnstile.  Also, allow for the owner thread pointer of a turnstile
  to be NULL.  This is needed for the upcoming reader/writer lock
  implementation.
- Add a new ddb command 'show turnstile' that will look up the turnstile
  associated with the given lock argument and display useful information
  like the list of threads blocked on each queue, etc.  If there isn't an
  active turnstile for a lock at the specified address, then the function
  will see if there is an active turnstile at the specified address and
  display info about it if so.
- Adjust the mutex code to handle the turnstile API changes.

Tested on:	i386 (all), alpha, amd64, sparc64 (1 and 3)
2006-01-27 22:42:12 +00:00
John Baldwin
f126e754e0 Add a new ddb command 'show sleepq'. It takes a wait channel as an
argument and looks for a sleep queue associated with that wait channel.
If it finds one it will display information such as the list of threads
sleeping on that queue.  If it can't find a sleep queue for that wait
channel, then it will see if that address matches any of the active
sleep queues.  If so, it will display information about the sleepq at the
specified address.
2006-01-27 22:24:07 +00:00
John Baldwin
bef4bf1adf Add a new sysctl, debug.ktr.clear. If you write a non-zero value to this
sysctl then it will clear the KTR buffer.  Note that if you have active
KTR traces at the same time as a clear operation the behavior is undefined,
though it shouldn't panic.
2006-01-27 22:17:31 +00:00
Olivier Houchard
23c15e6437 Merge a bunch of changes that where done in tty_pty.c after tty_pts.c was
forked from it, but missed from some reason.
2006-01-27 15:13:40 +00:00
Pawel Jakub Dawidek
f220f7afa6 Grr. Backout previous change. vn_open_cred() will call NDFREE() on failure. 2006-01-27 11:25:06 +00:00
Pawel Jakub Dawidek
970c7ca2ef Don't forget to call NDFREE(9) in case of vn_open_cred() failure.
MFC after:	3 days
2006-01-27 11:19:53 +00:00
David Xu
6d53aa6297 Just like dofilewrite(), call bwillwrite before fo_write. 2006-01-27 08:02:25 +00:00
David Xu
03d66b36c7 return final error code in aio_return rather than a hardcoded 0. 2006-01-27 04:14:16 +00:00
Olivier Houchard
f94cf2b10b Take into account that bits 0x0000ff00 can't be used for minor. 2006-01-27 00:21:48 +00:00
Olivier Houchard
169c44907a Don't attempt to re-create the /dev entry for the slave part if it already
exist when opening the master. This can happen if one open the master, then
open the slave, then close and re-open the master.

Reported by:	Peter Holm
2006-01-26 20:54:49 +00:00
David Xu
55a122bf28 in aio_aqueue, store same return code into job->_aiocb_private.error.
in aio_return, unlock proc lock before suword.
2006-01-26 08:37:02 +00:00
Olivier Houchard
12af2a0f4f Bring in a sysv-style pts implementation, as found in the rwatson_pts perforce branch. It works the same as its SysV/linux counterpart : You obtain a fd to the master pseudo terminal by opening /dev/ptmx, which craetes a node for the master as /dev/pty[num] and a node for the slave as /dev/pts/[num].
It should play nicely with the existing BSD ptys.
By default, the system will use the BSD ptys, one can set the sysctl
kern.pts.enable to 1 to make it use the new pts system.
The max number of pty that can be allocated on a system can be changed with the
sysctl kern.pts.max. It defaults to 1000, and can be increased, but it is not
recommanded, as any pty with a number > 999 won't be handled by whatever uses
utmp(5).
2006-01-26 01:30:34 +00:00
John Baldwin
6b81555744 Axe KTR_ALQ_MASK now that KTR_WITNESS is off unless you hack an #ifdef
in subr_witness.c.  I did add a comment in subr_witness.c noting that
KTR_WITNESS is incompatible with KTR_ALQ.
2006-01-25 14:57:23 +00:00
Stephan Uphoff
6807424d19 Back out changes made in rev. 1.151.
They were bogus.

Cluebat applied by: jhb@
2006-01-25 02:05:47 +00:00
Don Lewis
f4af687a3b Touch all the pages wired by sysctl_wire_old_buffer() to avoid PTE
modified bit emulation traps on Alpha while holding locks in the
sysctl handler.

A better solution would be to pass a hint to the Alpha pmap code to
tell mark these pages as modified when they as they are being wired,
but that appears to be more difficult to implement.

Suggested by: jhb
MFC after:	3 days
2006-01-25 01:03:34 +00:00
John Baldwin
67f7fe8c01 Whitespace fix. 2006-01-24 22:24:05 +00:00
John Baldwin
2b604e82b2 - Add a new KTR_SUBSYS in place of KTR_SPARE1 to serve as a subsystem
placeholder similar to KTR_DEV.  Explain the use of KTR_DEV and
  KTR_SUBSYS in a comment as well.
- Retire KTR_WITNESS and instead have KTR_WITNESS default to off but use
  KTR_SUBSYS if it is enabled.
2006-01-24 22:23:45 +00:00
David Xu
1aa4c324ee Add locking annotation and comments about socket, pipe, fifo problem.
Temporarily fix a locking problem for socket I/O.
2006-01-24 07:24:24 +00:00
David Xu
e6bdc05ff7 Er, rescure a deleted comment line. 2006-01-24 02:50:42 +00:00
David Xu
bd793be3c6 More cleanup for aio code:
1) unregsiter kqueue filter for EVFILT_LIO.
2) free uma_zones.
3) call setsid directly to enter another session rather than
   implementing by itself.

Submitted by: jhb
2006-01-24 02:46:15 +00:00
David Xu
7f34b521c7 Add bracket. 2006-01-23 23:46:30 +00:00
John Baldwin
704c9f00fb Fix a vnode reference leak in the ktrace code. We always grab a reference
to the vnode at the start of ktr_writerequest() but were missing the
corresponding vrele() after we finished the write operation.

Reported by:	jasone
2006-01-23 21:45:32 +00:00
Stephan Uphoff
03001f59c8 Hopefully fix the "calcru: runtime went backwards from ..." problem by
keeping the resource values locked (where needed) while we use them
for calculations.

MFC after:	3 days
2006-01-23 19:15:13 +00:00
Andre Oppermann
fd2413398c In mb_zinit_pack() explicitly ignore the return value of uma_zalloc_arg().
The success of the cluster allocation is checked through a field in the
mbuf structure.  This change is non-functional but properly silences code
inspection tools.

Found by:	Coverity Prevent(tm)
Coverity ID:	CID807
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-01-23 15:49:01 +00:00
David Xu
68d7111884 Verify all supported notification types. 2006-01-23 10:27:15 +00:00
David Xu
a9bf5e37ae 1) Merge _aio_aqueue and aio_aqueue, check quota in aio_aqueue,
so that lio_listio won't exceed the quota.
2) Remove lio_ref_count, it is no longer used.
2006-01-23 02:49:34 +00:00
Alan Cox
bb53e2bf27 Remove an unnecessary call to pmap_remove_all(). The given page is not
mapped because its contents are invalid.

Reviewed by: tegge
2006-01-23 00:00:45 +00:00
Don Lewis
1dd5fc0fde Tweak previous vfs_lookup.c commit to return an EINVAL error from
lookup() instead of EPERM when a DELETE or RENAME operation is
attempted on "..".

In kern_unlink(), remap EINVAL errors returned from namei() to EPERM
to match existing (and POSIX required) behaviour.

Discussed with:	bde
MFC after:	3 days
2006-01-22 19:37:02 +00:00
David Xu
8c0d9af5bf Fix a bogus panic. 2006-01-22 09:39:59 +00:00
David Xu
9b84335c84 Decrease kaio_active_count first, because user process may go away
after we notified it.
2006-01-22 09:25:52 +00:00
David Xu
4ca4c9ee68 Regen. 2006-01-22 06:01:48 +00:00
David Xu
1ce9182407 Make aio code MP safe. 2006-01-22 05:59:27 +00:00
Nate Lawson
a2b31c5b4f Add a devd(8) event that is sent after the system resumes. This can be
used by utilities to reset moused(8), for example.  The syntax is:

    !system=kern subsystem=power type=resume

Note that it would be nice to have notification of suspend, but it's more
difficult since there would have to be a method of doing request/ack
to userland and automatically timing out if no response.  apm(4) has a
similar mechanism.

MFC after:	2 weeks
2006-01-22 01:06:25 +00:00
Robert Watson
c1250af683 Convert remaining functions to ANSI C function declarations.
MFC after:	1 week
2006-01-22 00:30:46 +00:00
Alan Cox
e5e6093ba9 Avoid a vm object reference leak in a rarely used code path.
An executable contains at most one PT_INTERP program header.  Therefore,
the loop that searches for it can terminate after it is found rather than
iterating over the entire set of program headers.

Eliminate an unneeded initialization.

Reviewed by: tegge
2006-01-21 20:11:49 +00:00
Don Lewis
bea7a8d75c Return EPERM from lookup() if cn_nameiop is DELETE or RENAME and
the last component of the path name is "..".  This keeps VOP_LOOKUP()
from locking vnodes in reverse order.

Tested by:	Denis Shaposhnikov <dsh AT vlink DOT ru>
MFC after:	3 days
2006-01-21 19:57:56 +00:00