Commit Graph

91 Commits

Author SHA1 Message Date
Alfred Perlstein
f132072368 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
Seigo Tanimura
960ed29c4b Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by:	bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.
2002-04-30 01:54:54 +00:00
Seigo Tanimura
acbbcc5f1d Fix the code fragment clobbered in my last commit. 2002-04-27 09:33:49 +00:00
Seigo Tanimura
d48d4b2501 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
Mike Silbersack
e1f1827f98 Make sure that sockets undergoing accept filtering are aborted in a
LRU fashion when the listen queue fills up.  Previously, there was
no mechanism to kick out old sockets, leading to an easy DoS of
daemons using accept filtering.

Reviewed by:	alfred
MFC after:	3 days
2002-04-26 02:07:46 +00:00
Mike Silbersack
c473d3e406 Remove sodropablereq - this function hasn't been used since the
syncache went in.

MFC after:	3 days
2002-04-24 04:11:08 +00:00
Jeff Roberson
54d77689ed Backout part of my previous commit; I was wrong about vm_zone's handling of
limits on zones w/o objects.
2002-03-20 04:39:32 +00:00
Jeff Roberson
c897b81311 Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
Matthew Dillon
ecde8f7c29 Get rid of the twisted MFREE() macro entirely.
Reviewed by:	dg, bmilekic
MFC after:	3 days
2002-02-05 02:00:56 +00:00
Mike Silbersack
9f5193ca0b Revert 1.81; 1.19 fixed this already in a different way. 2002-01-09 01:45:17 +00:00
Mike Silbersack
5213c50d83 Reorder a calculation in sbreserve so that it does not overflow
with multi-megabyte socket buffer sizes.

PR:		7420
MFC after:	3 weeks
2002-01-06 06:50:54 +00:00
Alfred Perlstein
21d56e9c33 Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).

Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exec(9) and rm_at_exec(9), these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral,
the problem was that one had to pass it a paramater indicating the
number of arguments which were actually the number of "int".  Fix
it by using an inline version of the AS macro against the syscall
arguments.  (AS should be available globally but we'll get to that
later.)

Add a primative system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.
2001-12-29 07:13:47 +00:00
Peter Wemm
205b2b6107 Avoid an interaction between syncache and accept filters. The syncache
code only passed up the connection to the tcp stack when it was complete,
so it went directly into the so_comp (complete) queue.  However, with
accept filters, there is an additional phase before calling it "complete".

Reviewed by: jlemon
2001-12-21 04:30:49 +00:00
Robert Watson
f8cf411e49 o Back out portions of 1.50 and 1.47, eliminating sonewconn3() and
always deriving the credential for a newly accepted connection from
  the listen socket.  Previously, the selection of the credential
  depended on the protocol: UNIX domain sockets would use the
  connecting process's credential, and protocols supporting a creation
  of the socket before the receiving end called accept() would use
  the listening socket.  After this change, it is always the listening
  credential.

Reviewed by:	green
2001-12-13 22:09:37 +00:00
Matthew Dillon
b1e4abd246 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
David Malone
59bdd40568 Allow sbcreatecontrol to make cluster sized control messages. 2001-10-04 12:59:53 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Jonathan Lemon
84241bd0dc Fix up indentation. 2001-06-29 04:01:38 +00:00
Peter Wemm
0978669829 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
Peter Wemm
4422746fdf Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
Peter Wemm
81930014ef Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
Jesper Skriver
5b86eac4e5 Revert the last bits of my bogus move of NMBCLUSTERS
to <sys/param.h>
2001-06-01 21:47:34 +00:00
Jesper Skriver
e916d96e64 Move the definition of NMBCLUSTERS from src/sys/kern/uipc_mbuf.c
to <sys/param.h>, so it's available to src/sys/netinet/ip_input.c,
and remove the now unneeded includes of "opt_param.h".

MFC after:	1 week
2001-05-31 21:56:44 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
David Malone
32af0d74f0 Make sbcompress use the new M_WRITABLE macro. Previously sbcompress
could not compress into clusters. This could result in lots of
wasted clusters while recieving small packets from an interface
that uses clusters for all it's packets.

Patch is partially from BSDi (limiting the size of the copy) and
based on a patch for 4.1 by Ian Dowse <iedowse@maths.tcd.ie> and
myself.

Reviewed by:	bmilekic
Obtained From:	BSDi
Submitted by:	iedowse
2000-11-19 22:22:47 +00:00
Don Lewis
f535380cb6 Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().
2000-09-05 22:11:13 +00:00
Brian Feldman
b6240737d5 Fix hangs caused by overzealous code removal.
Thanks, Nickolay, for figuring out this is the problem.

Submitted by:	Nickolay Dudorov <nnd@mail.nsk.ru>
2000-08-31 11:31:58 +00:00
Brian Feldman
343079d9b2 Remove an extraneous setting of sb_hiwat. 2000-08-30 00:09:57 +00:00
Brian Feldman
6aef685fbb Remove any possibility of hiwat-related race conditions by changing
the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and
a u_long target to set it to.  The whole thing is splnet().

This fixes a problem that jdp has been able to provoke.
2000-08-29 11:28:06 +00:00
Paul Saab
030f7b3faa Remove unnecessary call to splnet when setting an accept filter
since we are already at splnet.
2000-07-31 08:23:43 +00:00
Alfred Perlstein
c636255150 fix races in the uidinfo subsystem, several problems existed:
1) while allocating a uidinfo struct malloc is called with M_WAITOK,
   it's possible that while asleep another process by the same user
   could have woken up earlier and inserted an entry into the uid
   hash table.  Having redundant entries causes inconsistancies that
   we can't handle.

   fix: do a non-waiting malloc, and if that fails then do a blocking
   malloc, after waking up check that no one else has inserted an entry
   for us already.

2) Because many checks for sbsize were done as "test then set" in a non
   atomic manner it was possible to exceed the limits put up via races.

   fix: instead of querying the count then setting, we just attempt to
   set the count and leave it up to the function to return success or
   failure.

3) The uidinfo code was inlining and repeating, lookups and insertions
   and deletions needed to be in their own functions for clarity.

Reviewed by: green
2000-06-22 22:27:16 +00:00
Alfred Perlstein
a79b71281c return of the accept filter part II
accept filters are now loadable as well as able to be compiled into
the kernel.

two accept filters are provided, one that returns sockets when data
arrives the other when an http request is completed (doesn't work
with 0.9 requests)

Reviewed by: jmg
2000-06-20 01:09:23 +00:00
Alfred Perlstein
a72fda7154 backout accept optimizations.
Requested by: jmg, dcs, jdp, nate
2000-06-18 08:49:13 +00:00
Alfred Perlstein
8f4e4aa5f1 add socketoptions DELAYACCEPT and HTTPACCEPT which will not allow an accept()
until the incoming connection has either data waiting or what looks like a
HTTP request header already in the socketbuffer.  This ought to reduce
the context switch time and overhead for processing requests.

The initial idea and code for HTTPACCEPT came from Yahoo engineers and has
been cleaned up and a more lightweight DELAYACCEPT for non-http servers
has been added

Reviewed by: silence on hackers.
2000-06-15 18:18:43 +00:00
Jonathan Lemon
cb679c385e Introduce kqueue() and kevent(), a kernel event notification facility. 2000-04-16 18:53:38 +00:00
Yoshinobu Inoue
7d0d8dc306 CMSG_XXX macros alignment fixes to follow RFC2292.
Approved by: jkh

Submitted by: Partly from tech@openbsd
Reviewed by: itojun
2000-03-03 11:13:12 +00:00
Yoshinobu Inoue
0b97e97cd2 Add length check to sbcreatecontrol().
Now this check is necessary because IPv6 source routing might use
  control data bigger than MLEN. (e.g. 16bytes IPv6 addr x 23 hops)
  Actually mbuf cluster should be used in uipc_socket.c:sbcreatecontrol()
  and uipc_syscalls.c:sockargs() when data size is bigger then MLEN,
  and such patches were already in KAME environment and have been
  confirmed to work well. I just forgot to merge them into 4.0, sorry.

  For safety, I'll postpone such patches until after 4.0 release.
  The effect of postponement is followings.
    -Ping6 source routing hops are limitted to around 6 or so.
    -If some apps do setsockopt IPV6_RTHDR and try to receive
     incoming IPv6 source routing info, it can't receive more
     than 6 hops source routing info.
     (But currently, no apps seems to be doing it.)

Approved by: jkh
2000-02-24 19:21:26 +00:00
Jason Evans
bfbbc4aa44 Add aio_waitcomplete(). Make aio work correctly for socket descriptors.
Make gratuitous style(9) fixes (me, not the submitter) to make the aio
code more readable.

PR:		kern/12053
Submitted by:	Chris Sedore <cmsedore@maxwell.syr.edu>
2000-01-14 02:53:29 +00:00
Brian Feldman
ecf723083f Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total
usage limit.
1999-10-09 20:42:17 +00:00
Pierre Beyssac
23f84772ca In sbflush(), don't exit the while loop too early: this can cause
an empty mbuf to stay in the queue, then causing a needless panic
because sb_cc == 0 and sb_mbcnt != 0.

But we still need to panic rather than endlessly looping if, for
some reason, sb_cc == 0 and there are non-empty mbufs in the queue.

PR:		kern/11988
Reviewed by:	fenner
1999-09-28 12:59:18 +00:00
Brian Feldman
2f9a21326c Change so_cred's type to a ucred, not a pcred. THis makes more sense, actually.
Make a sonewconn3() which takes an extra argument (proc) so new sockets created
with sonewconn() from a user's system call get the correct credentials, not
just the parent's credentials.
1999-09-19 02:17:02 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Mike Smith
134c934ce7 Move the initialisation/tuning of nmbclusters from param.c/machdep.c
into uipc_mbuf.c.  This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.

Make NMBUFs tunable as well.

Move the nmbclusters sysctl here as well.

Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.

Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).
1999-07-05 08:52:54 +00:00
Brian Feldman
f29be02190 Reviewed by: the cast of thousands
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
1999-06-17 23:54:50 +00:00
Peter Wemm
dc97381e37 Update one set of comments.. s/so_q0/so_incomp/ and s/so_q/so_comp/ (that's
incomplete and complete connections I think)
1999-05-10 18:15:40 +00:00
Bill Fumerola
3d177f465a Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
Bill Fenner
527b7a14a5 Port NetBSD's 19990120-accept bug fix. This works around the race condition
where select(2) can return that a listening socket has a connected socket
queued, the connection is broken, and the user calls accept(2), which then
blocks because there are no connections queued.

Reviewed by:	wollman
Obtained from:	NetBSD
(ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19990120-accept)
1999-01-25 16:58:56 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Don Lewis
9d2b090975 We can't call fsetown() from sonewconn() because sonewconn() is be called
from an interrupt context and fsetown() wants to peek at curproc, call
malloc(..., M_WAITOK), and fiddle with various unprotected data structures.
The fix is to move the code that duplicates the F_SETOWN/FIOSETOWN state
of the original socket to the new socket from sonewconn() to accept1(),
since accept1() runs in the correct context.  Deferring this until the
process calls accept() is harmless since the process can't do anything
useful with SIGIO on the new socket until it has the descriptor for that
socket.

One could make the case for not bothering to duplicate the
F_SETOWN/FIOSETOWN state and requiring the process to explicitly make the
fcntl() or ioctl() call on the new socket, but this would be incompatible
with the previous implementation and might break programs which rely on
the old semantics.

This bug was discovered by Andrew Gallatin <gallatin@cs.duke.edu>.
1998-11-23 00:45:39 +00:00