104 Commits

Author SHA1 Message Date
phk
63d87674c8 Use m_length() instead of home-rolled versions. 2002-09-18 19:44:14 +00:00
dg
343fb29631 Further improved the performance of sbreserve() by moving the calculation
of the adjusted sb_max into a sysctl handler for sb_max and assigning it to
a variable that is used instead. This eliminates the 32bit multiply and
divide from the fast path that was being done previously.
2002-08-16 18:41:48 +00:00
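
A minimal sketch of the approach described above, assuming the handler and variable names used here (sysctl_handle_sb_max, sb_max_adj) for illustration; the actual in-tree code may differ in detail. The expensive 64-bit multiply and divide happen once, when the sysctl is written, instead of on every sbreserve() call:

    static u_long sb_max_adj;   /* sb_max scaled for per-mbuf overhead */

    static int
    sysctl_handle_sb_max(SYSCTL_HANDLER_ARGS)
    {
            u_long new_max = sb_max;
            int error;

            error = sysctl_handle_long(oidp, &new_max, 0, req);
            if (error != 0 || req->newptr == NULL)
                    return (error);
            if (new_max < MSIZE + MCLBYTES)
                    return (EINVAL);
            sb_max = new_max;
            /* The only 64-bit math left; the fast path now compares
             * against sb_max_adj with a plain comparison. */
            sb_max_adj = (u_quad_t)sb_max * MCLBYTES / (MSIZE + MCLBYTES);
            return (0);
    }

    SYSCTL_PROC(_kern_ipc, KIPC_MAXSOCKBUF, maxsockbuf,
        CTLTYPE_ULONG | CTLFLAG_RW, &sb_max, 0, sysctl_handle_sb_max,
        "LU", "Maximum socket buffer size");
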
dg
fc04ab53c9 Rewrote the space check algorithm in sbreserve() so that the extremely
expensive (!) 64bit multiply, divide, and comparison aren't necessary
(this came in originally from rev 1.19 to fix an overflow with large
sb_max or MCLBYTES).
The 64bit math in this function was measured in some kernel profiles as
being as much as 5-8% of the total overhead of the TCP/IP stack and
is eliminated with this commit. There is a harmless rounding error (of
about .4% with the standard values) introduced with this change,
however this is in the conservative direction (downward toward a
slightly smaller maximum socket buffer size).

MFC after:	3 days
2002-08-16 05:08:46 +00:00
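
The message above does not quote the replacement expression, so the small standalone program below only illustrates the general trade-off: dividing before multiplying keeps the limit computation out of 64-bit arithmetic, at the cost of rounding the effective maximum down slightly. The MSIZE/MCLBYTES values are assumptions for an i386-era kernel, and the exact in-tree algorithm may differ.

    #include <stdint.h>
    #include <stdio.h>

    #define MSIZE    256u            /* assumed mbuf size */
    #define MCLBYTES 2048u           /* assumed mbuf cluster size */

    int
    main(void)
    {
            uint32_t sb_max = 256u * 1024u;   /* historical default SB_MAX */

            /* Exact limit: needs a 64-bit multiply and divide. */
            uint64_t exact = (uint64_t)sb_max * MCLBYTES / (MSIZE + MCLBYTES);

            /* Divide first: stays in 32 bits, rounds the limit down a bit. */
            uint32_t cheap = sb_max / (MSIZE + MCLBYTES) * MCLBYTES;

            printf("exact %llu, cheap %u (%.2f%% smaller)\n",
                (unsigned long long)exact, cheap,
                100.0 * (double)(exact - cheap) / (double)exact);
            return (0);
    }
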
rwatson
a5dcc1fd3d Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde
2002-08-01 17:47:56 +00:00
rwatson
e9b7aa2f59 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on sockets.
In particular, invoke entry points during socket allocation and
destruction, as well as creation by a process or during an
accept-scenario (sonewconn).  For UNIX domain sockets, also assign
a peer label.  As the socket code isn't locked down yet, locking
interactions are not yet clear.  Various protocol stack socket
operations (such as peer label assignment for IPv4) will follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 03:03:22 +00:00
dwmalone
ef1d91be31 If a socket is disconnected for some reason (like a TCP connection
not responding), then drop any data on the outgoing queue in
soisdisconnected(), because there is no way to get it to its destination
any longer.

The only objection to this patch I got on -net was from Terry, who
wasn't sure that the condition in question could arise, so I provided
some example code.
2002-07-27 23:06:52 +00:00
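
A kernel-context sketch of the idea, based on the soisdisconnected() of that era (details may differ from the actual diff): once the socket is marked fully disconnected, bytes still queued in the send buffer can never reach the peer, so they are dropped rather than left pinning mbufs.

    void
    soisdisconnected(struct socket *so)
    {
            so->so_state &= ~(SS_ISCONNECTING|SS_ISCONNECTED|SS_ISDISCONNECTING);
            so->so_state |= (SS_CANTRCVMORE|SS_CANTSENDMORE|SS_ISDISCONNECTED);
            wakeup(&so->so_timeo);
            sbdrop(&so->so_snd, so->so_snd.sb_cc); /* data is undeliverable */
            sowwakeup(so);
            sorwakeup(so);
    }
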
robert
38a65faa39 Fix -Werror build for sparc64: Use the appropriate conversion
specifier for an 'unsigned int' argument.
2002-07-26 12:57:57 +00:00
alfred
8cd894ca70 More caddr_t removal.
Change struct knote's kn_hook from caddr_t to void *.
2002-06-29 00:29:12 +00:00
tanimura
cb3347e926 Remove so*_locked(), which were backed out by mistake. 2002-06-18 07:42:02 +00:00
tanimura
e6fa9b9e92 Back out my last commit of locking down a socket; it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
silby
85e17a3398 Subtle fix to the accept filter LRU code.  In some cases, a newly
initialized socket with no qlimit was being passed in.  In order
to handle this case properly, we must not use >= when comparing
queue sizes to qlimit.  Because of this improper handling,
a panic could occur in certain cases.

PR:		38325
MFC after:	3 days
2002-05-20 17:34:31 +00:00
tanimura
92d8381dd5 Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each member in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  The following socket APIs, which touch
  the members above, now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
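
A sketch of the data-structure side of milestone 1, with illustrative names (the real layout and macro names may differ): each socket buffer gets its own mutex, and, per the note above, the receive buffer's mutex doubles as the lock for the listed struct socket members.

    struct sockbuf {
            struct mtx sb_mtx;          /* protects the fields below */
            u_long     sb_cc;           /* actual chars in buffer */
            u_long     sb_hiwat;        /* max actual char count */
            /* ... */
    };

    #define SOCKBUF_LOCK(sb)    mtx_lock(&(sb)->sb_mtx)
    #define SOCKBUF_UNLOCK(sb)  mtx_unlock(&(sb)->sb_mtx)

    /* The receive-side mutex also covers so_count, so_options, so_linger
     * and so_state, so "locking the socket" means: */
    #define SOCK_LOCK(so)       SOCKBUF_LOCK(&(so)->so_rcv)
    #define SOCK_UNLOCK(so)     SOCKBUF_UNLOCK(&(so)->so_rcv)
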
tanimura
9070f27e7d Do not forget to increase the number of completely connected sockets in
soisconnected_locked().

Forgotten by:	tanimura
2002-05-07 16:17:44 +00:00
alfred
798c53d495 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner, change pgsigio's
sigio * argument into a sigio **; that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
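
A sketch of why the argument became a double pointer (illustrative; the SIGIO_LOCK()/SIGIO_UNLOCK() macros are the ones the sigio commits further down the log place in sys/signalvar.h): with a struct sigio ** the function can take the sigio lock itself before dereferencing, so concurrent destruction of the sigio cannot race the read.

    void
    pgsigio(struct sigio **sigiop, int sig, int checkctty)
    {
            struct sigio *sigio;

            SIGIO_LOCK();
            if ((sigio = *sigiop) == NULL) {    /* destroyed concurrently? */
                    SIGIO_UNLOCK();
                    return;
            }
            /* ... deliver sig to sigio->sio_proc or sigio->sio_pgrp ... */
            SIGIO_UNLOCK();
    }
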
tanimura
89ec521d91 Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by:	bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.
2002-04-30 01:54:54 +00:00
tanimura
6d8e4294e0 Fix the code fragment clobbered in my last commit. 2002-04-27 09:33:49 +00:00
tanimura
dbb4756491 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
silby
dd3cd5fed6 Make sure that sockets undergoing accept filtering are aborted in an
LRU fashion when the listen queue fills up.  Previously, there was
no mechanism to kick out old sockets, leading to an easy DoS of
daemons using accept filtering.

Reviewed by:	alfred
MFC after:	3 days
2002-04-26 02:07:46 +00:00
silby
b4055530fc Remove sodropablereq - this function hasn't been used since the
syncache went in.

MFC after:	3 days
2002-04-24 04:11:08 +00:00
jeff
803cb2a2ba Back out part of my previous commit; I was wrong about vm_zone's handling of
limits on zones w/o objects.
2002-03-20 04:39:32 +00:00
jeff
318cbeeecf Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
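
For reference, the shape of the allocator API this switches to; a hedged sketch of typical uma usage around that time, with the zone name and flags chosen for illustration:

    static uma_zone_t socket_zone;

    /* Once, at subsystem init: describe the objects this zone hands out. */
    socket_zone = uma_zcreate("socket", sizeof(struct socket),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);

    /* Per object, replacing the old zalloc()/zfree() calls: */
    so = uma_zalloc(socket_zone, M_WAITOK | M_ZERO);
    /* ... use the socket ... */
    uma_zfree(socket_zone, so);
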
dillon
b3ddc72561 Get rid of the twisted MFREE() macro entirely.
Reviewed by:	dg, bmilekic
MFC after:	3 days
2002-02-05 02:00:56 +00:00
silby
4c0cf8914c Revert 1.81; 1.19 fixed this already in a different way. 2002-01-09 01:45:17 +00:00
silby
719af3e61a Reorder a calculation in sbreserve so that it does not overflow
with multi-megabyte socket buffer sizes.

PR:		7420
MFC after:	3 weeks
2002-01-06 06:50:54 +00:00
alfred
f097734c27 Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(); instead, AIO
will use at_exit(9).

Add functions at_exec(9) and rm_at_exec(9), which function nearly the
same as at_exit(9) and rm_at_exit(9); these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's architecturally neutral;
the problem was that one had to pass it a parameter indicating the
number of arguments, which was actually the number of "int"s.  Fix
it by using an inline version of the AS macro against the syscall
arguments.  (AS should be available globally but we'll get to that
later.)

Add a primitive system for dynamically adding kqueue ops; it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.
2001-12-29 07:13:47 +00:00
peter
d39f9b4517 Avoid an interaction between syncache and accept filters. The syncache
code only passed up the connection to the tcp stack when it was complete,
so it went directly into the so_comp (complete) queue.  However, with
accept filters, there is an additional phase before calling it "complete".

Reviewed by: jlemon
2001-12-21 04:30:49 +00:00
rwatson
36784fd2c4 o Back out portions of 1.50 and 1.47, eliminating sonewconn3() and
  always deriving the credential for a newly accepted connection from
  the listen socket.  Previously, the selection of the credential
  depended on the protocol: UNIX domain sockets would use the
  connecting process's credential, and protocols supporting a creation
  of the socket before the receiving end called accept() would use
  the listening socket.  After this change, it is always the listening
  credential.

Reviewed by:	green
2001-12-13 22:09:37 +00:00
dillon
86ed17d675 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
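
A sketch of the reference-counting interface described above, close to the socketvar.h macros of that period but to be treated as illustrative: so_count counts references, and the last sorele() frees the socket.

    #define soref(so)   do {                                            \
            (so)->so_count++;                                           \
    } while (0)

    #define sorele(so)  do {                                            \
            if ((so)->so_count <= 0)                                    \
                    panic("sorele");                                    \
            if (--(so)->so_count == 0)                                  \
                    sofree(so);                                         \
    } while (0)
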
jhb
4806d88677 Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
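
A sketch of the revised primitives (illustrative; the real versions later grew locking around the reference count):

    struct ucred *
    crhold(struct ucred *cr)
    {
            cr->cr_ref++;
            return (cr);              /* new: hand the reference back */
    }

    int
    crshared(struct ucred *cr)
    {
            return (cr->cr_ref > 1);  /* shared iff more than one holder */
    }
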
dwmalone
4be03ca19c Allow sbcreatecontrol to make cluster sized control messages. 2001-10-04 12:59:53 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
Make the kernel aware that there are smaller units of scheduling than the
process (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
jlemon
ac6b9aa8ec Fix up indentation. 2001-06-29 04:01:38 +00:00
peter
4b91e2ecf0 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
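
Illustrative usage, assuming the post-change two-argument macro form (the tunable path and default here are just examples): each TUNABLE_INT() records a path and a destination pointer, and the shared helper fills the variable in from the kernel environment at boot.

    static int somaxconn = 128;
    TUNABLE_INT("kern.ipc.somaxconn", &somaxconn);

    /* Conceptually, the common helper boils down to one environment
     * lookup per registered tunable, roughly: */
    getenv_int("kern.ipc.somaxconn", &somaxconn);
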
peter
c1df44ae51 Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
peter
0732738ec4 Make the TUNABLE_*() macros look and behave more consistently like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
jesper
42275c0786 Revert the last bits of my bogus move of NMBCLUSTERS
to <sys/param.h>
2001-06-01 21:47:34 +00:00
jesper
51b1367e42 Move the definition of NMBCLUSTERS from src/sys/kern/uipc_mbuf.c
to <sys/param.h>, so it's available to src/sys/netinet/ip_input.c,
and remove the now unneeded includes of "opt_param.h".

MFC after:	1 week
2001-05-31 21:56:44 +00:00
markm
bcca5847d5 Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
dwmalone
0d441a8a27 Make sbcompress use the new M_WRITABLE macro. Previously sbcompress
could not compress into clusters. This could result in lots of
wasted clusters while receiving small packets from an interface
that uses clusters for all its packets.

Patch is partially from BSDi (limiting the size of the copy) and
based on a patch for 4.1 by Ian Dowse <iedowse@maths.tcd.ie> and
myself.

Reviewed by:	bmilekic
Obtained From:	BSDi
Submitted by:	iedowse
2000-11-19 22:22:47 +00:00
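
A kernel-context sketch of the append test this enables (illustrative, not the verbatim diff): sbcompress() may copy a small mbuf's data into the tail mbuf of the buffer only if that tail mbuf is actually writable, which with M_WRITABLE now includes clusters that are neither shared nor read-only.

    if (n != NULL && (n->m_flags & M_EOR) == 0 &&
        M_WRITABLE(n) &&
        m->m_len <= MCLBYTES / 4 &&             /* only copy small chunks */
        m->m_len <= M_TRAILINGSPACE(n) &&
        n->m_type == m->m_type) {
            bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len,
                (unsigned)m->m_len);
            n->m_len += m->m_len;
            sb->sb_cc += m->m_len;
            m = m_free(m);
            continue;
    }
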
truckman
0575d8a3f9 Move uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().
2000-09-05 22:11:13 +00:00
green
dd66d69170 Fix hangs caused by overzealous code removal.
Thanks, Nickolay, for figuring out this is the problem.

Submitted by:	Nickolay Dudorov <nnd@mail.nsk.ru>
2000-08-31 11:31:58 +00:00
green
deb9fb1e44 Remove an extraneous setting of sb_hiwat. 2000-08-30 00:09:57 +00:00
green
0fa5ae2859 Remove any possibility of hiwat-related race conditions by changing
the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and
a u_long target to set it to.  The whole thing is splnet().

This fixes a problem that jdp has been able to provoke.
2000-08-29 11:28:06 +00:00
ps
7961d6fd99 Remove unnecessary call to splnet when setting an accept filter
since we are already at splnet.
2000-07-31 08:23:43 +00:00
alfred
7f71a1a091 Fix races in the uidinfo subsystem; several problems existed:
1) While allocating a uidinfo struct, malloc is called with M_WAITOK;
   it's possible that, while asleep, another process owned by the same user
   could have woken up earlier and inserted an entry into the uid
   hash table.  Having redundant entries causes inconsistencies that
   we can't handle.

   fix: do a non-waiting malloc, and if that fails then do a blocking
   malloc, after waking up check that no one else has inserted an entry
   for us already.

2) Because many checks for sbsize were done as "test then set" in a
   non-atomic manner, it was possible to exceed the established limits
   via races.

   fix: instead of querying the count then setting, we just attempt to
   set the count and leave it up to the function to return success or
   failure.

3) The uidinfo code was inlined and repeated; lookups, insertions,
   and deletions needed to be in their own functions for clarity.

Reviewed by: green
2000-06-22 22:27:16 +00:00
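
A kernel-context sketch of the fix for item 1) above (helper and field names are illustrative): try a non-sleeping allocation first; only if that fails, sleep in malloc and then re-check the hash, because another process may have inserted the same uid while we slept.

    uip = malloc(sizeof(*uip), M_UIDINFO, M_NOWAIT);
    if (uip == NULL) {
            uip = malloc(sizeof(*uip), M_UIDINFO, M_WAITOK);
            /* We slept: did someone else insert this uid meanwhile? */
            if ((existing = uilookup(uid)) != NULL) {
                    free(uip, M_UIDINFO);
                    uip = existing;
                    uip->ui_ref++;
                    return (uip);
            }
    }
    /* ... initialize uip and insert it into the uid hash ... */
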
alfred
e3e72a583b Return of the accept filter, part II.
Accept filters are now loadable as well as able to be compiled into
the kernel.

Two accept filters are provided: one that returns sockets when data
arrives, the other when an HTTP request is complete (doesn't work
with 0.9 requests).

Reviewed by: jmg
2000-06-20 01:09:23 +00:00
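
The userland side of the feature, for reference: ask the kernel to hold back accept() until a full HTTP request has arrived. Error handling is abbreviated, the accf_http filter module must be available, and the option is set on a socket that is already listening.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <string.h>

    int
    enable_httpready(int listen_fd)
    {
            struct accept_filter_arg afa;

            memset(&afa, 0, sizeof(afa));
            strcpy(afa.af_name, "httpready");   /* or "dataready" */
            return (setsockopt(listen_fd, SOL_SOCKET, SO_ACCEPTFILTER,
                &afa, sizeof(afa)));
    }
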
alfred
b88daebd21 Back out accept optimizations.
Requested by: jmg, dcs, jdp, nate
2000-06-18 08:49:13 +00:00
alfred
dcf66cb4e2 Add socket options DELAYACCEPT and HTTPACCEPT, which will not allow an accept()
until the incoming connection either has data waiting or has what looks like an
HTTP request header already in the socket buffer.  This ought to reduce
the context-switch time and overhead for processing requests.

The initial idea and code for HTTPACCEPT came from Yahoo engineers; it has
been cleaned up, and a more lightweight DELAYACCEPT for non-HTTP servers
has been added.

Reviewed by: silence on hackers.
2000-06-15 18:18:43 +00:00
jlemon
c41c876463 Introduce kqueue() and kevent(), a kernel event notification facility. 2000-04-16 18:53:38 +00:00
shin
73d476cc64 Alignment fixes to the CMSG_XXX macros to follow RFC 2292.
Approved by: jkh

Submitted by: Partly from tech@openbsd
Reviewed by: itojun
2000-03-03 11:13:12 +00:00
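
These are the alignment rules the commit refers to: user code should size and walk control buffers only with the CMSG_* macros, never with raw pointer arithmetic. A small sketch that builds a control message carrying one file descriptor:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <assert.h>
    #include <string.h>

    void
    build_fd_cmsg(struct msghdr *msg, char *buf, size_t buflen, int fd)
    {
            struct cmsghdr *cmsg;

            assert(buflen >= CMSG_SPACE(sizeof(int)));
            msg->msg_control = buf;
            msg->msg_controllen = CMSG_SPACE(sizeof(int)); /* padded size */
            cmsg = CMSG_FIRSTHDR(msg);
            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));        /* hdr + data */
            memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    }
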