Commit Graph

83 Commits

Author SHA1 Message Date
jlemon
0147c71899 Check so_error in filt_so{read|write} in order to detect UDP errors.
PR: 21601
2000-09-28 04:41:22 +00:00
truckman
a233f9bd24 Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().
2000-09-05 22:11:13 +00:00
green
255db4f2fa Remove any possibility of hiwat-related race conditions by changing
the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and
a u_long target to set it to.  The whole thing is splnet().

This fixes a problem that jdp has been able to provoke.
2000-08-29 11:28:06 +00:00
jlemon
54a240077c Make the kqueue socket read filter honor the SO_RCVLOWAT value.
Spotted by:  "Steve M." <stevem@redlinenetworks.com>
2000-08-07 17:52:08 +00:00
alfred
7040933f3d only allow accept filter modifications on listening sockets
Submitted by: ps
2000-07-20 12:17:17 +00:00
alfred
6bd686ba79 fix races in the uidinfo subsystem, several problems existed:
1) while allocating a uidinfo struct malloc is called with M_WAITOK,
   it's possible that while asleep another process by the same user
   could have woken up earlier and inserted an entry into the uid
   hash table.  Having redundant entries causes inconsistancies that
   we can't handle.

   fix: do a non-waiting malloc, and if that fails then do a blocking
   malloc, after waking up check that no one else has inserted an entry
   for us already.

2) Because many checks for sbsize were done as "test then set" in a non
   atomic manner it was possible to exceed the limits put up via races.

   fix: instead of querying the count then setting, we just attempt to
   set the count and leave it up to the function to return success or
   failure.

3) The uidinfo code was inlining and repeating, lookups and insertions
   and deletions needed to be in their own functions for clarity.

Reviewed by: green
2000-06-22 22:27:16 +00:00
alfred
e1d072e2ef return of the accept filter part II
accept filters are now loadable as well as able to be compiled into
the kernel.

two accept filters are provided, one that returns sockets when data
arrives the other when an http request is completed (doesn't work
with 0.9 requests)

Reviewed by: jmg
2000-06-20 01:09:23 +00:00
alfred
aa49a5b0c1 backout accept optimizations.
Requested by: jmg, dcs, jdp, nate
2000-06-18 08:49:13 +00:00
alfred
1ce0cad2c5 add socketoptions DELAYACCEPT and HTTPACCEPT which will not allow an accept()
until the incoming connection has either data waiting or what looks like a
HTTP request header already in the socketbuffer.  This ought to reduce
the context switch time and overhead for processing requests.

The initial idea and code for HTTPACCEPT came from Yahoo engineers and has
been cleaned up and a more lightweight DELAYACCEPT for non-http servers
has been added

Reviewed by: silence on hackers.
2000-06-15 18:18:43 +00:00
asmodai
8db34fc9f6 Fix panic by moving the prp == 0 check up the order of sanity checks.
Submitted by:	Bart Thate <freebsd@1st.dudi.org> on -current
Approved by:	rwatson
2000-06-13 15:44:04 +00:00
rwatson
1079eb6c86 o Modify jail to limit creation of sockets to UNIX domain sockets,
TCP/IP (v4) sockets, and routing sockets.  Previously, interaction
  with IPv6 was not well-defined, and might be inappropriate for some
  environments.  Similarly, sysctl MIB entries providing interface
  information also give out only addresses from those protocol domains.

  For the time being, this functionality is enabled by default, and
  toggleable using the sysctl variable jail.socket_unixiproute_only.
  In the future, protocol domains will be able to determine whether or
  not they are ``jail aware''.

o Further limitations on process use of getpriority() and setpriority()
  by jailed processes.  Addresses problem described in kern/17878.

Reviewed by:	phk, jmg
2000-06-04 04:28:31 +00:00
jake
5e208b0c18 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
1d685644e0 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
jlemon
6edfb1e197 Introduce kqueue() and kevent(), a kernel event notification facility. 2000-04-16 18:53:38 +00:00
fenner
145dc30b40 Make sure to free the socket in soabort() if the protocol couldn't
free it (this could happen if the protocol already freed its part
 and we just kept the socket around to make sure accept(2) didn't block)
2000-03-18 08:56:56 +00:00
jasone
748f7489d6 Add aio_waitcomplete(). Make aio work correctly for socket descriptors.
Make gratuitous style(9) fixes (me, not the submitter) to make the aio
code more readable.

PR:		kern/12053
Submitted by:	Chris Sedore <cmsedore@maxwell.syr.edu>
2000-01-14 02:53:29 +00:00
green
34597fd33d Correct an uninitialized variable use, which, unlike most times, is
actually a bug this time.

Submitted by:	bde
Reviewed by:	bde
1999-12-27 06:31:53 +00:00
green
407aeb0ec5 This is Bosko Milekic's mbuf allocation waiting code. Basically, this
means that running out of mbuf space isn't a panic anymore, and code
which runs out of network memory will sleep to wait for it.

Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	green, wollman
1999-12-12 05:52:51 +00:00
shin
69e26060ce KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
1999-11-22 02:45:11 +00:00
phk
bed630ce3c This is a partial commit of the patch from PR 14914:
Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
   structures for list operations.  This patch makes all list operations
   in sys/kern use the queue(3) macros, rather than directly accessing the
   *Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by:    phk
Submitted by:   Jake Burkholder <jake@checker.org>
PR:     14914
1999-11-16 10:56:05 +00:00
green
808cf376d5 Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total
usage limit.
1999-10-09 20:42:17 +00:00
green
d61a71e09e Change so_cred's type to a ucred, not a pcred. THis makes more sense, actually.
Make a sonewconn3() which takes an extra argument (proc) so new sockets created
with sonewconn() from a user's system call get the correct credentials, not
just the parent's credentials.
1999-09-19 02:17:02 +00:00
peter
e4b04a2b21 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
green
64c465e7ff Reviewed by: the cast of thousands
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
1999-06-17 23:54:50 +00:00
peter
3d72629695 Plug a mbuf leak in tcp_usr_send(). pru_send() routines are expected
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb has been torn down in the
middle of a send/write attempt.  This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion.  This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.

Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system and that it
was the actual cause of the leak and looks like it fixes it.  The machine
in question was loosing (from memory) about 150 mbufs per hour under
load and a change similar to this stopped it.  (Don't blame Jayanth
for this patch though)

An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened.  However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code.  There are other things that call pru_send()
directly though.

Problem originally noted by:  John Plevyak <jplevyak@inktomi.com>
1999-06-04 02:27:06 +00:00
ache
e3b9d8a6f6 Realy fix overflow on SO_*TIMEO
Submitted by: bde
1999-05-21 15:54:40 +00:00
billf
583ea25cd7 Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
ache
c6fcc2009f Lite2 bugfixes merge:
so_linger is in seconds, not in 1/HZ
range checking in SO_*TIMEO was wrong

PR: 11252
1999-04-24 18:22:34 +00:00
dfr
f886a4c996 * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
fenner
c8d45a49bc Fix the port of the NetBSD 19990120-accept fix. I misread a piece of
code when examining their fix, which caused my code (in rev 1.52) to:
- panic("soaccept: !NOFDREF")
- fatal trap 12, with tracebacks going thru soclose and soaccept
1999-02-02 07:23:28 +00:00
dillon
7b4c630144 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 21:50:00 +00:00
fenner
145db44792 Port NetBSD's 19990120-accept bug fix. This works around the race condition
where select(2) can return that a listening socket has a connected socket
queued, the connection is broken, and the user calls accept(2), which then
blocks because there are no connections queued.

Reviewed by:	wollman
Obtained from:	NetBSD
(ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19990120-accept)
1999-01-25 16:58:56 +00:00
fenner
778f7f8150 Also consider the space left in the socket buffer when deciding whether
to set PRUS_MORETOCOME.
1999-01-20 17:45:22 +00:00
fenner
646dbcde27 Add a flag, passed to pru_send routines, PRUS_MORETOCOME. This
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.

Based on:	"Justin C. Walker" <justin@apple.com>'s description of Apple's
		fix for this problem.
1999-01-20 17:32:01 +00:00
eivind
35a471f45d KNFize, by bde. 1999-01-10 01:58:29 +00:00
eivind
f708b467c2 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
archie
84bd80a4f9 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
truckman
beda60fe4e Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices.  For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR:		kern/7899
Reviewed by:	bde, elvind
1998-11-11 10:04:13 +00:00
wollman
02fd01605f Bow to tradition and correctly implement the bogus-but-hallowed semantics
of getsockopt never telling how much it might have copied if only the
buffer were big enough.
1998-08-31 18:07:23 +00:00
wollman
ee40cc18de Correctly set the return length regardless of the relative size of the
user's buffer.  Simplify the logic a bit.  (Can we have a version of
min() for size_t?)
1998-08-31 15:34:55 +00:00
wollman
fc229edf09 Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process.  Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
1998-08-23 03:07:17 +00:00
fenner
067636ad36 Undo rev 1.41 until we get more details about why it makes some systems
fail.
1998-07-18 18:48:45 +00:00
fenner
ef5549b007 Introduce (fairly hacky) workaround for odd TCP behavior with application
writes of size (100,208]+N*MCLBYTES.

The bug:
 sosend() hands each mbuf off to the protocol output routine as soon as it
 has copied it, in the hopes of increasing parallelism (see
  http://www.kohala.com/~rstevens/vanj.88jul20.txt ). This works well for
 TCP as long as the first mbuf handed off is at least the MSS.  However,
 when doing small writes (between MHLEN and MINCLSIZE), the transaction is
 split into 2 small MBUF's and each is individually handed off to TCP.
 TCP assumes that the first small mbuf is the whole transaction, so sends
 a small packet.  When the second small mbuf arrives, Nagle prevents TCP
 from sending it so it must wait for a (potentially delayed) ACK.  This
 sends throughput down the toilet.

The workaround:
 Set the "atomic" flag when we're doing small writes.  The "atomic" flag
 has two meanings:
 1. Copy all of the data into a chain of mbufs before handing off to the
    protocol.
 2. Leave room for a datagram header in said mbuf chain.
 TCP wants the first but doesn't want the second.  However, the second
 simply results in some memory wastage (but is why the workaround is a
 hack and not a fix).

The real fix:
 The real fix for this problem is to introduce something like a "requested
 transfer size" variable in the socket->protocol interface.  sosend()
 would then accumulate an mbuf chain until it exceeded the "requested
 transfer size".  TCP could set it to the TCP MSS (note that the
 current interface causes strange TCP behaviors when the MSS > MCLBYTES;
 nobody notices because MCLBYTES > ethernet's MTU).
1998-07-06 19:27:14 +00:00
wollman
2cb299f9a3 Convert socket structures to be type-stable and add a version number.
Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.

Convert PF_LOCAL PCB structures to be type-stable and add a version number.

Define an external format for infomation about socket structures and use
it in several places.

Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time.  This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.

It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
1998-05-15 20:11:40 +00:00
bde
878d31d269 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
guido
3f4ca728b5 Make sure that you can only bind a more specific address when it is
done by the same uid.
Obtained from: OpenBSD
1998-03-01 19:39:29 +00:00
fenner
dcb9d92a93 Revert sosend() to its behavior from 4.3-Tahoe and before: if
so_error is set, clear it before returning it.  The behavior
introduced in 4.3-Reno (to not clear so_error) causes potentially
transient errors (e.g.  ECONNREFUSED if the other end hasn't opened
its socket yet) to be permanent on connected datagram sockets that
are only used for writing.

(soreceive() clears so_error before returning it, as does
getsockopt(...,SO_ERROR,...).)

Submitted by:	Van Jacobson <van@ee.lbl.gov>, via a comment in the vat sources.
1998-02-19 19:38:20 +00:00
eivind
15aa079292 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
eivind
d8f3bc5b0e Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
jkh
9d4dc375cd MF22: MSG_EOR bug fix.
Submitted by:	wollman
1997-11-09 05:07:40 +00:00