212 Commits

Author SHA1 Message Date
Brian Feldman
2f9a21326c Change so_cred's type to a ucred, not a pcred. This makes more sense, actually.
Make a sonewconn3() which takes an extra argument (proc) so new sockets created
with sonewconn() from a user's system call get the correct credentials, not
just the parent's credentials.
1999-09-19 02:17:02 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Brian Feldman
f29be02190 Reviewed by: the cast of thousands
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
1999-06-17 23:54:50 +00:00
Peter Wemm
9c9906e912 Plug a mbuf leak in tcp_usr_send(). pru_send() routines are expected
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb has been torn down in the
middle of a send/write attempt.  This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion.  This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.

Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system, confirming that
it was the actual cause of the leak, and reporting that this looks like it
fixes it.  The machine in question was losing (from memory) about 150 mbufs
per hour under load and a change similar to this stopped it.  (Don't blame
Jayanth for this patch though)

An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened.  However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code.  There are other things that call pru_send()
directly though.

Problem originally noted by:  John Plevyak <jplevyak@inktomi.com>
1999-06-04 02:27:06 +00:00
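
The ownership rule stated in the message above (a pru_send() routine must either enqueue or free the mbuf chain it is handed) can be illustrated with a small userland sketch. Everything here is a hypothetical stand-in rather than kernel code: struct buf plays the role of an mbuf, struct conn the role of the tcpcb/inpcb, and live_bufs is a counter added only to make a leak visible.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf chain; not the kernel's struct mbuf. */
struct buf {
    struct buf *next;
};

/* Hypothetical stand-in for a connection whose control block may vanish. */
struct conn {
    int torn_down;        /* nonzero once the control block is gone */
    struct buf *sendq;    /* buffers queued for transmission */
};

static int live_bufs;     /* allocation counter used to spot leaks */

static struct buf *buf_alloc(void)
{
    struct buf *b = calloc(1, sizeof(*b));
    if (b != NULL)
        live_bufs++;
    return b;
}

/* Release a buffer obtained from buf_alloc(); b must not be NULL. */
static void buf_free(struct buf *b)
{
    free(b);
    live_bufs--;
}

/*
 * Send routine that always consumes its argument: the buffer is either
 * queued on the connection or freed, never dropped on the floor.
 */
static int conn_send(struct conn *c, struct buf *b)
{
    if (c->torn_down) {
        buf_free(b);      /* the error path still releases the buffer */
        return ECONNRESET;
    }
    b->next = c->sendq;
    c->sendq = b;
    return 0;
}
```

With this contract the caller never frees a buffer after passing it to conn_send(), regardless of the return value, which is the property the leak described above violated.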
Andrey A. Chernov
925fa5c3f5 Really fix overflow on SO_*TIMEO
Submitted by: bde
1999-05-21 15:54:40 +00:00
Bill Fumerola
3d177f465a Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
Andrey A. Chernov
02a3d5261d Lite2 bugfixes merge:
so_linger is in seconds, not in 1/HZ
range checking in SO_*TIMEO was wrong

PR: 11252
1999-04-24 18:22:34 +00:00
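
As an illustration of the kind of range check the SO_*TIMEO fix above is about, here is a minimal sketch of converting a timeout to clock ticks without overflowing the destination. It assumes the tick count is stored in a short and that the clock runs at 100 ticks per second; SKETCH_HZ, SKETCH_TICK, and timeo_to_ticks() are hypothetical names, and returning EDOM for out-of-range input is an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>

#define SKETCH_HZ 100                       /* assumed clock rate, ticks per second */
#define SKETCH_TICK (1000000 / SKETCH_HZ)   /* microseconds per tick */

/*
 * Convert a (seconds, microseconds) timeout to clock ticks, storing the
 * result in a short. Returns 0 on success, or EDOM if the input is
 * negative, malformed, or too large to represent.
 */
static int timeo_to_ticks(long sec, long usec, short *ticks_out)
{
    long ticks;

    if (sec < 0 || usec < 0 || usec >= 1000000)
        return EDOM;
    /* Check the seconds first so that sec * SKETCH_HZ cannot overflow. */
    if (sec > SHRT_MAX / SKETCH_HZ)
        return EDOM;
    ticks = sec * SKETCH_HZ + usec / SKETCH_TICK;
    if (ticks > SHRT_MAX)
        return EDOM;
    *ticks_out = (short)ticks;
    return 0;
}
```

The point of the two-stage check is that the multiplication is only performed once the seconds value is known to be small enough, so the comparison against SHRT_MAX is never made on an already-wrapped number.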
Doug Rabson
ce02431ffa * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
Bill Fenner
8f70ac3e02 Fix the port of the NetBSD 19990120-accept fix. I misread a piece of
code when examining their fix, which caused my code (in rev 1.52) to:
- panic("soaccept: !NOFDREF")
- fatal trap 12, with tracebacks going thru soclose and soaccept
1999-02-02 07:23:28 +00:00
Matthew Dillon
d254af07a1 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 21:50:00 +00:00
Bill Fenner
527b7a14a5 Port NetBSD's 19990120-accept bug fix. This works around the race condition
where select(2) can return that a listening socket has a connected socket
queued, the connection is broken, and the user calls accept(2), which then
blocks because there are no connections queued.

Reviewed by:	wollman
Obtained from:	NetBSD
(ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19990120-accept)
1999-01-25 16:58:56 +00:00
Bill Fenner
7b1777101c Also consider the space left in the socket buffer when deciding whether
to set PRUS_MORETOCOME.
1999-01-20 17:45:22 +00:00
Bill Fenner
b0acefa8d4 Add a flag, passed to pru_send routines, PRUS_MORETOCOME. This
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.

Based on:	"Justin C. Walker" <justin@apple.com>'s description of Apple's
		fix for this problem.
1999-01-20 17:32:01 +00:00
Eivind Eklund
219cbf59f2 KNFize, by bde. 1999-01-10 01:58:29 +00:00
Eivind Eklund
5526d2d920 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
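
The 'KASSERT(assertion, ("panic message", args))' form introduced above passes the message and its arguments as one parenthesized group. A userland sketch of how such a macro can work follows. The names sketch_panic(), SKETCH_KASSERT, SKETCH_INVARIANTS, and checked_len() are hypothetical, and tying the check to a compile-time switch is an assumption loosely modeled on the INVARIANTS option named in the message.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_INVARIANTS 1   /* remove this line to compile the checks out */

/* Userland stand-in for the kernel's panic(): print the message and abort. */
static void sketch_panic(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    abort();
}

/*
 * The second argument is a complete parenthesized argument list, so the
 * macro can place it directly after the function name without needing
 * variadic macro support.
 */
#ifdef SKETCH_INVARIANTS
#define SKETCH_KASSERT(exp, msg) do { if (!(exp)) sketch_panic msg; } while (0)
#else
#define SKETCH_KASSERT(exp, msg) do { } while (0)
#endif

/* Example caller: returns len unchanged after checking an invariant. */
static int checked_len(int len)
{
    SKETCH_KASSERT(len >= 0, ("checked_len: negative length %d", len));
    return len;
}
```

Calling checked_len(-1) with the checks enabled would print the formatted message and abort, which is the "simple check + panic" behavior the message describes.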
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Don Lewis
831d27a9f5 Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices.  For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR:		kern/7899
Reviewed by:	bde, eivind
1998-11-11 10:04:13 +00:00
Garrett Wollman
9898afa1f1 Bow to tradition and correctly implement the bogus-but-hallowed semantics
of getsockopt never telling how much it might have copied if only the
buffer were big enough.
1998-08-31 18:07:23 +00:00
Garrett Wollman
d224dbc106 Correctly set the return length regardless of the relative size of the
user's buffer.  Simplify the logic a bit.  (Can we have a version of
min() for size_t?)
1998-08-31 15:34:55 +00:00
Garrett Wollman
cfe8b629f1 Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process.  Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
1998-08-23 03:07:17 +00:00
Bill Fenner
0c495036b4 Undo rev 1.41 until we get more details about why it makes some systems
fail.
1998-07-18 18:48:45 +00:00
Bill Fenner
dece5b6a43 Introduce (fairly hacky) workaround for odd TCP behavior with application
writes of size (100,208]+N*MCLBYTES.

The bug:
 sosend() hands each mbuf off to the protocol output routine as soon as it
 has copied it, in the hopes of increasing parallelism (see
  http://www.kohala.com/~rstevens/vanj.88jul20.txt ). This works well for
 TCP as long as the first mbuf handed off is at least the MSS.  However,
 when doing small writes (between MHLEN and MINCLSIZE), the transaction is
 split into 2 small MBUF's and each is individually handed off to TCP.
 TCP assumes that the first small mbuf is the whole transaction, so sends
 a small packet.  When the second small mbuf arrives, Nagle prevents TCP
 from sending it so it must wait for a (potentially delayed) ACK.  This
 sends throughput down the toilet.

The workaround:
 Set the "atomic" flag when we're doing small writes.  The "atomic" flag
 has two meanings:
 1. Copy all of the data into a chain of mbufs before handing off to the
    protocol.
 2. Leave room for a datagram header in said mbuf chain.
 TCP wants the first but doesn't want the second.  However, the second
 simply results in some memory wastage (but is why the workaround is a
 hack and not a fix).

The real fix:
 The real fix for this problem is to introduce something like a "requested
 transfer size" variable in the socket->protocol interface.  sosend()
 would then accumulate an mbuf chain until it exceeded the "requested
 transfer size".  TCP could set it to the TCP MSS (note that the
 current interface causes strange TCP behaviors when the MSS > MCLBYTES;
 nobody notices because MCLBYTES > ethernet's MTU).
1998-07-06 19:27:14 +00:00
Garrett Wollman
98271db4d5 Convert socket structures to be type-stable and add a version number.
Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.

Convert PF_LOCAL PCB structures to be type-stable and add a version number.

Define an external format for information about socket structures and use
it in several places.

Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time.  This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.

It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
1998-05-15 20:11:40 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
Guido van Rooij
4049a04253 Make sure that you can only bind a more specific address when it is
done by the same uid.
Obtained from: OpenBSD
1998-03-01 19:39:29 +00:00
Bill Fenner
92f57d003c Revert sosend() to its behavior from 4.3-Tahoe and before: if
so_error is set, clear it before returning it.  The behavior
introduced in 4.3-Reno (to not clear so_error) causes potentially
transient errors (e.g.  ECONNREFUSED if the other end hasn't opened
its socket yet) to be permanent on connected datagram sockets that
are only used for writing.

(soreceive() clears so_error before returning it, as does
getsockopt(...,SO_ERROR,...).)

Submitted by:	Van Jacobson <van@ee.lbl.gov>, via a comment in the vat sources.
1998-02-19 19:38:20 +00:00
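
The "clear it before returning it" behavior described above amounts to a read-and-reset of the pending error. A minimal sketch, assuming a stripped-down socket that holds only the so_error field named in the message (struct sketch_socket and take_error() are hypothetical names):

```c
#include <errno.h>

/* Hypothetical minimal socket; so_error mirrors the field named in the message. */
struct sketch_socket {
    int so_error;   /* pending asynchronous error, 0 if none */
};

/*
 * Return the pending error and clear it, so that a transient error is
 * reported to the caller once instead of on every later call.
 */
static int take_error(struct sketch_socket *so)
{
    int error = so->so_error;

    so->so_error = 0;
    return error;
}
```

With this pattern, a connected datagram socket that once saw ECONNREFUSED reports it on the next call and then works normally again, rather than failing permanently.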
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
Jordan K. Hubbard
64bd2f7b57 MF22: MSG_EOR bug fix.
Submitted by:	wollman
1997-11-09 05:07:40 +00:00
Poul-Henning Kamp
a1c995b626 Last major round (Unless Bruce thinks of something :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
Poul-Henning Kamp
eabecea346 While booting diskless we have no proc pointer. 1997-10-04 18:21:15 +00:00
Peter Wemm
e25aa68e0c Extend select backend for sockets to work with a poll interface (more
detail is passed back and forwards).  This mostly came from NetBSD, except
that our interfaces have changed a lot and this function is in a different
part of the kernel.

Obtained from: NetBSD
1997-09-14 02:34:14 +00:00
Bruce Evans
e4ba6a82b0 Removed unused #includes. 1997-09-02 20:06:59 +00:00
Bruce Evans
b1037dcd53 #include <machine/limits.h> explicitly in the few places that it is required. 1997-08-21 20:33:42 +00:00
Garrett Wollman
57bf258e3d Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families which kept PCBs in mbufs to now malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
1997-08-16 19:16:27 +00:00
Peter Wemm
006ad618b8 Don't accept insane values for SO_(SND|RCV)BUF, and the low water marks.
Specifically, don't allow a value < 1 for any of them (it doesn't make
sense), and don't let the low water mark be greater than the corresponding
high water mark.

Pre-Approved by: wollman
Obtained from: NetBSD
1997-06-27 15:28:54 +00:00
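
The two rules in the message above (no value below 1, and no low water mark above its high water mark) can be sketched as a pair of setters. This is an illustration only: struct sb_limits, set_hiwat(), and set_lowat() are hypothetical names, and whether an oversized low water mark is clamped (as here) or rejected is an assumption the message does not settle.

```c
#include <errno.h>

/* Hypothetical, simplified buffer limits; not the kernel's struct sockbuf. */
struct sb_limits {
    long hiwat;   /* high water mark (buffer size) */
    long lowat;   /* low water mark */
};

/* Set the buffer size; values below 1 are rejected. */
static int set_hiwat(struct sb_limits *sb, long val)
{
    if (val < 1)
        return EINVAL;
    sb->hiwat = val;
    /* Keep the invariant lowat <= hiwat after shrinking the buffer. */
    if (sb->lowat > sb->hiwat)
        sb->lowat = sb->hiwat;
    return 0;
}

/* Set the low water mark, clamping it so it never exceeds the high water mark. */
static int set_lowat(struct sb_limits *sb, long val)
{
    if (val < 1)
        return EINVAL;
    sb->lowat = (val > sb->hiwat) ? sb->hiwat : val;
    return 0;
}
```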
Garrett Wollman
a29f300e80 The long-awaited mega-massive-network-code-cleanup. Part I.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so that they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call.  Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken.  I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
1997-04-27 20:01:29 +00:00
Bruce Evans
3ac4d1ef0c Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place.  Most things don't depend on file.h stuff at all.
1997-03-23 03:37:54 +00:00
Garrett Wollman
639acc13e2 Create a new branch of the kernel MIB, kern.ipc, to store
all of the configurables and instrumentation related to
inter-process communication mechanisms.  Some variables,
like mbuf statistics, are instrumented here for the first
time.

For mbuf statistics: also keep track of m_copym() and
m_pullup() failures, and provide for the user's inspection
the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.
1997-02-24 20:30:58 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
David Greenman
add2e5d0f4 Check for error return from uiomove to prevent looping endlessly in
soreceive(). Closes PR#2114.

Submitted by:	wpaul
1996-11-29 19:03:42 +00:00
Paul Traina
ebb0cbea75 Increase robustness of FreeBSD against high-rate connection attempt
denial of service attacks.

Reviewed by:	bde,wollman,olah
Inspired by:	vjs@sgi.com
1996-10-07 04:32:42 +00:00
Garrett Wollman
2c37256e5a Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone.  This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner.  This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out).  This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted).  Finally, some code
related to polling has been deleted from if.c because it is not
relevant to -current and doesn't look at all like my current code.
1996-07-11 16:32:50 +00:00
Garrett Wollman
82dab6ce62 Make it possible to return more than one piece of control information
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps  as control information (PR #1179).

Submitted by:	Louis Mamakos <loiue@TransSys.com>
1996-05-09 20:15:26 +00:00
David Greenman
46f578e76a Fix for PR #1146: the "next" pointer must be cached before calling soabort
since the struct containing it may be freed.
1996-04-16 03:50:08 +00:00
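
The fix described above is an instance of a general rule for walking a list whose elements may be freed during the walk: read the link before releasing the node. A minimal userland sketch follows, with struct node, drain(), and build() as hypothetical names and free() standing in for the call that may release the structure.

```c
#include <stdlib.h>

/* Hypothetical singly linked node standing in for a queued socket. */
struct node {
    struct node *next;
};

/* Free every node in the list and return how many were released. */
static int drain(struct node *head)
{
    struct node *n, *next;
    int count = 0;

    for (n = head; n != NULL; n = next) {
        next = n->next;   /* read the link before the node is freed */
        free(n);          /* after this, n->next must not be touched */
        count++;
    }
    return count;
}

/* Build a list of len nodes; returns NULL if len is 0 or allocation fails. */
static struct node *build(int len)
{
    struct node *head = NULL;

    while (len-- > 0) {
        struct node *n = malloc(sizeof(*n));
        if (n == NULL) {
            drain(head);
            return NULL;
        }
        n->next = head;
        head = n;
    }
    return head;
}
```

Writing the loop as `for (n = head; n != NULL; n = n->next)` with the free inside the body would read freed memory on every iteration, which is the class of bug the commit addresses.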
David Greenman
be24e9e8fa Changed socket code to use 4.4BSD queue macros. This includes removing
the obsolete soqinsque and soqremque functions as well as collapsing
so_q0len and so_qlen into a single queue length of unaccepted connections.
Now the queue of unaccepted & complete connections is checked directly
for queued sockets. The new code should be functionally equivalent to
the old while being substantially faster - especially in cases where
large numbers of connections are often queued for accept (e.g. http).
1996-03-11 15:37:44 +00:00
Garrett Wollman
dc915e7cfc Kill XNS.
While we're at it, fix socreate() to take a process argument.  (This
was supposed to get committed days ago...)
1996-02-13 18:16:31 +00:00
Garrett Wollman
b135805469 Define a new socket option, SO_PRIVSTATE. Getting it returns the state
of the SS_PRIV flag in so_state; setting it always clears same.
1996-02-07 16:19:19 +00:00
Bruce Evans
47daf5d5d6 Nuked ambiguous sleep message strings:
old:				new:
	netcls[] = "netcls"		"soclos"
	netcon[] = "netcon"		"accept", "connec"
	netio[] = "netio"		"sblock", "sbwait"
1995-12-14 22:51:13 +00:00
Garrett Wollman
ff5c09da20 Make somaxconn (maximum backlog in a listen(2) request) and sb_max
(maximum size of a socket buffer) tunable.

Permit callers of listen(2) to specify a negative backlog, which
is translated into somaxconn.  Previously, a negative backlog was
silently translated into 0.
1995-11-03 18:33:46 +00:00
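
The backlog handling described above can be sketched as a small normalizing function. The names sketch_somaxconn and effective_backlog() are hypothetical, the value 128 is illustrative only, and capping an oversized backlog at the maximum is an assumption of this sketch; the message itself only describes the negative case.

```c
/* Assumed tunable maximum backlog; the value 128 is illustrative only. */
static int sketch_somaxconn = 128;

/*
 * Map the backlog requested by a caller to the one actually used:
 * a negative request means "use the system maximum", and (by assumption)
 * a request above the maximum is capped at it.
 */
static int effective_backlog(int backlog)
{
    if (backlog < 0 || backlog > sketch_somaxconn)
        return sketch_somaxconn;
    return backlog;
}
```

Because the maximum is a variable rather than a compile-time constant, changing sketch_somaxconn at run time changes what a negative backlog resolves to, which mirrors the "tunable" aspect of the commit.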
Bruce Evans
5e319b84a1 Remove extra arg from one of the calls to (*pr_usrreq)(). 1995-08-25 20:27:46 +00:00
Rodney W. Grimes
9b2e535452 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
Garrett Wollman
5f540404a8 getsockopt(s, SOL_SOCKET, SO_SNDTIMEO, ...) would construct the returned
timeval incorrectly, truncating the usec part.

Obtained from: Stevens vol. 2 p. 548
1995-02-16 01:07:43 +00:00
Garrett Wollman
6b8fda4d12 Merge in the socket-level support for Transaction TCP. 1995-02-07 02:01:16 +00:00
David Greenman
a635d6c76a Use M_NOWAIT instead of M_KERNEL for socket allocations; it is apparently
possible for certain socket operations to occur during interrupt context.

Submitted by:	John Dyson
1995-02-06 02:22:12 +00:00
David Greenman
9f518539fd Calling semantics for kmem_malloc() have been changed...and the third
argument is now more than just a single flag. (kern_malloc.c)
Used new M_KERNEL value for socket allocations that previously were
"M_NOWAIT". Note that this will change when we clean up the M_ namespace
mess.

Submitted by:	John Dyson
1995-02-02 08:49:08 +00:00
Poul-Henning Kamp
797f2d22f0 All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.
1994-10-02 17:35:40 +00:00
David Greenman
3c4dd3568f Added $Id$ 1994-08-02 07:55:43 +00:00
David Greenman
3962127e78 Changed mbuf allocation policy to get a cluster if size > MINCLSIZE. Makes
a BIG difference in socket performance.
1994-05-29 07:48:17 +00:00
Rodney W. Grimes
26f9a76710 The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman
1994-05-25 09:21:21 +00:00
Rodney W. Grimes
df8bae1de4 BSD 4.4 Lite Kernel Sources 1994-05-24 10:09:53 +00:00