54 Commits

Author SHA1 Message Date
Garrett Wollman
98271db4d5 Convert socket structures to be type-stable and add a version number.
Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.

Convert PF_LOCAL PCB structures to be type-stable and add a version number.

Define an external format for infomation about socket structures and use
it in several places.

Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time.  This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.

It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
1998-05-15 20:11:40 +00:00
Bruce Evans
8781d8e928 Fixed style bugs (mostly) in previous commit. 1998-03-28 10:18:26 +00:00
Garrett Wollman
3d4d47f398 Use the zone allocator to allocate inpcbs and tcpcbs. Each protocol creates
its own zone; this is used particularly by TCP which allocates both inpcb and
tcpcb in a single allocation.  (Some hackery ensures that the tcpcb is
reasonably aligned.)  Also keep track of the number of pcbs of each type
allocated, and keep a generation count (instance version number) for future
use.
1998-03-24 18:06:34 +00:00
David Greenman
c3229e05a3 Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.

Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
   to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
   hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
   be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
   the future, however.

These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.

Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.

WARNING: Anything that knows about inpcb and tcpcb structs will have to be
         recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
David Greenman
86b3ebce35 Call in_pcballoc() at splnet(). As near as I can tell, this won't fix
any instability problems, but it was wrong nonetheless and will be
required in an upcoming round of PCB changes.
1997-12-18 09:13:39 +00:00
Peter Wemm
f8f6cbba92 Update network code to use poll support. 1997-09-14 03:10:42 +00:00
Garrett Wollman
57bf258e3d Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
1997-08-16 19:16:27 +00:00
Bruce Evans
1fd0b0588f Removed unused #includes. 1997-08-02 14:33:27 +00:00
Bill Fenner
911089957e Disallow writing raw IP packets shorter than the IP header. 1997-05-22 20:52:56 +00:00
Garrett Wollman
a29f300e80 The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call.  Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken.  I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
1997-04-27 20:01:29 +00:00
David Greenman
ca98b82c8d Reorganize elements of the inpcb struct to take better advantage of
cache lines. Removed the struct ip proto since only a couple of chars
were actually being used in it. Changed the order of compares in the
PCB hash lookup to take advantage of partial cache line fills (on PPro).

Discussed-with: wollman
1997-04-03 05:14:45 +00:00
David Greenman
ddd79a9790 Improved performance of hash algorithm while (hopefully) not reducing
the quality of the hash distribution. This does not fix a problem dealing
with poor distribution when using lots of IP aliases and listening
on the same port on every one of them...some other day perhaps; fixing
that requires significant code changes.
The use of xor was inspired by David S. Miller <davem@jenolan.rutgers.edu>
1997-03-03 09:23:37 +00:00
Garrett Wollman
117bcae7c4 Convert raw IP from mondo-switch-statement-from-Hell to
pr_usrreqs.  Collapse duplicates with udp_usrreq.c and
tcp_usrreq.c (calling the generic routines in uipc_socket2.c and
in_pcb.c).  Calling sockaddr()_ or peeraddr() on a detached
socket now traps, rather than harmlessly returning an error; this
should never happen.  Allow the raw IP buffer sizes to be
controlled via sysctl.
1997-02-18 20:46:36 +00:00
Garrett Wollman
39191c8eb8 Provide PRC_IFDOWN and PRC_IFUP support for IP. Now, when an interface
is administratively downed, all routes to that interface (including the
interface route itself) which are not static will be deleted.  When
it comes back up, and addresses remaining will have their interface routes
re-added.  This solves the problem where, for example, an Ethernet interface
is downed by traffic continues to flow by way of ARP entries.
1997-02-13 19:46:45 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
Garrett Wollman
294121822b Use queue macros for the list of interfaces. Next stop: ifaddrs! 1996-12-11 20:38:25 +00:00
Bill Fenner
82c23eba89 Add the IP_RECVIF socket option, which supplies a packet's incoming interface
using a sockaddr_dl.

Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR)
to work for multicast UDP and raw sockets as well.  (They previously only
worked for unicast UDP).
1996-11-11 04:56:32 +00:00
Bill Fenner
430d30d837 Don't allow reassembly to create packets bigger than IP_MAXPACKET, and count
attempts to do so.
Don't allow users to source packets bigger than IP_MAXPACKET.
Make UDP length and ipovly's protocol length unsigned short.

Reviewed by:	wollman
Submitted by:	(partly by) kml@nas.nasa.gov (Kevin Lahey)
1996-10-25 17:57:53 +00:00
Garrett Wollman
5893891624 All three files: make COMPAT_IPFW==0 case work again.
ip_input.c:
	- delete some dusty code
	- _IP_VHL
	- use fast inline header checksum when possible
1996-10-07 19:21:46 +00:00
Søren Schmidt
8f52c18724 Oops, send the operation type, not the name to the NAT code... 1996-08-27 20:52:27 +00:00
Søren Schmidt
fed1c7e9e4 Add hooks for an IP NAT module, much like the firewall stuff...
Move the sockopt definitions for the firewall code from
ip_fw.h to in.h where it belongs.
1996-08-21 21:37:07 +00:00
Garrett Wollman
5e2d069649 Eliminate some more references to separate ip_v and ip_hl fields. 1996-07-24 18:46:19 +00:00
Alexander Langer
c616bf2a71 Removed extraneous return. 1996-07-20 00:16:20 +00:00
Garrett Wollman
f9493383fc Conditionalize calls to IPFW code on COMPAT_IPFW. This is done slightly
unconventionally:
	If COMPAT_IPFW is not defined, or if it is defined to 1, enable;
otherwise, disable.

This means that these changes actually have no effect on anyone at the
moment.  (It just makes it easier for me to keep my code in sync.)
In the future, the `not defined' part of the hack should be eliminated,
but doing this now would require everyone to change their config files.

The same conditionals need to be made in ip_input.c as well for this to
ave any useful effect, but I'm not ready to do that right now.
1996-05-22 17:23:09 +00:00
Bill Fenner
e62b8c4939 Make rip_input() take the header length
Move ipip_input() and rsvp_input() prototypes to ip_var.h
Remove unused prototype for rip_ip_input() from ip_var.h
Remove unused variable *opts from rip_output()
1996-03-26 19:16:46 +00:00
Paul Traina
072b9b24e3 Fix ip option processing for raw IP sockets. This whole thing is a compromise
between ignoring options specified in the setsockopt call if IP_HDRINCL is set
(the UCB choice when VJ's code was brought in) vs allowing them (what everyone
else did, and what is assumed by programs everywhere...sigh).

Also perform some checking of the passed down packet to avoid running off
the end of a mbuf chain.

Reviewed by:	fenner
1996-03-13 08:02:45 +00:00
David Greenman
2ee45d7d28 Move or add #include <queue.h> in preparation for upcoming struct socket
changes.
1996-03-11 15:13:58 +00:00
Poul-Henning Kamp
09bb5f7589 Make getsockopt() capable of handling more than one mbuf worth of data.
Use this to read rules out of ipfw.
Add the lkm code to ipfw.c
1996-02-24 13:38:28 +00:00
Poul-Henning Kamp
e7319bab6b Big sweep over the IPFIREWALL and IPACCT code.
Close the ip-fragment hole.
Waste less memory.
Rewrite to contemporary more readable style.
Kill separate IPACCT facility, use "accept" rules in IPFIREWALL.
Filter incoming >and< outgoing packets.
Replace "policy" by sticky "deny all" rule.
Rules have numbers used for ordering and deletion.
Remove "rerorder" code entirely.
Count packet & bytecount matches for rules.

Code in -current & -stable is now the same.
1996-02-23 15:47:58 +00:00
Poul-Henning Kamp
f6d24a780b Staticize. 1995-12-09 20:43:53 +00:00
Poul-Henning Kamp
0312fbe97d New style sysctl & staticize alot of stuff. 1995-11-14 20:34:56 +00:00
David Greenman
c1f8a6cefa Fix panic caused by PRU_CONTROL not being dealt with properly. Bug pointed
out by David Maltz <dmaltz@orval.mach.cs.cmu.edu>, but this fix is by me.
1995-10-21 02:12:20 +00:00
Garrett Wollman
25f26ad85a Merge with 4.4-Lite-2: fix bug that caused getsockopt of IP_HDRINCL
to fail.

Obtained from:	4.4BSD-Lite-2
1995-09-21 19:59:43 +00:00
Garrett Wollman
b4489dc30a Completely turn off RSVP intercept when a socket being used for that purpose
is PRU_DETACHed.  This solves the problem that RSVP would not come up inm
raw mode if previously killed.
1995-07-24 16:33:51 +00:00
Garrett Wollman
1c5de19afb Kernel side of 3.5 multicast routing code, based on work by Bill Fenner
and other work done here.  The LKM support is probably broken, but it
still compiles and will be fixed later.
1995-06-13 17:51:16 +00:00
Rodney W. Grimes
9b2e535452 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
Andrey A. Chernov
2f632e8fe8 Fix getsockopt(IP_ACCT_*) to not panic kernel
Submitted by: Bill Fenner <fenner@parc.xerox.com>
1995-05-12 20:00:21 +00:00
David Greenman
15bd2b4385 Implemented PCB hashing. Includes new functions in_pcbinshash, in_pcbrehash,
and in_pcblookuphash.
1995-04-09 01:29:31 +00:00
Garrett Wollman
d99c7a23fa This set of patches enables IP multicasting to work under FreeBSD. I am
submitting them as context diffs for the following files:

sys/netinet/ip_mroute.c
sys/netinet/ip_var.h
sys/netinet/raw_ip.c
usr.sbin/mrouted/igmp.c
usr.sbin/mrouted/prune.c

The routine rip_ip_input in raw_ip.c is suggested by Mark Tinguely
(tinguely@plains.nodak.edu). I have been running mrouted with these patches
for over a week and nothing has seemed seriously wrong. It is being run in
two places on our network as a tunnel on one and a subnet querier on the
other. The only problem I have run into is that mrouted on the tunnel must
start up last or the pruning isn't done correctly and multicast packets
flood your subnets.

Submitted by:	Soochon Radee <slr@mitre.org>
1995-03-16 16:25:55 +00:00
Poul-Henning Kamp
c70f45100d YFfix. 1995-02-14 06:28:25 +00:00
Garrett Wollman
838ecf4225 Make sure to disable RSVP intercept when the socket is closed. 1995-02-07 02:53:14 +00:00
Garrett Wollman
479bb8da0e Correct long-standing error in the RSVP hooks (would initialize but never
return success).
1995-01-26 18:59:02 +00:00
Ugen J.S. Antsilevich
4dd1662b4c Actual firewall change.
1) Firewall is not subdivided on forwarding / blocking chains
   anymore.Actually only one chain left-it was the blocking one.
2) LKM support.ip_fwdef.c is function pointers definition and
goes into kernel along with all INET stuff.
1995-01-12 13:06:32 +00:00
David Greenman
aedcdea1de Fixed mbuf lossage when level != IPPROTO_IP. Problem reported by Robert
Dobbs, hint from Charles Hannum, fix by me.
1995-01-12 10:53:25 +00:00
Ugen J.S. Antsilevich
3107b31b8d Add clear one accounting entry control.
Structure fields changed to seem more standart.
1994-12-13 15:57:34 +00:00
Ugen J.S. Antsilevich
10a642bb05 Add match by interface from which packet arrived (via)
Handle right fragmented packets. Remove checking option
from kernel..
1994-12-12 17:20:55 +00:00
Jordan K. Hubbard
63f8d699ac Ugen J.S.Antsilevich's latest, happiest, IP firewall code.
Poul:  Please take this into BETA.  It's non-intrusive, and a rather
substantial improvement over what was there before.
1994-11-16 10:17:11 +00:00
Jordan K. Hubbard
ad63b51399 2 11th-hour fixes from Ugen (not Uben, sorry!) J.S.Antsilevich.
I think it's time for Ugen to get a freefall account, just so I can
direct mail at him directly and let him drop off patches for us here.  Ugen?
Done!
Submitted by:	ugen
1994-11-07 10:01:32 +00:00
Jordan K. Hubbard
100ba1a617 IP Firewall code from Daniel Boulet and J.S.Antsilevich
Submitted by:	danny ugen
1994-10-28 15:09:49 +00:00
Poul-Henning Kamp
623ae52e4e GCC cleanup.
Reviewed by:
Submitted by:
Obtained from:
1994-10-02 17:48:58 +00:00