2005-01-07 01:45:51 +00:00
|
|
|
/*-
|
1995-09-21 17:55:49 +00:00
|
|
|
* Copyright (c) 1982, 1986, 1991, 1993, 1995
|
1994-05-24 10:09:53 +00:00
|
|
|
* The Regents of the University of California. All rights reserved.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
1995-09-21 17:55:49 +00:00
|
|
|
* @(#)in_pcb.c 8.4 (Berkeley) 5/24/95
|
1999-08-28 01:08:13 +00:00
|
|
|
* $FreeBSD$
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
|
1999-12-22 19:13:38 +00:00
|
|
|
#include "opt_ipsec.h"
|
1999-12-07 17:39:16 +00:00
|
|
|
#include "opt_inet6.h"
|
Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
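The caching scheme this commit message describes can be sketched in plain userland C. This is a simplified illustration under stated assumptions, not the kernel code: `struct label_sketch`, `struct sock_sketch`, `struct labeled_pcb_sketch`, and both functions are hypothetical stand-ins for the MAC label, `struct socket`, `struct inpcb`, and the `in_pcbsosetlabel()` worker, and the "level" comparison is an invented policy used only to make the check concrete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a MAC label; real labels are policy-defined. */
struct label_sketch {
	int level;
};

/* Hypothetical stand-in for struct socket: owns the authoritative label. */
struct sock_sketch {
	struct label_sketch so_label;
};

/* Hypothetical stand-in for struct inpcb: keeps a cached copy of the label. */
struct labeled_pcb_sketch {
	struct sock_sketch *inp_socket;
	struct label_sketch inp_label;	/* cache of inp_socket->so_label */
};

/* Refresh the cached label; plays the role of the pr_sosetlabel() hook. */
static void
pcb_sosetlabel_sketch(struct labeled_pcb_sketch *inp)
{
	assert(inp != NULL && inp->inp_socket != NULL);
	inp->inp_label = inp->inp_socket->so_label;
}

/* Delivery check that reads only the cache and never touches the socket. */
static int
pcb_check_deliver_sketch(const struct labeled_pcb_sketch *inp, int packet_level)
{
	/* Invented policy: deliver if the packet level does not exceed the label. */
	return (packet_level <= inp->inp_label.level);
}
```

The point of the design is that the delivery path consults `inp_label` alone, so it does not need the socket lock; the cache is only as fresh as the last call to the update hook.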
|
|
|
#include "opt_mac.h"
|
1999-12-07 17:39:16 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/param.h>
|
|
|
|
#include <sys/systm.h>
|
2003-11-18 00:39:07 +00:00
|
|
|
#include <sys/mac.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/malloc.h>
|
|
|
|
#include <sys/mbuf.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/domain.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/protosw.h>
|
|
|
|
#include <sys/socket.h>
|
|
|
|
#include <sys/socketvar.h>
|
|
|
|
#include <sys/proc.h>
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no provisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
#include <sys/jail.h>
|
1996-01-19 08:00:58 +00:00
|
|
|
#include <sys/kernel.h>
|
|
|
|
#include <sys/sysctl.h>
|
1998-03-28 10:18:26 +00:00
|
|
|
|
2002-03-20 05:48:55 +00:00
|
|
|
#include <vm/uma.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
#include <net/if.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <net/if_types.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <net/route.h>
|
|
|
|
|
|
|
|
#include <netinet/in.h>
|
|
|
|
#include <netinet/in_pcb.h>
|
|
|
|
#include <netinet/in_var.h>
|
|
|
|
#include <netinet/ip_var.h>
|
2003-02-19 22:32:43 +00:00
|
|
|
#include <netinet/tcp_var.h>
|
2005-01-02 01:50:57 +00:00
|
|
|
#include <netinet/udp.h>
|
|
|
|
#include <netinet/udp_var.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
|
|
|
#include <netinet/ip6.h>
|
|
|
|
#include <netinet6/ip6_var.h>
|
|
|
|
#endif /* INET6 */
|
|
|
|
|
|
|
|
#ifdef IPSEC
|
|
|
|
#include <netinet6/ipsec.h>
|
|
|
|
#include <netkey/key.h>
|
|
|
|
#endif /* IPSEC */
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2002-10-16 02:25:05 +00:00
|
|
|
#ifdef FAST_IPSEC
|
|
|
|
#if defined(IPSEC) || defined(IPSEC_ESP)
|
|
|
|
#error "Bad idea: don't compile with both IPSEC and FAST_IPSEC!"
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#include <netipsec/ipsec.h>
|
|
|
|
#include <netipsec/key.h>
|
|
|
|
#endif /* FAST_IPSEC */
|
|
|
|
|
1996-01-19 08:00:58 +00:00
|
|
|
/*
|
|
|
|
* These configure the range of local port addresses assigned to
|
|
|
|
* "unspecified" outgoing connections/packets/whatever.
|
|
|
|
*/
|
1999-11-22 02:45:11 +00:00
|
|
|
int ipport_lowfirstauto = IPPORT_RESERVED - 1; /* 1023 */
|
|
|
|
int ipport_lowlastauto = IPPORT_RESERVEDSTART; /* 600 */
|
2002-03-22 03:28:11 +00:00
|
|
|
int ipport_firstauto = IPPORT_HIFIRSTAUTO; /* 49152 */
|
|
|
|
int ipport_lastauto = IPPORT_HILASTAUTO; /* 65535 */
|
1999-11-22 02:45:11 +00:00
|
|
|
int ipport_hifirstauto = IPPORT_HIFIRSTAUTO; /* 49152 */
|
|
|
|
int ipport_hilastauto = IPPORT_HILASTAUTO; /* 65535 */
|
1996-01-19 08:00:58 +00:00
|
|
|
|
The ancient and outdated concept of "privileged ports" in UNIX-type
OSes has probably caused more problems than it ever solved. Allow the
user to retire the old behavior by specifying their own privileged
range with,
net.inet.ip.portrange.reservedhigh default = IPPORT_RESERVED - 1
net.inet.ip.portrange.reservedlow default = 0
Now you can run that webserver without ever needing root at all. Or
just imagine, an ftpd that can really drop privileges, rather than
just set the euid, and still do PORT data transfers from 20/tcp.
Two edge cases to note,
# sysctl net.inet.ip.portrange.reservedhigh=0
Opens all ports to everyone, and,
# sysctl net.inet.ip.portrange.reservedhigh=65535
Locks all network activity to root only (which could actually have
been achieved before with ipfw(8), but is somewhat more
complicated).
For those who stick to the old religion that 0-1023 belong to root and
root alone, don't touch the knobs (or even lock them by raising
securelevel(8)), and nothing changes.
2003-02-21 05:28:27 +00:00
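As a rough illustration of how the two knobs described above could gate a bind request, here is a minimal userland sketch. It assumes the range is inclusive at both ends; `reservedlow_sketch`, `reservedhigh_sketch`, and `port_needs_privilege()` are hypothetical names and are not taken from this file.

```c
#include <assert.h>

/* Hypothetical mirrors of the two sysctl-backed variables (defaults 0 and 1023). */
static int reservedlow_sketch = 0;
static int reservedhigh_sketch = 1023;

/*
 * Return nonzero if binding to `port` would require privilege, assuming
 * the reserved range is [reservedlow, reservedhigh] inclusive.
 */
static int
port_needs_privilege(int port)
{
	assert(port >= 0 && port <= 65535);
	return (port >= reservedlow_sketch && port <= reservedhigh_sketch);
}
```

Under that assumption, setting the high bound to 0 leaves every usable port unprivileged, and setting it to 65535 makes every port privileged, which matches the two edge cases in the message.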
|
|
|
/*
|
|
|
|
* Reserved ports accessible only to root. There are significant
|
|
|
|
* security considerations that must be accounted for when changing these,
|
|
|
|
* but the security benefits can be great. Please be careful.
|
|
|
|
*/
|
|
|
|
int ipport_reservedhigh = IPPORT_RESERVED - 1; /* 1023 */
|
|
|
|
int ipport_reservedlow = 0;
|
|
|
|
|
2005-01-02 01:50:57 +00:00
|
|
|
/* Variables dealing with random ephemeral port allocation. */
|
|
|
|
int ipport_randomized = 1; /* user controlled via sysctl */
|
|
|
|
int ipport_randomcps = 10; /* user controlled via sysctl */
|
|
|
|
int ipport_randomtime = 45; /* user controlled via sysctl */
|
|
|
|
int ipport_stoprandom = 0; /* toggled by ipport_tick */
|
|
|
|
int ipport_tcpallocs;
|
|
|
|
int ipport_tcplastcount;
|
2004-04-22 08:32:14 +00:00
|
|
|
|
1996-08-12 14:05:54 +00:00
|
|
|
#define RANGECHK(var, min, max) \
|
|
|
|
if ((var) < (min)) { (var) = (min); } \
|
|
|
|
else if ((var) > (max)) { (var) = (max); }
|
|
|
|
|
|
|
|
static int
|
2000-07-04 11:25:35 +00:00
|
|
|
sysctl_net_ipport_check(SYSCTL_HANDLER_ARGS)
|
1996-08-12 14:05:54 +00:00
|
|
|
{
|
2004-04-06 10:59:11 +00:00
|
|
|
int error;
|
|
|
|
|
|
|
|
error = sysctl_handle_int(oidp, oidp->oid_arg1, oidp->oid_arg2, req);
|
|
|
|
if (error == 0) {
|
1996-08-12 14:05:54 +00:00
|
|
|
RANGECHK(ipport_lowfirstauto, 1, IPPORT_RESERVED - 1);
|
|
|
|
RANGECHK(ipport_lowlastauto, 1, IPPORT_RESERVED - 1);
|
2004-04-06 10:59:11 +00:00
|
|
|
RANGECHK(ipport_firstauto, IPPORT_RESERVED, IPPORT_MAX);
|
|
|
|
RANGECHK(ipport_lastauto, IPPORT_RESERVED, IPPORT_MAX);
|
|
|
|
RANGECHK(ipport_hifirstauto, IPPORT_RESERVED, IPPORT_MAX);
|
|
|
|
RANGECHK(ipport_hilastauto, IPPORT_RESERVED, IPPORT_MAX);
|
1996-08-12 14:05:54 +00:00
|
|
|
}
|
2004-04-06 10:59:11 +00:00
|
|
|
return (error);
|
1996-08-12 14:05:54 +00:00
|
|
|
}
|
1996-02-22 21:32:23 +00:00
|
|
|
|
1996-08-12 14:05:54 +00:00
|
|
|
#undef RANGECHK
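The clamping that RANGECHK performs in the handler above can also be written as a function, which avoids the dangling if/else that a statement macro carries. This is only a sketch: `rangechk_sketch()` is a hypothetical name and is not part of this file.

```c
#include <assert.h>

/* Clamp `var` into [min, max], the same result RANGECHK leaves in its variable. */
static int
rangechk_sketch(int var, int min, int max)
{
	assert(min <= max);
	if (var < min)
		return (min);
	if (var > max)
		return (max);
	return (var);
}
```

A caller would assign the result back, for example `first = rangechk_sketch(first, 1024, 65535);`, where the bounds here are illustrative literals rather than the IPPORT_* constants.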
|
1996-01-19 08:00:58 +00:00
|
|
|
|
1996-08-12 14:05:54 +00:00
|
|
|
SYSCTL_NODE(_net_inet_ip, IPPROTO_IP, portrange, CTLFLAG_RW, 0, "IP Ports");
|
|
|
|
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, lowfirst, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_lowfirstauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, lowlast, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_lowlastauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, first, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_firstauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, last, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_lastauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, hifirst, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_hifirstauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, hilast, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_hilastauto, 0, &sysctl_net_ipport_check, "I", "");
|
2003-02-21 05:28:27 +00:00
|
|
|
SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, reservedhigh,
|
|
|
|
CTLFLAG_RW|CTLFLAG_SECURE, &ipport_reservedhigh, 0, "");
|
|
|
|
SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, reservedlow,
|
|
|
|
CTLFLAG_RW|CTLFLAG_SECURE, &ipport_reservedlow, 0, "");
|
2005-03-23 09:26:38 +00:00
|
|
|
SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, randomized, CTLFLAG_RW,
|
|
|
|
&ipport_randomized, 0, "Enable random port allocation");
|
|
|
|
SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, randomcps, CTLFLAG_RW,
|
|
|
|
&ipport_randomcps, 0, "Maximum number of random port "
|
|
|
|
"allocations before switching to a sequential one");
|
|
|
|
SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, randomtime, CTLFLAG_RW,
|
|
|
|
&ipport_randomtime, 0, "Minimum time to keep sequential port "
|
|
|
|
"allocation before switching to a random one");
|
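One plausible reading of the randomcps and randomtime descriptions, together with the "toggled by ipport_tick" comment on ipport_stoprandom, is sketched below. It assumes a tick that fires once per second; `struct port_rand_state`, `port_tick_sketch()`, and `port_use_random_sketch()` are hypothetical names, and the body is an illustration rather than the actual ipport_tick().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the counters that the sysctl knobs above control. */
struct port_rand_state {
	int randomcps;	/* allocations per tick tolerated while random */
	int randomtime;	/* ticks to stay sequential after a burst */
	int stoprandom;	/* > 0 while sequential allocation is in force */
	int allocs;	/* running count of port allocations */
	int lastcount;	/* value of allocs at the previous tick */
};

/* Called once per tick (assumed to be one second). */
static void
port_tick_sketch(struct port_rand_state *st)
{
	assert(st != NULL);
	if (st->allocs > st->lastcount + st->randomcps)
		st->stoprandom = st->randomtime;	/* burst: go sequential */
	else if (st->stoprandom > 0)
		st->stoprandom--;			/* quiet tick: count down */
	st->lastcount = st->allocs;
}

/* Random allocation is allowed only once the countdown has reached zero. */
static int
port_use_random_sketch(const struct port_rand_state *st)
{
	return (st->stoprandom == 0);
}
```

With the defaults shown earlier (10 and 45), a tick that sees more than 10 new allocations would switch to sequential allocation, and 45 quiet ticks would be needed before random allocation resumes.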
1995-11-14 20:34:56 +00:00
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Added a new routine, in_pcbinshash()
to do the initial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fasttimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fasttimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
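The idea of a hashed port list, as described in this commit message, can be illustrated with a small userland sketch. Everything here is an assumption made for illustration: the bucket count, the mask-based hash, and the names `struct portpcb_sketch`, `struct porthash_sketch`, `porthash_insert()`, and `porthash_lookup_local()` do not come from this file.

```c
#include <assert.h>
#include <stddef.h>

#define PORTHASH_SIZE 64	/* hypothetical bucket count; must be a power of two */

/* Hypothetical minimal PCB: just a local port and a hash-chain link. */
struct portpcb_sketch {
	unsigned short lport;
	struct portpcb_sketch *hash_next;
};

struct porthash_sketch {
	struct portpcb_sketch *bucket[PORTHASH_SIZE];
};

/* Map a local port to a bucket by masking its low bits. */
static unsigned
porthash_index(unsigned short lport)
{
	return (lport & (PORTHASH_SIZE - 1));
}

/* Initial hash insertion: push the PCB onto the head of its bucket chain. */
static void
porthash_insert(struct porthash_sketch *h, struct portpcb_sketch *pcb)
{
	unsigned i;

	assert(h != NULL && pcb != NULL);
	i = porthash_index(pcb->lport);
	pcb->hash_next = h->bucket[i];
	h->bucket[i] = pcb;
}

/* Local port check: walk one bucket instead of the full PCB list. */
static struct portpcb_sketch *
porthash_lookup_local(const struct porthash_sketch *h, unsigned short lport)
{
	struct portpcb_sketch *pcb;

	for (pcb = h->bucket[porthash_index(lport)]; pcb != NULL;
	    pcb = pcb->hash_next)
		if (pcb->lport == lport)
			return (pcb);
	return (NULL);
}
```

A lookup touches only the PCBs that share a bucket with the requested port, which is where the saving over a linear scan of every PCB comes from.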
|
|
|
/*
|
|
|
|
* in_pcb.c: manage the Protocol Control Blocks.
|
|
|
|
*
|
2005-07-19 12:24:27 +00:00
|
|
|
* NOTE: It is assumed that most of these functions will be called with
|
|
|
|
* the pcbinfo lock held, and often, the inpcb lock held, as these utility
|
|
|
|
* functions often modify hash chains or addresses in pcbs.
|
1998-01-27 09:15:13 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocate a PCB and associate it with the socket.
|
2006-07-18 22:34:27 +00:00
|
|
|
* On success return with the PCB locked.
|
1998-01-27 09:15:13 +00:00
|
|
|
*/
|
1994-05-24 10:09:53 +00:00
|
|
|
int
|
2006-07-18 22:34:27 +00:00
|
|
|
in_pcballoc(struct socket *so, struct inpcbinfo *pcbinfo)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2006-01-22 01:16:25 +00:00
|
|
|
struct inpcb *inp;
|
2001-07-26 19:19:49 +00:00
|
|
|
int error;
|
2003-11-18 00:39:07 +00:00
|
|
|
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(pcbinfo);
|
2003-11-18 00:39:07 +00:00
|
|
|
error = 0;
|
2006-07-18 22:34:27 +00:00
|
|
|
inp = uma_zalloc(pcbinfo->ipi_zone, M_NOWAIT);
|
1994-05-24 10:09:53 +00:00
|
|
|
if (inp == NULL)
|
|
|
|
return (ENOBUFS);
|
2006-07-18 22:34:27 +00:00
|
|
|
bzero(inp, inp_zero_size);
|
1995-04-09 01:29:31 +00:00
|
|
|
inp->inp_pcbinfo = pcbinfo;
|
1994-05-24 10:09:53 +00:00
|
|
|
inp->inp_socket = so;
|
2003-11-18 00:39:07 +00:00
|
|
|
#ifdef MAC
|
|
|
|
error = mac_init_inpcb(inp, M_NOWAIT);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
2004-06-13 02:50:07 +00:00
|
|
|
SOCK_LOCK(so);
|
2003-11-18 00:39:07 +00:00
|
|
|
mac_create_inpcb_from_socket(so, inp);
|
2004-06-13 02:50:07 +00:00
|
|
|
SOCK_UNLOCK(so);
|
2003-11-18 00:39:07 +00:00
|
|
|
#endif
|
- cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contains
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- a user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented incompletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid applying IPsec transport mode to packets that the
kernel forwards.
Tested by: nork
Obtained from: KAME
2003-11-04 16:02:05 +00:00
|
|
|
#if defined(IPSEC) || defined(FAST_IPSEC)
|
|
|
|
#ifdef FAST_IPSEC
|
2001-07-26 19:19:49 +00:00
|
|
|
error = ipsec_init_policy(so, &inp->inp_sp);
|
2003-11-04 16:02:05 +00:00
|
|
|
#else
|
|
|
|
error = ipsec_init_pcbpolicy(so, &inp->inp_sp);
|
|
|
|
#endif
|
Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
2001-07-26 19:19:49 +00:00
|
|
|
#endif /*IPSEC*/
|
2000-04-02 03:49:25 +00:00
|
|
|
#if defined(INET6)
|
2003-02-19 22:32:43 +00:00
|
|
|
if (INP_SOCKAF(so) == AF_INET6) {
|
|
|
|
inp->inp_vflag |= INP_IPV6PROTO;
|
|
|
|
if (ip6_v6only)
|
|
|
|
inp->inp_flags |= IN6P_IPV6_V6ONLY;
|
|
|
|
}
|
2000-04-02 03:49:25 +00:00
|
|
|
#endif
|
1995-04-09 01:29:31 +00:00
|
|
|
LIST_INSERT_HEAD(pcbinfo->listhead, inp, inp_list);
|
1998-03-24 18:06:34 +00:00
|
|
|
pcbinfo->ipi_count++;
|
1994-05-24 10:09:53 +00:00
|
|
|
so->so_pcb = (caddr_t)inp;
|
2001-06-11 12:39:29 +00:00
|
|
|
#ifdef INET6
|
|
|
|
if (ip6_auto_flowlabel)
|
|
|
|
inp->inp_flags |= IN6P_AUTOFLOWLABEL;
|
|
|
|
#endif
|
2006-07-18 22:34:27 +00:00
|
|
|
INP_LOCK(inp);
|
|
|
|
inp->inp_gencnt = ++pcbinfo->ipi_gencnt;
|
|
|
|
|
2003-11-18 00:39:07 +00:00
|
|
|
#if defined(IPSEC) || defined(FAST_IPSEC) || defined(MAC)
|
|
|
|
out:
|
|
|
|
if (error != 0)
|
|
|
|
uma_zfree(pcbinfo->ipi_zone, inp);
|
|
|
|
#endif
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
int
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbbind(struct inpcb *inp, struct sockaddr *nam, struct ucred *cred)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2002-10-20 21:44:31 +00:00
|
|
|
int anonport, error;
|
|
|
|
|
2003-11-13 05:16:56 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
2002-10-20 21:44:31 +00:00
|
|
|
if (inp->inp_lport != 0 || inp->inp_laddr.s_addr != INADDR_ANY)
|
|
|
|
return (EINVAL);
|
|
|
|
anonport = inp->inp_lport == 0 && (nam == NULL ||
|
|
|
|
((struct sockaddr_in *)nam)->sin_port == 0);
|
|
|
|
error = in_pcbbind_setup(inp, nam, &inp->inp_laddr.s_addr,
|
2004-03-27 21:05:46 +00:00
|
|
|
&inp->inp_lport, cred);
|
2002-10-20 21:44:31 +00:00
|
|
|
if (error)
|
|
|
|
return (error);
|
|
|
|
if (in_pcbinshash(inp) != 0) {
|
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
|
|
|
inp->inp_lport = 0;
|
|
|
|
return (EAGAIN);
|
|
|
|
}
|
|
|
|
if (anonport)
|
|
|
|
inp->inp_flags |= INP_ANONPORT;
|
|
|
|
return (0);
|
|
|
|
}
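The in_pcbbind() routine above follows a "set up, then commit, and roll back on failure" pattern: it refuses to rebind an already-bound PCB, lets the setup step fill in the address and port, and resets both fields if the hash insertion fails. A minimal userland sketch of that control flow follows. Everything prefixed with demo_ is hypothetical and stands in for the kernel structures and helpers; it is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct inpcb. */
struct demo_pcb {
	uint32_t laddr;		/* 0 plays the role of INADDR_ANY */
	uint16_t lport;		/* 0 means "not bound" */
};

/*
 * Hypothetical setup step: on error the outputs are left untouched,
 * mirroring the contract documented for in_pcbbind_setup().
 * Port 7 is arbitrarily treated as "already in use" for the demo.
 */
static int
demo_bind_setup(uint16_t want, uint32_t *laddrp, uint16_t *lportp)
{
	if (want == 7)
		return (EADDRINUSE);
	*laddrp = 0;
	*lportp = want;
	return (0);
}

/* Hypothetical hash insertion that can be forced to fail. */
static int
demo_inshash(struct demo_pcb *pcb, int force_fail)
{
	(void)pcb;
	return (force_fail ? -1 : 0);
}

static int
demo_bind(struct demo_pcb *pcb, uint16_t want, int force_hash_fail)
{
	int error;

	/* Refuse to bind twice. */
	if (pcb->lport != 0 || pcb->laddr != 0)
		return (EINVAL);
	error = demo_bind_setup(want, &pcb->laddr, &pcb->lport);
	if (error)
		return (error);
	if (demo_inshash(pcb, force_hash_fail) != 0) {
		/* Roll back so the PCB looks unbound again. */
		pcb->laddr = 0;
		pcb->lport = 0;
		return (EAGAIN);
	}
	return (0);
}
```

A caller would invoke demo_bind() once per PCB; a second call on a bound PCB yields EINVAL, and a failed hash insertion leaves the PCB unbound so the bind can be retried.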
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Set up a bind operation on a PCB, performing port allocation
|
|
|
|
* as required, but do not actually modify the PCB. Callers can
|
|
|
|
* either complete the bind by setting inp_laddr/inp_lport and
|
|
|
|
* calling in_pcbinshash(), or they can just use the resulting
|
|
|
|
* port and address to authorise the sending of a once-off packet.
|
|
|
|
*
|
|
|
|
* On error, the values of *laddrp and *lportp are not changed.
|
|
|
|
*/
|
|
|
|
int
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp,
|
|
|
|
u_short *lportp, struct ucred *cred)
|
2002-10-20 21:44:31 +00:00
|
|
|
{
|
|
|
|
struct socket *so = inp->inp_socket;
|
1996-10-30 06:13:10 +00:00
|
|
|
unsigned short *lastport;
|
1995-04-09 01:29:31 +00:00
|
|
|
struct sockaddr_in *sin;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Added a new routine, in_pcbinshash(),
to do the initial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fasttimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fasttimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
|
2002-10-20 21:44:31 +00:00
|
|
|
struct in_addr laddr;
|
1994-05-24 10:09:53 +00:00
|
|
|
u_short lport = 0;
|
2002-05-31 11:52:35 +00:00
|
|
|
int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
|
This implements the mumbled-about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no provisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
int error, prison = 0;
|
2005-01-02 01:50:57 +00:00
|
|
|
int dorandom;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2003-11-13 05:16:56 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(pcbinfo);
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
1996-12-13 21:29:07 +00:00
|
|
|
if (TAILQ_EMPTY(&in_ifaddrhead)) /* XXX broken! */
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
2002-10-20 21:44:31 +00:00
|
|
|
laddr.s_addr = *laddrp;
|
|
|
|
if (nam != NULL && laddr.s_addr != INADDR_ANY)
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EINVAL);
|
1998-01-27 09:15:13 +00:00
|
|
|
if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) == 0)
|
2006-06-29 10:49:49 +00:00
|
|
|
wild = INPLOOKUP_WILDCARD;
|
1994-05-24 10:09:53 +00:00
|
|
|
if (nam) {
|
1997-08-16 19:16:27 +00:00
|
|
|
sin = (struct sockaddr_in *)nam;
|
|
|
|
if (nam->sa_len != sizeof (*sin))
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EINVAL);
|
|
|
|
#ifdef notdef
|
|
|
|
/*
|
|
|
|
* We should check the family, but old programs
|
|
|
|
* incorrectly fail to initialize it.
|
|
|
|
*/
|
|
|
|
if (sin->sin_family != AF_INET)
|
|
|
|
return (EAFNOSUPPORT);
|
|
|
|
#endif
|
2000-09-17 13:35:42 +00:00
|
|
|
if (sin->sin_addr.s_addr != INADDR_ANY)
|
2004-03-27 21:05:46 +00:00
|
|
|
if (prison_ip(cred, 0, &sin->sin_addr.s_addr))
|
2000-09-17 13:35:42 +00:00
|
|
|
return (EINVAL);
|
2002-10-20 21:44:31 +00:00
|
|
|
if (sin->sin_port != *lportp) {
|
|
|
|
/* Don't allow the port to change. */
|
|
|
|
if (*lportp != 0)
|
|
|
|
return (EINVAL);
|
|
|
|
lport = sin->sin_port;
|
|
|
|
}
|
|
|
|
/* NB: lport is left as 0 if the port isn't being changed. */
|
1994-05-24 10:09:53 +00:00
|
|
|
if (IN_MULTICAST(ntohl(sin->sin_addr.s_addr))) {
|
|
|
|
/*
|
|
|
|
* Treat SO_REUSEADDR as SO_REUSEPORT for multicast;
|
|
|
|
* allow complete duplication of binding if
|
|
|
|
* SO_REUSEPORT is set, or if SO_REUSEADDR is set
|
|
|
|
* and a multicast address is bound on both
|
|
|
|
* new and duplicated sockets.
|
|
|
|
*/
|
|
|
|
if (so->so_options & SO_REUSEADDR)
|
|
|
|
reuseport = SO_REUSEADDR|SO_REUSEPORT;
|
|
|
|
} else if (sin->sin_addr.s_addr != INADDR_ANY) {
|
|
|
|
sin->sin_port = 0; /* yech... */
|
2001-11-06 00:48:01 +00:00
|
|
|
bzero(&sin->sin_zero, sizeof(sin->sin_zero));
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ifa_ifwithaddr((struct sockaddr *)sin) == 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
|
|
|
}
|
2002-10-20 21:44:31 +00:00
|
|
|
laddr = sin->sin_addr;
|
1994-05-24 10:09:53 +00:00
|
|
|
if (lport) {
|
|
|
|
struct inpcb *t;
|
2006-04-04 12:26:07 +00:00
|
|
|
struct tcptw *tw;
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/* GROSS */
|
The ancient and outdated concept of "privileged ports" in UNIX-type
OSes has probably caused more problems than it ever solved. Allow the
user to retire the old behavior by specifying their own privileged
range with,
net.inet.ip.portrange.reservedhigh default = IPPORT_RESERVED - 1
net.inet.ip.portrange.reservedlow default = 0
Now you can run that webserver without ever needing root at all. Or
just imagine, an ftpd that can really drop privileges, rather than
just set the euid, and still do PORT data transfers from 20/tcp.
Two edge cases to note,
# sysctl net.inet.ip.portrange.reservedhigh=0
Opens all ports to everyone, and,
# sysctl net.inet.ip.portrange.reservedhigh=65535
Locks all network activity to root only (which could actually have
been achieved before with ipfw(8), but is somewhat more
complicated).
For those who stick to the old religion that 0-1023 belong to root and
root alone, don't touch the knobs (or even lock them by raising
securelevel(8)), and nothing changes.
2003-02-21 05:28:27 +00:00
|
|
|
if (ntohs(lport) <= ipport_reservedhigh &&
|
|
|
|
ntohs(lport) >= ipport_reservedlow &&
|
2004-07-26 07:24:04 +00:00
|
|
|
suser_cred(cred, SUSER_ALLOWJAIL))
|
1995-09-21 17:55:49 +00:00
|
|
|
return (EACCES);
|
2004-03-27 21:05:46 +00:00
|
|
|
if (jailed(cred))
|
1999-04-28 11:38:52 +00:00
|
|
|
prison = 1;
|
2006-06-27 11:35:53 +00:00
|
|
|
if (!IN_MULTICAST(ntohl(sin->sin_addr.s_addr)) &&
|
|
|
|
suser_cred(so->so_cred, SUSER_ALLOWJAIL) != 0) {
|
1998-03-01 19:39:29 +00:00
|
|
|
t = in_pcblookup_local(inp->inp_pcbinfo,
|
1999-11-22 02:45:11 +00:00
|
|
|
sin->sin_addr, lport,
|
1999-04-28 11:38:52 +00:00
|
|
|
prison ? 0 : INPLOOKUP_WILDCARD);
|
2003-02-19 22:32:43 +00:00
|
|
|
/*
|
|
|
|
* XXX
|
|
|
|
* This entire block sorely needs a rewrite.
|
|
|
|
*/
|
2002-05-31 11:52:35 +00:00
|
|
|
if (t &&
|
2004-05-20 06:35:02 +00:00
|
|
|
((t->inp_vflag & INP_TIMEWAIT) == 0) &&
|
|
|
|
(so->so_type != SOCK_STREAM ||
|
|
|
|
ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
|
2002-05-31 11:52:35 +00:00
|
|
|
(ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
|
|
|
|
ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
|
|
|
|
(t->inp_socket->so_options &
|
|
|
|
SO_REUSEPORT) == 0) &&
|
|
|
|
(so->so_cred->cr_uid !=
|
2004-07-28 13:03:07 +00:00
|
|
|
t->inp_socket->so_cred->cr_uid))
|
2002-05-31 11:52:35 +00:00
|
|
|
return (EADDRINUSE);
|
1998-03-01 19:39:29 +00:00
|
|
|
}
|
2004-03-27 21:05:46 +00:00
|
|
|
if (prison && prison_ip(cred, 0, &sin->sin_addr.s_addr))
|
2001-02-28 09:38:48 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
1998-01-27 09:15:13 +00:00
|
|
|
t = in_pcblookup_local(pcbinfo, sin->sin_addr,
|
1999-04-28 11:38:52 +00:00
|
|
|
lport, prison ? 0 : wild);
|
2003-02-19 22:32:43 +00:00
|
|
|
if (t && (t->inp_vflag & INP_TIMEWAIT)) {
|
2006-04-04 12:26:07 +00:00
|
|
|
/*
|
|
|
|
* XXXRW: If an inpcb has had its timewait
|
|
|
|
* state recycled, we treat the address as
|
|
|
|
* being in use (for now). This is better
|
|
|
|
* than a panic, but not desirable.
|
|
|
|
*/
|
|
|
|
tw = intotw(t);
|
|
|
|
if (tw == NULL ||
|
|
|
|
(reuseport & tw->tw_so_options) == 0)
|
2003-02-19 22:32:43 +00:00
|
|
|
return (EADDRINUSE);
|
2006-04-04 12:26:07 +00:00
|
|
|
} else if (t &&
|
2002-05-31 11:52:35 +00:00
|
|
|
(reuseport & t->inp_socket->so_options) == 0) {
|
1999-12-07 17:39:16 +00:00
|
|
|
#if defined(INET6)
|
2002-05-31 11:52:35 +00:00
|
|
|
if (ntohl(sin->sin_addr.s_addr) !=
|
|
|
|
INADDR_ANY ||
|
|
|
|
ntohl(t->inp_laddr.s_addr) !=
|
|
|
|
INADDR_ANY ||
|
|
|
|
INP_SOCKAF(so) ==
|
|
|
|
INP_SOCKAF(t->inp_socket))
|
1999-12-07 17:39:16 +00:00
|
|
|
#endif /* defined(INET6) */
|
2002-05-31 11:52:35 +00:00
|
|
|
return (EADDRINUSE);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
}
|
2002-10-20 21:44:31 +00:00
|
|
|
if (*lportp != 0)
|
|
|
|
lport = *lportp;
|
1996-02-22 21:32:23 +00:00
|
|
|
if (lport == 0) {
|
2004-04-22 08:32:14 +00:00
|
|
|
u_short first, last;
|
2004-04-22 08:34:55 +00:00
|
|
|
int count;
|
1996-02-22 21:32:23 +00:00
|
|
|
|
2002-10-20 21:44:31 +00:00
|
|
|
if (laddr.s_addr != INADDR_ANY)
|
2004-03-27 21:05:46 +00:00
|
|
|
if (prison_ip(cred, 0, &laddr.s_addr))
|
2000-09-17 13:35:42 +00:00
|
|
|
return (EINVAL);
|
1996-08-23 18:59:07 +00:00
|
|
|
|
1996-02-22 21:32:23 +00:00
|
|
|
if (inp->inp_flags & INP_HIGHPORT) {
|
|
|
|
first = ipport_hifirstauto; /* sysctl */
|
|
|
|
last = ipport_hilastauto;
|
1998-01-27 09:15:13 +00:00
|
|
|
lastport = &pcbinfo->lasthi;
|
1996-02-22 21:32:23 +00:00
|
|
|
} else if (inp->inp_flags & INP_LOWPORT) {
|
2004-07-26 07:24:04 +00:00
|
|
|
if ((error = suser_cred(cred, SUSER_ALLOWJAIL)) != 0)
|
1997-04-27 20:01:29 +00:00
|
|
|
return (error);
|
1996-08-12 14:05:54 +00:00
|
|
|
first = ipport_lowfirstauto; /* 1023 */
|
|
|
|
last = ipport_lowlastauto; /* 600 */
|
1998-01-27 09:15:13 +00:00
|
|
|
lastport = &pcbinfo->lastlow;
|
1996-02-22 21:32:23 +00:00
|
|
|
} else {
|
|
|
|
first = ipport_firstauto; /* sysctl */
|
|
|
|
last = ipport_lastauto;
|
1998-01-27 09:15:13 +00:00
|
|
|
lastport = &pcbinfo->lastport;
|
1996-02-22 21:32:23 +00:00
|
|
|
}
|
2005-01-02 01:50:57 +00:00
|
|
|
/*
|
|
|
|
* For UDP, use random port allocation as long as the user
|
|
|
|
* allows it. For TCP (and as of yet unknown) connections,
|
|
|
|
* use random port allocation only if the user allows it AND
|
2005-04-08 08:43:21 +00:00
|
|
|
* ipport_tick() allows it.
|
2005-01-02 01:50:57 +00:00
|
|
|
*/
|
|
|
|
if (ipport_randomized &&
|
|
|
|
(!ipport_stoprandom || pcbinfo == &udbinfo))
|
|
|
|
dorandom = 1;
|
|
|
|
else
|
|
|
|
dorandom = 0;
|
2005-04-08 08:42:10 +00:00
|
|
|
/*
|
|
|
|
* It makes no sense to do random port allocation if
|
|
|
|
* we have the only port available.
|
|
|
|
*/
|
|
|
|
if (first == last)
|
|
|
|
dorandom = 0;
|
2005-01-02 01:50:57 +00:00
|
|
|
/* Make sure to not include UDP packets in the count. */
|
|
|
|
if (pcbinfo != &udbinfo)
|
|
|
|
ipport_tcpallocs++;
|
1996-02-22 21:32:23 +00:00
|
|
|
/*
|
|
|
|
* Simple check to ensure all ports are not used up causing
|
|
|
|
* a deadlock here.
|
|
|
|
*
|
|
|
|
* We split the two cases (up and down) so that the direction
|
|
|
|
* is not being tested on each round of the loop.
|
|
|
|
*/
|
|
|
|
if (first > last) {
|
|
|
|
/*
|
|
|
|
* counting down
|
|
|
|
*/
|
2005-01-02 01:50:57 +00:00
|
|
|
if (dorandom)
|
2004-04-23 23:29:49 +00:00
|
|
|
*lastport = first -
|
|
|
|
(arc4random() % (first - last));
|
1996-02-22 21:32:23 +00:00
|
|
|
count = first - last;
|
2004-04-22 08:34:55 +00:00
|
|
|
|
1996-02-22 21:32:23 +00:00
|
|
|
do {
|
2004-04-22 08:32:14 +00:00
|
|
|
if (count-- < 0) /* completely used? */
|
2001-01-23 07:27:56 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
1996-02-22 21:32:23 +00:00
|
|
|
--*lastport;
|
|
|
|
if (*lastport > first || *lastport < last)
|
|
|
|
*lastport = first;
|
|
|
|
lport = htons(*lastport);
|
2002-10-20 21:44:31 +00:00
|
|
|
} while (in_pcblookup_local(pcbinfo, laddr, lport,
|
|
|
|
wild));
|
1996-02-22 21:32:23 +00:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* counting up
|
|
|
|
*/
|
2005-01-02 01:50:57 +00:00
|
|
|
if (dorandom)
|
2004-04-23 23:29:49 +00:00
|
|
|
*lastport = first +
|
|
|
|
(arc4random() % (last - first));
|
1996-02-22 21:32:23 +00:00
|
|
|
count = last - first;
|
2004-04-22 08:34:55 +00:00
|
|
|
|
1996-02-22 21:32:23 +00:00
|
|
|
do {
|
2004-04-22 08:32:14 +00:00
|
|
|
if (count-- < 0) /* completely used? */
|
2001-01-23 07:27:56 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
1996-02-22 21:32:23 +00:00
|
|
|
++*lastport;
|
|
|
|
if (*lastport < first || *lastport > last)
|
|
|
|
*lastport = first;
|
|
|
|
lport = htons(*lastport);
|
2002-10-20 21:44:31 +00:00
|
|
|
} while (in_pcblookup_local(pcbinfo, laddr, lport,
|
|
|
|
wild));
|
1996-02-22 21:32:23 +00:00
|
|
|
}
|
|
|
|
}
|
2004-03-27 21:05:46 +00:00
|
|
|
if (prison_ip(cred, 0, &laddr.s_addr))
|
2001-12-13 04:01:23 +00:00
|
|
|
return (EINVAL);
|
2002-10-20 21:44:31 +00:00
|
|
|
*laddrp = laddr.s_addr;
|
|
|
|
*lportp = lport;
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
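The port search in the function above can be tried out in userland. The following is a minimal sketch of the counting-up branch only, under stated assumptions: `struct port_table`, `pick_port()` and the `start_offset` parameter are hypothetical stand-ins (the kernel code uses in_pcblookup_local() for the in-use test and arc4random() for the starting offset).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for in_pcblookup_local(): a table of taken ports. */
struct port_table {
	bool in_use[65536];
};

/*
 * Pick a free port in [first, last] (first <= last), counting up.
 * start_offset plays the role of arc4random() in the kernel code.
 * Returns 0 and stores the port in *out, or EADDRNOTAVAIL when every
 * port in the range is taken.
 */
static int
pick_port(const struct port_table *t, uint16_t first, uint16_t last,
    uint32_t start_offset, uint16_t *out)
{
	uint32_t candidate;
	int count;

	assert(first <= last);
	candidate = first;
	if (last > first)	/* no randomization for a one-port range */
		candidate = first + start_offset % (last - first);
	count = last - first;
	do {
		if (count-- < 0)	/* every port was examined once */
			return (EADDRNOTAVAIL);
		++candidate;
		if (candidate < first || candidate > last)
			candidate = first;	/* wrap around */
	} while (t->in_use[candidate]);
	*out = (uint16_t)candidate;
	return (0);
}
```

Because the candidate is incremented before it is tested, the loop runs at most `last - first + 1` times and examines each port of the range exactly once before giving up.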
|
|
|
|
|
1995-02-08 20:22:09 +00:00
|
|
|
/*
|
2002-10-21 13:55:50 +00:00
|
|
|
* Connect from a socket to a specified address.
|
|
|
|
* Both address and port must be specified in argument sin.
|
|
|
|
* If we don't have a local address for this socket yet,
|
|
|
|
* then pick one.
|
1995-02-08 20:22:09 +00:00
|
|
|
*/
|
2002-10-21 13:55:50 +00:00
|
|
|
int
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbconnect(struct inpcb *inp, struct sockaddr *nam, struct ucred *cred)
|
2002-10-21 13:55:50 +00:00
|
|
|
{
|
|
|
|
u_short lport, fport;
|
|
|
|
in_addr_t laddr, faddr;
|
|
|
|
int anonport, error;
|
|
|
|
|
2004-08-11 04:35:20 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
|
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
2002-10-21 13:55:50 +00:00
|
|
|
lport = inp->inp_lport;
|
|
|
|
laddr = inp->inp_laddr.s_addr;
|
|
|
|
anonport = (lport == 0);
|
|
|
|
error = in_pcbconnect_setup(inp, nam, &laddr, &lport, &faddr, &fport,
|
2004-03-27 21:05:46 +00:00
|
|
|
NULL, cred);
|
2002-10-21 13:55:50 +00:00
|
|
|
if (error)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
/* Do the initial binding of the local address if required. */
|
|
|
|
if (inp->inp_laddr.s_addr == INADDR_ANY && inp->inp_lport == 0) {
|
|
|
|
inp->inp_lport = lport;
|
|
|
|
inp->inp_laddr.s_addr = laddr;
|
|
|
|
if (in_pcbinshash(inp) != 0) {
|
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
|
|
|
inp->inp_lport = 0;
|
|
|
|
return (EAGAIN);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Commit the remaining changes. */
|
|
|
|
inp->inp_lport = lport;
|
|
|
|
inp->inp_laddr.s_addr = laddr;
|
|
|
|
inp->inp_faddr.s_addr = faddr;
|
|
|
|
inp->inp_fport = fport;
|
|
|
|
in_pcbrehash(inp);
|
- cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contains
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- a user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented incompletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid applying IPsec transport mode to the packets when the
kernel forwards the packets.
Tested by: nork
Obtained from: KAME
2003-11-04 16:02:05 +00:00
|
|
|
#ifdef IPSEC
|
|
|
|
if (inp->inp_socket->so_type == SOCK_STREAM)
|
|
|
|
ipsec_pcbconn(inp->inp_sp);
|
|
|
|
#endif
|
2002-10-21 13:55:50 +00:00
|
|
|
if (anonport)
|
|
|
|
inp->inp_flags |= INP_ANONPORT;
|
|
|
|
return (0);
|
|
|
|
}
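in_pcbconnect() above follows a two-phase pattern: the setup routine works on local copies and reports its results through out-parameters, and the PCB is written only after setup has succeeded. A minimal userland sketch of that pattern follows. `struct endpoint`, `connect_setup()`, `connect_commit()` and the placeholder address and port values are hypothetical and only illustrate the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the address fields of a PCB. */
struct endpoint {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
};

/*
 * Phase 1: validate and compute.  Works on local copies and writes the
 * out-parameters only on success, so a failure leaves them untouched.
 * The fallback address and port are made-up placeholders.
 */
static int
connect_setup(uint32_t dst_addr, uint16_t dst_port, uint32_t *laddrp,
    uint16_t *lportp, uint32_t *faddrp, uint16_t *fportp)
{
	uint32_t laddr = *laddrp;
	uint16_t lport = *lportp;

	if (dst_port == 0)
		return (EADDRNOTAVAIL);
	if (laddr == 0)
		laddr = 0x7f000001;	/* placeholder source address */
	if (lport == 0)
		lport = 49152;		/* placeholder anonymous port */
	*laddrp = laddr;
	*lportp = lport;
	*faddrp = dst_addr;
	*fportp = dst_port;
	return (0);
}

/* Phase 2: commit all four values to the endpoint in one place. */
static int
connect_commit(struct endpoint *ep, uint32_t dst_addr, uint16_t dst_port)
{
	uint32_t laddr = ep->laddr, faddr = 0;
	uint16_t lport = ep->lport, fport = 0;
	int error;

	error = connect_setup(dst_addr, dst_port, &laddr, &lport, &faddr,
	    &fport);
	if (error)
		return (error);		/* ep is unchanged */
	ep->laddr = laddr;
	ep->lport = lport;
	ep->faddr = faddr;
	ep->fport = fport;
	return (0);
}
```

Keeping the computation free of side effects means an error path needs no undo logic for the address fields.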
|
1995-02-08 20:22:09 +00:00
|
|
|
|
2002-10-21 13:55:50 +00:00
|
|
|
/*
|
|
|
|
* Set up for a connect from a socket to the specified address.
|
|
|
|
* On entry, *laddrp and *lportp should contain the current local
|
|
|
|
* address and port for the PCB; these are updated to the values
|
|
|
|
* that should be placed in inp_laddr and inp_lport to complete
|
|
|
|
* the connect.
|
|
|
|
*
|
|
|
|
* On success, *faddrp and *fportp will be set to the remote address
|
|
|
|
* and port. These are not updated in the error case.
|
|
|
|
*
|
|
|
|
* If the operation fails because the connection already exists,
|
|
|
|
* *oinpp will be set to the PCB of that connection so that the
|
|
|
|
* caller can decide to override it. In all other cases, *oinpp
|
|
|
|
* is set to NULL.
|
|
|
|
*/
|
1995-02-08 20:22:09 +00:00
|
|
|
int
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbconnect_setup(struct inpcb *inp, struct sockaddr *nam,
|
|
|
|
in_addr_t *laddrp, u_short *lportp, in_addr_t *faddrp, u_short *fportp,
|
|
|
|
struct inpcb **oinpp, struct ucred *cred)
|
1995-02-08 20:22:09 +00:00
|
|
|
{
|
2002-10-21 13:55:50 +00:00
|
|
|
struct sockaddr_in *sin = (struct sockaddr_in *)nam;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct in_ifaddr *ia;
|
2002-10-21 13:55:50 +00:00
|
|
|
struct sockaddr_in sa;
|
2004-03-27 21:05:46 +00:00
|
|
|
struct ucred *socred;
|
2002-10-21 13:55:50 +00:00
|
|
|
struct inpcb *oinp;
|
|
|
|
struct in_addr laddr, faddr;
|
|
|
|
u_short lport, fport;
|
|
|
|
int error;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2004-08-11 04:35:20 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
|
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
2002-10-21 13:55:50 +00:00
|
|
|
if (oinpp != NULL)
|
|
|
|
*oinpp = NULL;
|
1997-08-16 19:16:27 +00:00
|
|
|
if (nam->sa_len != sizeof (*sin))
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EINVAL);
|
|
|
|
if (sin->sin_family != AF_INET)
|
|
|
|
return (EAFNOSUPPORT);
|
|
|
|
if (sin->sin_port == 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
2002-10-21 13:55:50 +00:00
|
|
|
laddr.s_addr = *laddrp;
|
|
|
|
lport = *lportp;
|
|
|
|
faddr = sin->sin_addr;
|
|
|
|
fport = sin->sin_port;
|
2004-03-27 21:05:46 +00:00
|
|
|
socred = inp->inp_socket->so_cred;
|
|
|
|
if (laddr.s_addr == INADDR_ANY && jailed(socred)) {
|
2002-10-21 13:55:50 +00:00
|
|
|
bzero(&sa, sizeof(sa));
|
2004-03-27 21:05:46 +00:00
|
|
|
sa.sin_addr.s_addr = htonl(prison_getip(socred));
|
2002-10-21 13:55:50 +00:00
|
|
|
sa.sin_len = sizeof(sa);
|
|
|
|
sa.sin_family = AF_INET;
|
|
|
|
error = in_pcbbind_setup(inp, (struct sockaddr *)&sa,
|
2004-03-27 21:05:46 +00:00
|
|
|
&laddr.s_addr, &lport, cred);
|
2002-10-21 13:55:50 +00:00
|
|
|
if (error)
|
|
|
|
return (error);
|
|
|
|
}
|
1996-12-13 21:29:07 +00:00
|
|
|
if (!TAILQ_EMPTY(&in_ifaddrhead)) {
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* If the destination address is INADDR_ANY,
|
|
|
|
* use the primary local address.
|
|
|
|
* If the supplied address is INADDR_BROADCAST,
|
|
|
|
* and the primary interface supports broadcast,
|
|
|
|
* choose the broadcast address for that interface.
|
|
|
|
*/
|
2002-10-21 13:55:50 +00:00
|
|
|
if (faddr.s_addr == INADDR_ANY)
|
|
|
|
faddr = IA_SIN(TAILQ_FIRST(&in_ifaddrhead))->sin_addr;
|
|
|
|
else if (faddr.s_addr == (u_long)INADDR_BROADCAST &&
|
|
|
|
(TAILQ_FIRST(&in_ifaddrhead)->ia_ifp->if_flags &
|
|
|
|
IFF_BROADCAST))
|
|
|
|
faddr = satosin(&TAILQ_FIRST(
|
|
|
|
&in_ifaddrhead)->ia_broadaddr)->sin_addr;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2002-10-21 13:55:50 +00:00
|
|
|
if (laddr.s_addr == INADDR_ANY) {
|
1994-05-24 10:09:53 +00:00
|
|
|
ia = (struct in_ifaddr *)0;
|
1995-05-30 08:16:23 +00:00
|
|
|
/*
|
2003-11-20 20:07:39 +00:00
|
|
|
* If route is known our src addr is taken from the i/f,
|
|
|
|
* else punt.
|
2006-02-16 15:45:28 +00:00
|
|
|
*
|
|
|
|
* Find out route to destination
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2006-02-16 15:45:28 +00:00
|
|
|
if ((inp->inp_socket->so_options & SO_DONTROUTE) == 0)
|
|
|
|
ia = ip_rtaddr(faddr);
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2006-02-16 15:45:28 +00:00
|
|
|
* If we found a route, use the address corresponding to
|
|
|
|
* the outgoing interface.
|
|
|
|
*
|
|
|
|
* Otherwise assume faddr is reachable on a directly connected
|
|
|
|
* network and try to find a corresponding interface to take
|
|
|
|
* the source address from.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
if (ia == 0) {
|
2002-10-21 13:55:50 +00:00
|
|
|
bzero(&sa, sizeof(sa));
|
|
|
|
sa.sin_addr = faddr;
|
|
|
|
sa.sin_len = sizeof(sa);
|
|
|
|
sa.sin_family = AF_INET;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2002-10-21 13:55:50 +00:00
|
|
|
ia = ifatoia(ifa_ifwithdstaddr(sintosa(&sa)));
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ia == 0)
|
2002-10-21 13:55:50 +00:00
|
|
|
ia = ifatoia(ifa_ifwithnet(sintosa(&sa)));
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ia == 0)
|
2004-06-16 10:02:36 +00:00
|
|
|
return (ENETUNREACH);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If the destination address is multicast and an outgoing
|
|
|
|
* interface has been set as a multicast option, use the
|
|
|
|
* address of that interface as our source address.
|
|
|
|
*/
|
2002-10-21 13:55:50 +00:00
|
|
|
if (IN_MULTICAST(ntohl(faddr.s_addr)) &&
|
1994-05-24 10:09:53 +00:00
|
|
|
inp->inp_moptions != NULL) {
|
|
|
|
struct ip_moptions *imo;
|
|
|
|
struct ifnet *ifp;
|
|
|
|
|
|
|
|
imo = inp->inp_moptions;
|
|
|
|
if (imo->imo_multicast_ifp != NULL) {
|
|
|
|
ifp = imo->imo_multicast_ifp;
|
2001-02-04 16:08:18 +00:00
|
|
|
TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link)
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ia->ia_ifp == ifp)
|
|
|
|
break;
|
|
|
|
if (ia == 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
|
|
|
}
|
|
|
|
}
|
2002-10-21 13:55:50 +00:00
|
|
|
laddr = ia->ia_addr.sin_addr;
|
1995-02-08 20:22:09 +00:00
|
|
|
}
|
|
|
|
|
2002-10-21 13:55:50 +00:00
|
|
|
oinp = in_pcblookup_hash(inp->inp_pcbinfo, faddr, fport, laddr, lport,
|
|
|
|
0, NULL);
|
|
|
|
if (oinp != NULL) {
|
|
|
|
if (oinpp != NULL)
|
|
|
|
*oinpp = oinp;
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EADDRINUSE);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Added a new routine, in_pcbinshash(),
to do the initial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fasttimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fasttimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
}
|
2002-10-21 13:55:50 +00:00
|
|
|
if (lport == 0) {
|
2004-03-27 21:05:46 +00:00
|
|
|
error = in_pcbbind_setup(inp, NULL, &laddr.s_addr, &lport,
|
|
|
|
cred);
|
2002-10-21 13:55:50 +00:00
|
|
|
if (error)
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2002-10-21 13:55:50 +00:00
|
|
|
*laddrp = laddr.s_addr;
|
|
|
|
*lportp = lport;
|
|
|
|
*faddrp = faddr.s_addr;
|
|
|
|
*fportp = fport;
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
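When no route supplies a source address, the function above falls back first to an interface whose point-to-point destination equals the target, then to an interface on the same network, and finally fails with ENETUNREACH. A minimal sketch of that fallback order follows, assuming a flat interface table. `struct iface` and `select_source()` are hypothetical, addresses are in host byte order, and the route lookup step is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface record; peer is 0 for non point-to-point links. */
struct iface {
	uint32_t addr;	/* local address */
	uint32_t mask;	/* network mask */
	uint32_t peer;	/* point-to-point destination, or 0 */
};

/*
 * Choose a source address for dst.  A matching point-to-point peer is
 * preferred over a matching network; if neither exists the destination
 * is reported as unreachable and *src is left alone.
 */
static int
select_source(const struct iface *ifs, size_t n, uint32_t dst, uint32_t *src)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ifs[i].peer != 0 && ifs[i].peer == dst) {
			*src = ifs[i].addr;
			return (0);
		}
	}
	for (i = 0; i < n; i++) {
		if ((dst & ifs[i].mask) == (ifs[i].addr & ifs[i].mask)) {
			*src = ifs[i].addr;
			return (0);
		}
	}
	return (ENETUNREACH);
}
```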
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbdisconnect(struct inpcb *inp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2005-06-01 11:39:42 +00:00
|
|
|
|
2005-06-01 11:43:39 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
inp->inp_faddr.s_addr = INADDR_ANY;
|
|
|
|
inp->inp_fport = 0;
|
1995-04-09 01:29:31 +00:00
|
|
|
in_pcbrehash(inp);
|
2003-11-04 16:02:05 +00:00
|
|
|
#ifdef IPSEC
|
|
|
|
ipsec_pcbdisconn(inp->inp_sp);
|
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
2006-04-01 16:04:42 +00:00
|
|
|
/*
|
|
|
|
* In the old world order, in_pcbdetach() served two functions: to detach the
|
|
|
|
* pcb from the socket/potentially free the socket, and to free the pcb
|
|
|
|
* itself. In the new world order, the protocol code is responsible for
|
|
|
|
* managing the relationship with the socket, and this code simply frees the
|
|
|
|
* pcb.
|
|
|
|
*/
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbdetach(struct inpcb *inp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2006-04-01 16:04:42 +00:00
|
|
|
|
|
|
|
KASSERT(inp->inp_socket != NULL, ("in_pcbdetach: inp_socket == NULL"));
|
|
|
|
inp->inp_socket->so_pcb = NULL;
|
|
|
|
inp->inp_socket = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
in_pcbfree(struct inpcb *inp)
|
|
|
|
{
|
1998-03-24 18:06:34 +00:00
|
|
|
struct inpcbinfo *ipi = inp->inp_pcbinfo;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2006-04-01 16:04:42 +00:00
|
|
|
KASSERT(inp->inp_socket == NULL, ("in_pcbfree: inp_socket != NULL"));
|
2005-06-01 11:43:39 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(ipi);
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
2003-11-04 16:02:05 +00:00
|
|
|
#if defined(IPSEC) || defined(FAST_IPSEC)
|
2000-07-04 16:35:15 +00:00
|
|
|
ipsec4_delete_pcbpolicy(inp);
|
1999-12-07 17:39:16 +00:00
|
|
|
#endif /*IPSEC*/
|
1998-03-24 18:06:34 +00:00
|
|
|
inp->inp_gencnt = ++ipi->ipi_gencnt;
|
1998-01-27 09:15:13 +00:00
|
|
|
in_pcbremlists(inp);
|
1994-05-24 10:09:53 +00:00
|
|
|
if (inp->inp_options)
|
|
|
|
(void)m_free(inp->inp_options);
|
|
|
|
ip_freemoptions(inp->inp_moptions);
|
1999-12-07 17:39:16 +00:00
|
|
|
inp->inp_vflag = 0;
|
2006-07-18 22:34:27 +00:00
|
|
|
|
Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
|
|
|
#ifdef MAC
|
|
|
|
mac_destroy_inpcb(inp);
|
|
|
|
#endif
|
2006-07-18 22:34:27 +00:00
|
|
|
INP_UNLOCK(inp);
|
2002-03-20 05:48:55 +00:00
|
|
|
uma_zfree(ipi->ipi_zone, inp);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
2006-04-25 11:17:35 +00:00
|
|
|
/*
|
|
|
|
* TCP needs to maintain its inpcb structure after the TCP connection has
|
|
|
|
* been torn down. However, it must be disconnected from the inpcb hashes as
|
|
|
|
* it must not prevent binding of future connections to the same port/ip
|
|
|
|
* combination by other inpcbs.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
in_pcbdrop(struct inpcb *inp)
|
|
|
|
{
|
|
|
|
|
2006-04-25 23:23:13 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
|
2006-04-25 11:17:35 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
|
|
|
inp->inp_vflag |= INP_DROPPED;
|
|
|
|
if (inp->inp_lport) {
|
|
|
|
struct inpcbport *phd = inp->inp_phd;
|
|
|
|
|
|
|
|
LIST_REMOVE(inp, inp_hash);
|
|
|
|
LIST_REMOVE(inp, inp_portlist);
|
|
|
|
if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
|
|
|
|
LIST_REMOVE(phd, phd_hash);
|
|
|
|
free(phd, M_PCB);
|
|
|
|
}
|
|
|
|
inp->inp_lport = 0;
|
|
|
|
}
|
|
|
|
}
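in_pcbdrop() above unhooks the PCB from its port list and frees the per-port head once no PCB is left on it. The same bookkeeping can be sketched in userland with a hand-rolled list instead of the queue(3) macros. All names here (`struct port_head`, `struct pcb`, `pcb_bind()`, `pcb_drop()`) are hypothetical, and the hash-chain removal is omitted.

```c
#include <assert.h>
#include <stdlib.h>

struct pcb;

/* Hypothetical per-port list head, allocated when a port is first used. */
struct port_head {
	struct pcb *first;	/* PCBs bound to this port */
	unsigned short port;
};

struct pcb {
	struct pcb *next;
	struct port_head *phd;
	unsigned short lport;	/* 0 means "not bound" */
	int dropped;
};

static struct port_head *
port_head_new(unsigned short port)
{
	struct port_head *phd = calloc(1, sizeof(*phd));

	if (phd != NULL)
		phd->port = port;
	return (phd);
}

static void
pcb_bind(struct pcb *p, struct port_head *phd)
{
	p->next = phd->first;
	phd->first = p;
	p->phd = phd;
	p->lport = phd->port;
}

/* Mark p dropped and release its port; returns 1 if the head was freed. */
static int
pcb_drop(struct pcb *p)
{
	struct port_head *phd = p->phd;
	struct pcb **pp;
	int freed = 0;

	p->dropped = 1;
	if (p->lport == 0)
		return (0);	/* never bound, or already released */
	for (pp = &phd->first; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			break;
		}
	}
	if (phd->first == NULL) {	/* last user of this port is gone */
		free(phd);
		freed = 1;
	}
	p->next = NULL;
	p->phd = NULL;
	p->lport = 0;
	return (freed);
}
```

Clearing `lport` at the end makes a second drop a no-op, which matches the `if (inp->inp_lport)` guard in the kernel function.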
|
|
|
|
|
2002-08-21 11:57:12 +00:00
|
|
|
struct sockaddr *
|
2006-01-22 01:16:25 +00:00
|
|
|
in_sockaddr(in_port_t port, struct in_addr *addr_p)
|
2002-08-21 11:57:12 +00:00
|
|
|
{
|
|
|
|
struct sockaddr_in *sin;
|
|
|
|
|
|
|
|
MALLOC(sin, struct sockaddr_in *, sizeof *sin, M_SONAME,
|
2003-02-19 05:47:46 +00:00
|
|
|
M_WAITOK | M_ZERO);
|
2002-08-21 11:57:12 +00:00
|
|
|
sin->sin_family = AF_INET;
|
|
|
|
sin->sin_len = sizeof(*sin);
|
|
|
|
sin->sin_addr = *addr_p;
|
|
|
|
sin->sin_port = port;
|
|
|
|
|
|
|
|
return (struct sockaddr *)sin;
|
|
|
|
}
|
|
|
|
|
1997-02-18 20:46:36 +00:00
|
|
|
/*
|
2002-06-10 20:05:46 +00:00
|
|
|
* The wrapper function will pass down the pcbinfo for this function to lock.
|
|
|
|
* The socket must have a valid
|
1997-02-18 20:46:36 +00:00
|
|
|
* (i.e., non-nil) PCB, but it should be impossible to get an invalid one
|
|
|
|
* except through a kernel programming error, so it is acceptable to panic
|
1997-08-16 19:16:27 +00:00
|
|
|
* (or in this case trap) if the PCB is invalid. (Actually, we don't trap
|
|
|
|
* because there actually /is/ a programming error somewhere... XXX)
|
1997-02-18 20:46:36 +00:00
|
|
|
*/
|
|
|
|
int
|
2006-01-22 01:16:25 +00:00
|
|
|
in_setsockaddr(struct socket *so, struct sockaddr **nam,
|
|
|
|
struct inpcbinfo *pcbinfo)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2006-01-22 01:16:25 +00:00
|
|
|
struct inpcb *inp;
|
2002-08-21 11:57:12 +00:00
|
|
|
struct in_addr addr;
|
|
|
|
in_port_t port;
|
1997-12-25 06:57:36 +00:00
|
|
|
|
1997-05-19 01:28:39 +00:00
|
|
|
inp = sotoinpcb(so);
|
2006-04-22 19:10:02 +00:00
|
|
|
KASSERT(inp != NULL, ("in_setsockaddr: inp == NULL"));
|
|
|
|
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_LOCK(inp);
|
2002-08-21 11:57:12 +00:00
|
|
|
port = inp->inp_lport;
|
|
|
|
addr = inp->inp_laddr;
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_UNLOCK(inp);
|
1997-12-25 06:57:36 +00:00
|
|
|
|
2002-08-21 11:57:12 +00:00
|
|
|
*nam = in_sockaddr(port, &addr);
|
1997-02-18 20:46:36 +00:00
|
|
|
return 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
2002-06-10 20:05:46 +00:00
|
|
|
/*
|
|
|
|
* The wrapper function will pass down the pcbinfo for this function to lock.
|
|
|
|
*/
|
1997-02-18 20:46:36 +00:00
|
|
|
int
|
2006-01-22 01:16:25 +00:00
|
|
|
in_setpeeraddr(struct socket *so, struct sockaddr **nam,
|
|
|
|
struct inpcbinfo *pcbinfo)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2006-01-22 01:16:25 +00:00
|
|
|
struct inpcb *inp;
|
2002-08-21 11:57:12 +00:00
|
|
|
struct in_addr addr;
|
|
|
|
in_port_t port;
|
1997-12-25 06:57:36 +00:00
|
|
|
|
1997-05-19 01:28:39 +00:00
|
|
|
inp = sotoinpcb(so);
|
2006-04-22 19:10:02 +00:00
|
|
|
KASSERT(inp != NULL, ("in_setpeeraddr: inp == NULL"));
|
|
|
|
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_LOCK(inp);
|
2002-08-21 11:57:12 +00:00
|
|
|
port = inp->inp_fport;
|
|
|
|
addr = inp->inp_faddr;
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_UNLOCK(inp);
|
1997-12-25 06:57:36 +00:00
|
|
|
|
2002-08-21 11:57:12 +00:00
|
|
|
*nam = in_sockaddr(port, &addr);
|
1997-02-18 20:46:36 +00:00
|
|
|
return 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbnotifyall(struct inpcbinfo *pcbinfo, struct in_addr faddr, int errno,
|
|
|
|
struct inpcb *(*notify)(struct inpcb *, int))
|
2001-02-22 21:23:45 +00:00
|
|
|
{
|
2001-02-26 21:19:47 +00:00
|
|
|
struct inpcb *inp, *ninp;
|
2002-06-10 20:05:46 +00:00
|
|
|
struct inpcbhead *head;
|
2001-02-22 21:23:45 +00:00
|
|
|
|
2003-02-12 23:55:07 +00:00
|
|
|
INP_INFO_WLOCK(pcbinfo);
|
2002-06-10 20:05:46 +00:00
|
|
|
head = pcbinfo->listhead;
|
2001-02-26 21:19:47 +00:00
|
|
|
for (inp = LIST_FIRST(head); inp != NULL; inp = ninp) {
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_LOCK(inp);
|
2001-02-26 21:19:47 +00:00
|
|
|
ninp = LIST_NEXT(inp, inp_list);
|
2001-02-22 21:23:45 +00:00
|
|
|
#ifdef INET6
|
2002-06-10 20:05:46 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0) {
|
2003-02-12 23:55:07 +00:00
|
|
|
INP_UNLOCK(inp);
|
2001-02-22 21:23:45 +00:00
|
|
|
continue;
|
2002-06-10 20:05:46 +00:00
|
|
|
}
|
2001-02-22 21:23:45 +00:00
|
|
|
#endif
|
|
|
|
if (inp->inp_faddr.s_addr != faddr.s_addr ||
|
2002-06-10 20:05:46 +00:00
|
|
|
inp->inp_socket == NULL) {
|
2003-02-12 23:55:07 +00:00
|
|
|
INP_UNLOCK(inp);
|
|
|
|
continue;
|
2002-06-10 20:05:46 +00:00
|
|
|
}
|
2003-02-12 23:55:07 +00:00
|
|
|
if ((*notify)(inp, errno))
|
|
|
|
INP_UNLOCK(inp);
|
2001-02-22 21:23:45 +00:00
|
|
|
}
|
2003-02-12 23:55:07 +00:00
|
|
|
INP_INFO_WUNLOCK(pcbinfo);
|
2001-02-22 21:23:45 +00:00
|
|
|
}
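The loop in in_pcbnotifyall() above reads the next list element before calling the notify callback, because the callback may free the current element; the callback returns the element if it is still valid (and must be unlocked) or NULL if it is gone. A minimal userland sketch of that iteration pattern follows. The list layout and the names `struct node`, `notify_all()`, `notify_keep()` and `notify_drop()` are hypothetical, and `locked` is a plain flag standing in for the per-PCB lock.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *next;
	struct node **prevp;	/* address of the pointer that points at us */
	int key;
	int locked;		/* stand-in for the per-element lock */
};

typedef struct node *(*notify_fn)(struct node *, int);

static struct node *
node_insert_head(struct node **head, int key)
{
	struct node *n = calloc(1, sizeof(*n));

	if (n == NULL)
		return (NULL);
	n->key = key;
	n->next = *head;
	if (*head != NULL)
		(*head)->prevp = &n->next;
	*head = n;
	n->prevp = head;
	return (n);
}

/* Callback that leaves the element in place; caller must unlock it. */
static struct node *
notify_keep(struct node *n, int err)
{
	(void)err;
	return (n);
}

/* Callback that unlinks and frees the element; returns NULL. */
static struct node *
notify_drop(struct node *n, int err)
{
	(void)err;
	if (n->next != NULL)
		n->next->prevp = n->prevp;
	*n->prevp = n->next;
	free(n);
	return (NULL);
}

/* Notify every element whose key matches; returns how many were notified. */
static int
notify_all(struct node **head, int key, int err, notify_fn notify)
{
	struct node *n, *next;
	int notified = 0;

	for (n = *head; n != NULL; n = next) {
		n->locked = 1;
		next = n->next;		/* saved before notify may free n */
		if (n->key != key) {
			n->locked = 0;
			continue;
		}
		notified++;
		if (notify(n, err) != NULL)
			n->locked = 0;	/* still valid: unlock it */
	}
	return (notified);
}
```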
|
|
|
|
|
2001-08-04 17:10:14 +00:00
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbpurgeif0(struct inpcbinfo *pcbinfo, struct ifnet *ifp)
|
2001-08-04 17:10:14 +00:00
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
|
|
|
struct ip_moptions *imo;
|
|
|
|
int i, gap;
|
|
|
|
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_INFO_RLOCK(pcbinfo);
|
2002-06-12 03:08:08 +00:00
|
|
|
LIST_FOREACH(inp, pcbinfo->listhead, inp_list) {
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_LOCK(inp);
|
2001-08-04 17:10:14 +00:00
|
|
|
imo = inp->inp_moptions;
|
|
|
|
if ((inp->inp_vflag & INP_IPV4) &&
|
|
|
|
imo != NULL) {
|
|
|
|
/*
|
|
|
|
* Unselect the outgoing interface if it is being
|
|
|
|
* detached.
|
|
|
|
*/
|
|
|
|
if (imo->imo_multicast_ifp == ifp)
|
|
|
|
imo->imo_multicast_ifp = NULL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Drop multicast group membership if we joined
|
|
|
|
* through the interface being detached.
|
|
|
|
*/
|
|
|
|
for (i = 0, gap = 0; i < imo->imo_num_memberships;
|
|
|
|
i++) {
|
|
|
|
if (imo->imo_membership[i]->inm_ifp == ifp) {
|
|
|
|
in_delmulti(imo->imo_membership[i]);
|
|
|
|
gap++;
|
|
|
|
} else if (gap != 0)
|
|
|
|
imo->imo_membership[i - gap] =
|
|
|
|
imo->imo_membership[i];
|
|
|
|
}
|
|
|
|
imo->imo_num_memberships -= gap;
|
|
|
|
}
|
2002-06-10 20:05:46 +00:00
|
|
|
INP_UNLOCK(inp);
|
2001-08-04 17:10:14 +00:00
|
|
|
}
|
2002-06-12 03:08:08 +00:00
|
|
|
INP_INFO_RUNLOCK(pcbinfo);
|
2001-08-04 17:10:14 +00:00
|
|
|
}
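The membership loop in in_pcbpurgeif0() above compacts the array in place: `gap` counts how many entries have been removed so far, and every surviving entry is shifted down by that amount. A minimal sketch of the same technique on a plain int array follows. `compact_remove()` is a hypothetical helper, and the in_delmulti() call made for each removed entry is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Remove every occurrence of victim from items[0..n-1] in place,
 * preserving the order of the remaining entries.  Returns the new
 * number of valid entries.
 */
static size_t
compact_remove(int *items, size_t n, int victim)
{
	size_t i, gap;

	for (i = 0, gap = 0; i < n; i++) {
		if (items[i] == victim)
			gap++;			/* one more hole behind us */
		else if (gap != 0)
			items[i - gap] = items[i];	/* close the hole */
	}
	return (n - gap);
}
```

The array is traversed once and nothing is copied until the first removal, so the common case of no match costs only the comparisons.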
|
|
|
|
|
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Lookup a PCB based on the local address and port.
|
|
|
|
*/
|
2006-02-04 07:59:17 +00:00
|
|
|
#define INP_LOOKUP_MAPPED_PCB_COST 3
|
1994-05-24 10:09:53 +00:00
|
|
|
struct inpcb *
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcblookup_local(struct inpcbinfo *pcbinfo, struct in_addr laddr,
|
|
|
|
u_int lport_arg, int wild_okay)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2006-01-22 01:16:25 +00:00
|
|
|
struct inpcb *inp;
|
2006-04-03 13:33:55 +00:00
|
|
|
struct tcptw *tw;
|
2006-02-04 07:59:17 +00:00
|
|
|
#ifdef INET6
|
|
|
|
int matchwild = 3 + INP_LOOKUP_MAPPED_PCB_COST;
|
|
|
|
#else
|
|
|
|
int matchwild = 3;
|
|
|
|
#endif
|
|
|
|
int wildcard;
|
1998-01-27 09:15:13 +00:00
|
|
|
u_short lport = lport_arg;
|
1995-04-10 08:52:45 +00:00
|
|
|
|
2003-11-13 05:16:56 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(pcbinfo);
|
|
|
|
|
1998-01-27 09:15:13 +00:00
|
|
|
if (!wild_okay) {
|
|
|
|
struct inpcbhead *head;
|
|
|
|
/*
|
|
|
|
* Look for an unconnected (wildcard foreign addr) PCB that
|
|
|
|
* matches the local address and port we're looking for.
|
|
|
|
*/
|
|
|
|
head = &pcbinfo->hashbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbinfo->hashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(inp, head, inp_hash) {
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
1998-01-27 09:15:13 +00:00
|
|
|
if (inp->inp_faddr.s_addr == INADDR_ANY &&
|
|
|
|
inp->inp_laddr.s_addr == laddr.s_addr &&
|
|
|
|
inp->inp_lport == lport) {
|
|
|
|
/*
|
|
|
|
* Found.
|
|
|
|
*/
|
|
|
|
return (inp);
|
|
|
|
}
|
1995-04-09 01:29:31 +00:00
|
|
|
}
|
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Not found.
|
|
|
|
*/
|
|
|
|
return (NULL);
|
|
|
|
} else {
|
|
|
|
struct inpcbporthead *porthash;
|
|
|
|
struct inpcbport *phd;
|
|
|
|
struct inpcb *match = NULL;
|
|
|
|
/*
|
|
|
|
* Best fit PCB lookup.
|
|
|
|
*
|
|
|
|
* First see if this local port is in use by looking on the
|
|
|
|
* port hash list.
|
|
|
|
*/
|
2003-11-01 07:30:08 +00:00
|
|
|
retrylookup:
|
1998-01-27 09:15:13 +00:00
|
|
|
porthash = &pcbinfo->porthashbase[INP_PCBPORTHASH(lport,
|
|
|
|
pcbinfo->porthashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(phd, porthash, phd_hash) {
|
1998-01-27 09:15:13 +00:00
|
|
|
if (phd->phd_port == lport)
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
1998-01-27 09:15:13 +00:00
|
|
|
}
|
|
|
|
if (phd != NULL) {
|
|
|
|
/*
|
|
|
|
* Port is in use by one or more PCBs. Look for best
|
|
|
|
* fit.
|
|
|
|
*/
|
2001-02-04 16:08:18 +00:00
|
|
|
LIST_FOREACH(inp, &phd->phd_pcblist, inp_portlist) {
|
1998-01-27 09:15:13 +00:00
|
|
|
wildcard = 0;
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
2006-02-04 07:59:17 +00:00
|
|
|
/*
|
|
|
|
* We never select the PCB that has
|
|
|
|
* INP_IPV6 flag and is bound to :: if
|
|
|
|
* we have another PCB which is bound
|
|
|
|
* to 0.0.0.0. If a PCB has the
|
|
|
|
* INP_IPV6 flag, then we set its cost
|
|
|
|
* higher than IPv4 only PCBs.
|
|
|
|
*
|
|
|
|
* Note that the case only happens
|
|
|
|
* when a socket is bound to ::, under
|
|
|
|
* the condition that the use of the
|
|
|
|
* mapped address is allowed.
|
|
|
|
*/
|
|
|
|
if ((inp->inp_vflag & INP_IPV6) != 0)
|
|
|
|
wildcard += INP_LOOKUP_MAPPED_PCB_COST;
|
1999-12-07 17:39:16 +00:00
|
|
|
#endif
|
2003-11-01 07:30:08 +00:00
|
|
|
/*
|
|
|
|
* Clean out old time_wait sockets if they
|
|
|
|
* are clogging up needed local ports.
|
|
|
|
*/
|
|
|
|
if ((inp->inp_vflag & INP_TIMEWAIT) != 0) {
|
2006-04-03 13:33:55 +00:00
|
|
|
tw = intotw(inp);
|
2006-04-04 12:26:07 +00:00
|
|
|
if (tw != NULL &&
|
|
|
|
tcp_twrecycleable(tw)) {
|
2003-11-13 05:18:23 +00:00
|
|
|
INP_LOCK(inp);
|
2006-04-03 13:33:55 +00:00
|
|
|
tcp_twclose(tw, 0);
|
2003-11-01 07:30:08 +00:00
|
|
|
match = NULL;
|
|
|
|
goto retrylookup;
|
|
|
|
}
|
|
|
|
}
|
1998-01-27 09:15:13 +00:00
|
|
|
if (inp->inp_faddr.s_addr != INADDR_ANY)
|
|
|
|
wildcard++;
|
|
|
|
if (inp->inp_laddr.s_addr != INADDR_ANY) {
|
|
|
|
if (laddr.s_addr == INADDR_ANY)
|
|
|
|
wildcard++;
|
|
|
|
else if (inp->inp_laddr.s_addr != laddr.s_addr)
|
|
|
|
continue;
|
|
|
|
} else {
|
|
|
|
if (laddr.s_addr != INADDR_ANY)
|
|
|
|
wildcard++;
|
|
|
|
}
|
|
|
|
if (wildcard < matchwild) {
|
|
|
|
match = inp;
|
|
|
|
matchwild = wildcard;
|
|
|
|
if (matchwild == 0) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
1995-03-02 19:29:42 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
1998-01-27 09:15:13 +00:00
|
|
|
return (match);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
}
|
2006-02-04 07:59:17 +00:00
|
|
|
#undef INP_LOOKUP_MAPPED_PCB_COST
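The best-fit scoring in the wildcard branch of in_pcblookup_local() above can be sketched in isolation. This is a simplified model under stated assumptions, not the kernel routine: `mini_pcb`, `wild_cost()` and `best_fit()` are hypothetical names, the PCBs sit in a plain array instead of the per-port list, the `is_v6` flag stands in for the INP_IPV6 check, and TIME_WAIT recycling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY	0u	/* stands in for INADDR_ANY */
#define MAPPED_COST	3	/* mirrors INP_LOOKUP_MAPPED_PCB_COST */

/* Hypothetical PCB reduced to the fields the scoring looks at. */
struct mini_pcb {
	uint32_t laddr;		/* bound local address, or ADDR_ANY */
	uint32_t faddr;		/* connected foreign address, or ADDR_ANY */
	int is_v6;		/* IPv6 socket that also accepts mapped IPv4 */
};

/* Return the wildcard cost of a candidate, or -1 if it cannot match. */
static int
wild_cost(const struct mini_pcb *p, uint32_t laddr)
{
	int cost = 0;

	if (p->is_v6)
		cost += MAPPED_COST;	/* prefer IPv4-only PCBs */
	if (p->faddr != ADDR_ANY)
		cost++;			/* connected PCB is a looser fit */
	if (p->laddr != ADDR_ANY) {
		if (laddr == ADDR_ANY)
			cost++;
		else if (p->laddr != laddr)
			return (-1);	/* bound to a different address */
	} else if (laddr != ADDR_ANY)
		cost++;
	return (cost);
}

/* Pick the lowest-cost candidate; stop early on an exact (cost 0) fit. */
static const struct mini_pcb *
best_fit(const struct mini_pcb *pcbs, size_t n, uint32_t laddr)
{
	const struct mini_pcb *match = NULL;
	int matchwild = 3 + MAPPED_COST;	/* above any reachable cost */
	size_t i;

	assert(pcbs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int cost = wild_cost(&pcbs[i], laddr);

		if (cost < 0 || cost >= matchwild)
			continue;
		match = &pcbs[i];
		matchwild = cost;
		if (matchwild == 0)
			break;
	}
	return (match);
}
```

Because the mapped-address penalty is as large as the worst ordinary cost, a PCB bound to `::` can only win when no IPv4-only PCB on the same port is eligible, which is the behaviour the comment in the function describes.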
|
1995-04-09 01:29:31 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Lookup PCB in hash list.
|
|
|
|
*/
|
|
|
|
struct inpcb *
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,
|
|
|
|
u_int fport_arg, struct in_addr laddr, u_int lport_arg, int wildcard,
|
|
|
|
struct ifnet *ifp)
|
1995-04-09 01:29:31 +00:00
|
|
|
{
|
|
|
|
struct inpcbhead *head;
|
2006-01-22 01:16:25 +00:00
|
|
|
struct inpcb *inp;
|
1995-04-09 01:29:31 +00:00
|
|
|
u_short fport = fport_arg, lport = lport_arg;
|
|
|
|
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_INFO_RLOCK_ASSERT(pcbinfo);
|
2006-04-22 19:15:20 +00:00
|
|
|
|
1995-04-09 01:29:31 +00:00
|
|
|
/*
|
|
|
|
* First look for an exact match.
|
|
|
|
*/
|
1997-03-03 09:23:37 +00:00
|
|
|
head = &pcbinfo->hashbase[INP_PCBHASH(faddr.s_addr, lport, fport, pcbinfo->hashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(inp, head, inp_hash) {
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
1996-10-07 19:06:12 +00:00
|
|
|
if (inp->inp_faddr.s_addr == faddr.s_addr &&
|
1997-04-03 05:14:45 +00:00
|
|
|
inp->inp_laddr.s_addr == laddr.s_addr &&
|
|
|
|
inp->inp_fport == fport &&
|
1998-01-27 09:15:13 +00:00
|
|
|
inp->inp_lport == lport) {
|
|
|
|
/*
|
|
|
|
* Found.
|
|
|
|
*/
|
|
|
|
return (inp);
|
|
|
|
}
|
1996-10-07 19:06:12 +00:00
|
|
|
}
|
|
|
|
if (wildcard) {
|
|
|
|
struct inpcb *local_wild = NULL;
|
1999-12-07 17:39:16 +00:00
|
|
|
#if defined(INET6)
|
|
|
|
struct inpcb *local_wild_mapped = NULL;
|
|
|
|
#endif /* defined(INET6) */
|
1996-10-07 19:06:12 +00:00
|
|
|
|
1997-03-03 09:23:37 +00:00
|
|
|
head = &pcbinfo->hashbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbinfo->hashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(inp, head, inp_hash) {
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
1996-10-07 19:06:12 +00:00
|
|
|
if (inp->inp_faddr.s_addr == INADDR_ANY &&
|
1998-01-27 09:15:13 +00:00
|
|
|
inp->inp_lport == lport) {
|
1999-12-07 17:39:16 +00:00
|
|
|
if (ifp && ifp->if_type == IFT_FAITH &&
|
|
|
|
(inp->inp_flags & INP_FAITH) == 0)
|
|
|
|
continue;
|
1996-10-07 19:06:12 +00:00
|
|
|
if (inp->inp_laddr.s_addr == laddr.s_addr)
|
1998-01-27 09:15:13 +00:00
|
|
|
return (inp);
|
1999-12-07 17:39:16 +00:00
|
|
|
else if (inp->inp_laddr.s_addr == INADDR_ANY) {
|
|
|
|
#if defined(INET6)
|
|
|
|
if (INP_CHECK_SOCKAF(inp->inp_socket,
|
|
|
|
AF_INET6))
|
|
|
|
local_wild_mapped = inp;
|
|
|
|
else
|
|
|
|
#endif /* defined(INET6) */
|
1996-10-07 19:06:12 +00:00
|
|
|
local_wild = inp;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
1996-10-07 19:06:12 +00:00
|
|
|
}
|
|
|
|
}
|
1999-12-07 17:39:16 +00:00
|
|
|
#if defined(INET6)
|
|
|
|
if (local_wild == NULL)
|
|
|
|
return (local_wild_mapped);
|
|
|
|
#endif /* defined(INET6) */
|
1998-01-27 09:15:13 +00:00
|
|
|
return (local_wild);
|
1996-10-07 19:06:12 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by a factor of more
than 50, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fasttimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fasttimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
* Not found.
|
1996-10-07 19:06:12 +00:00
|
|
|
*/
|
1998-01-27 09:15:13 +00:00
|
|
|
return (NULL);
|
1995-04-09 01:29:31 +00:00
|
|
|
}
|
|
|
|
|
1995-04-10 08:52:45 +00:00
|
|
|
/*
|
1998-01-27 09:15:13 +00:00
|
|
|
* Insert PCB onto various hash lists.
|
1995-04-10 08:52:45 +00:00
|
|
|
*/
|
1998-01-27 09:15:13 +00:00
|
|
|
int
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbinshash(struct inpcb *inp)
|
1995-04-09 01:29:31 +00:00
|
|
|
{
|
1998-01-27 09:15:13 +00:00
|
|
|
struct inpcbhead *pcbhash;
|
|
|
|
struct inpcbporthead *pcbporthash;
|
|
|
|
struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
|
|
|
|
struct inpcbport *phd;
|
1999-12-07 17:39:16 +00:00
|
|
|
u_int32_t hashkey_faddr;
|
1995-04-09 01:29:31 +00:00
|
|
|
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(pcbinfo);
|
2006-04-22 19:15:20 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
|
|
|
if (inp->inp_vflag & INP_IPV6)
|
|
|
|
hashkey_faddr = inp->in6p_faddr.s6_addr32[3] /* XXX */;
|
|
|
|
else
|
|
|
|
#endif /* INET6 */
|
|
|
|
hashkey_faddr = inp->inp_faddr.s_addr;
|
|
|
|
|
|
|
|
pcbhash = &pcbinfo->hashbase[INP_PCBHASH(hashkey_faddr,
|
1998-01-27 09:15:13 +00:00
|
|
|
inp->inp_lport, inp->inp_fport, pcbinfo->hashmask)];
|
1995-04-09 01:29:31 +00:00
|
|
|
|
1998-01-27 09:15:13 +00:00
|
|
|
pcbporthash = &pcbinfo->porthashbase[INP_PCBPORTHASH(inp->inp_lport,
|
|
|
|
pcbinfo->porthashmask)];
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Go through port list and look for a head for this lport.
|
|
|
|
*/
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(phd, pcbporthash, phd_hash) {
|
1998-01-27 09:15:13 +00:00
|
|
|
if (phd->phd_port == inp->inp_lport)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If none exists, malloc one and tack it on.
|
|
|
|
*/
|
|
|
|
if (phd == NULL) {
|
|
|
|
MALLOC(phd, struct inpcbport *, sizeof(struct inpcbport), M_PCB, M_NOWAIT);
|
|
|
|
if (phd == NULL) {
|
|
|
|
return (ENOBUFS); /* XXX */
|
|
|
|
}
|
|
|
|
phd->phd_port = inp->inp_lport;
|
|
|
|
LIST_INIT(&phd->phd_pcblist);
|
|
|
|
LIST_INSERT_HEAD(pcbporthash, phd, phd_hash);
|
|
|
|
}
|
|
|
|
inp->inp_phd = phd;
|
|
|
|
LIST_INSERT_HEAD(&phd->phd_pcblist, inp, inp_portlist);
|
|
|
|
LIST_INSERT_HEAD(pcbhash, inp, inp_hash);
|
|
|
|
return (0);
|
1995-04-09 01:29:31 +00:00
|
|
|
}
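The two-level structure that in_pcbinshash() maintains (port hash bucket, then one head per local port, then the PCBs bound to that port) can be sketched in userland C. This is a minimal sketch, not the kernel code: the toy_* names, the bucket count, and the hash (the local port masked by the table mask) are assumptions made for illustration, and plain malloc() stands in for the kernel MALLOC() with M_NOWAIT. It relies on the LIST_* macros from <sys/queue.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

#define TOY_PORTHASH_SIZE 8	/* hypothetical; must be a power of two */
#define TOY_PORTHASH(lport, mask) ((lport) & (mask))

struct toy_pcb;
LIST_HEAD(toy_pcbhead, toy_pcb);

/* One head per local port that is currently in use. */
struct toy_port {
	LIST_ENTRY(toy_port) tp_hash;	/* linkage in the port hash bucket */
	struct toy_pcbhead tp_pcblist;	/* every PCB bound to tp_port */
	uint16_t tp_port;
};
LIST_HEAD(toy_porthead, toy_port);

struct toy_pcb {
	LIST_ENTRY(toy_pcb) p_portlist;	/* linkage in tp_pcblist */
	struct toy_port *p_phd;		/* back pointer to the port head */
	uint16_t p_lport;
};

struct toy_info {
	struct toy_porthead porthashbase[TOY_PORTHASH_SIZE];
	unsigned porthashmask;
};

void
toy_info_init(struct toy_info *ti)
{
	for (int i = 0; i < TOY_PORTHASH_SIZE; i++)
		LIST_INIT(&ti->porthashbase[i]);
	ti->porthashmask = TOY_PORTHASH_SIZE - 1;
}

/* Return the head for a local port, or NULL if no PCB uses that port. */
struct toy_port *
toy_port_lookup(struct toy_info *ti, uint16_t lport)
{
	struct toy_porthead *bucket;
	struct toy_port *phd;

	bucket = &ti->porthashbase[TOY_PORTHASH(lport, ti->porthashmask)];
	LIST_FOREACH(phd, bucket, tp_hash) {
		if (phd->tp_port == lport)
			break;
	}
	return (phd);	/* NULL when the loop ran off the end of the bucket */
}

/* Link a PCB under its port head, creating the head on first use. */
int
toy_inshash(struct toy_info *ti, struct toy_pcb *pcb)
{
	struct toy_porthead *bucket;
	struct toy_port *phd;

	assert(pcb->p_lport != 0);	/* sketch assumption: PCB is already bound */
	phd = toy_port_lookup(ti, pcb->p_lport);
	if (phd == NULL) {
		phd = malloc(sizeof(*phd));
		if (phd == NULL)
			return (ENOBUFS);
		phd->tp_port = pcb->p_lport;
		LIST_INIT(&phd->tp_pcblist);
		bucket = &ti->porthashbase[TOY_PORTHASH(pcb->p_lport,
		    ti->porthashmask)];
		LIST_INSERT_HEAD(bucket, phd, tp_hash);
	}
	pcb->p_phd = phd;
	LIST_INSERT_HEAD(&phd->tp_pcblist, pcb, p_portlist);
	return (0);
}
```

Because every PCB bound to a port hangs off a single head, a local port check in this sketch only has to find the head for that port in one bucket instead of walking every PCB.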
|
|
|
|
|
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Move PCB to the proper hash bucket when { faddr, fport } have been
|
|
|
|
* changed. NOTE: This does not handle the case of the lport changing (the
|
|
|
|
* hashed port list would have to be updated as well), so the lport must
|
|
|
|
* not change after in_pcbinshash() has been called.
|
|
|
|
*/
|
1995-04-09 01:29:31 +00:00
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbrehash(struct inpcb *inp)
|
1995-04-09 01:29:31 +00:00
|
|
|
{
|
2003-11-08 23:02:36 +00:00
|
|
|
struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
|
1995-04-09 01:29:31 +00:00
|
|
|
struct inpcbhead *head;
|
1999-12-07 17:39:16 +00:00
|
|
|
u_int32_t hashkey_faddr;
|
|
|
|
|
2003-11-08 23:02:36 +00:00
|
|
|
INP_INFO_WLOCK_ASSERT(pcbinfo);
|
2004-08-19 01:11:17 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
2006-04-22 19:15:20 +00:00
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
|
|
|
if (inp->inp_vflag & INP_IPV6)
|
|
|
|
hashkey_faddr = inp->in6p_faddr.s6_addr32[3] /* XXX */;
|
|
|
|
else
|
|
|
|
#endif /* INET6 */
|
|
|
|
hashkey_faddr = inp->inp_faddr.s_addr;
|
1995-04-09 01:29:31 +00:00
|
|
|
|
2003-11-08 23:02:36 +00:00
|
|
|
head = &pcbinfo->hashbase[INP_PCBHASH(hashkey_faddr,
|
|
|
|
inp->inp_lport, inp->inp_fport, pcbinfo->hashmask)];
|
1995-04-09 01:29:31 +00:00
|
|
|
|
1998-01-27 09:15:13 +00:00
|
|
|
LIST_REMOVE(inp, inp_hash);
|
1995-04-09 01:29:31 +00:00
|
|
|
LIST_INSERT_HEAD(head, inp, inp_hash);
|
1998-01-27 09:15:13 +00:00
|
|
|
}
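Why in_pcbrehash() has to run after { faddr, fport } change can be seen in a small userland sketch. The toy_* names and the hash (a plain XOR of the three fields, masked) are hypothetical stand-ins and not the real INP_PCBHASH definition. Note that LIST_REMOVE() needs only the element, so the old bucket never has to be recomputed.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/queue.h>

#define TOY_HASH_SIZE 16	/* hypothetical; must be a power of two */

struct toy_conn {
	LIST_ENTRY(toy_conn) c_hash;	/* linkage in the connection hash */
	uint32_t c_faddr;
	uint16_t c_lport;
	uint16_t c_fport;
};
LIST_HEAD(toy_connhead, toy_conn);

struct toy_table {
	struct toy_connhead hashbase[TOY_HASH_SIZE];
	unsigned hashmask;
};

/* Hypothetical hash; only the idea of "key fields pick the bucket" matters. */
static unsigned
toy_hash(uint32_t faddr, uint16_t lport, uint16_t fport, unsigned mask)
{
	return ((faddr ^ lport ^ fport) & mask);
}

void
toy_table_init(struct toy_table *t)
{
	for (int i = 0; i < TOY_HASH_SIZE; i++)
		LIST_INIT(&t->hashbase[i]);
	t->hashmask = TOY_HASH_SIZE - 1;
}

void
toy_insert(struct toy_table *t, struct toy_conn *c)
{
	struct toy_connhead *head;

	head = &t->hashbase[toy_hash(c->c_faddr, c->c_lport, c->c_fport,
	    t->hashmask)];
	LIST_INSERT_HEAD(head, c, c_hash);
}

/* Call after c_faddr or c_fport changed: unlink, then relink by the new key. */
void
toy_rehash(struct toy_table *t, struct toy_conn *c)
{
	struct toy_connhead *head;

	head = &t->hashbase[toy_hash(c->c_faddr, c->c_lport, c->c_fport,
	    t->hashmask)];
	LIST_REMOVE(c, c_hash);		/* works without knowing the old bucket */
	LIST_INSERT_HEAD(head, c, c_hash);
}

/* Return the matching connection, or NULL if its bucket does not hold it. */
struct toy_conn *
toy_lookup(struct toy_table *t, uint32_t faddr, uint16_t lport, uint16_t fport)
{
	struct toy_connhead *head;
	struct toy_conn *c;

	head = &t->hashbase[toy_hash(faddr, lport, fport, t->hashmask)];
	LIST_FOREACH(c, head, c_hash) {
		if (c->c_faddr == faddr && c->c_lport == lport &&
		    c->c_fport == fport)
			break;
	}
	return (c);
}
```

In this sketch, a connection whose foreign address and port were filled in without a rehash stays in its old bucket, so a lookup by the new key misses it until toy_rehash() is called.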
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Remove PCB from various lists.
|
|
|
|
*/
|
1999-11-05 14:41:39 +00:00
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbremlists(struct inpcb *inp)
|
1998-01-27 09:15:13 +00:00
|
|
|
{
|
2003-11-08 23:02:36 +00:00
|
|
|
struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
|
|
|
|
|
|
|
|
INP_INFO_WLOCK_ASSERT(pcbinfo);
|
|
|
|
INP_LOCK_ASSERT(inp);
|
|
|
|
|
|
|
|
inp->inp_gencnt = ++pcbinfo->ipi_gencnt;
|
1998-01-27 09:15:13 +00:00
|
|
|
if (inp->inp_lport) {
|
|
|
|
struct inpcbport *phd = inp->inp_phd;
|
|
|
|
|
|
|
|
LIST_REMOVE(inp, inp_hash);
|
|
|
|
LIST_REMOVE(inp, inp_portlist);
|
2001-02-04 13:13:25 +00:00
|
|
|
if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
|
1998-01-27 09:15:13 +00:00
|
|
|
LIST_REMOVE(phd, phd_hash);
|
|
|
|
free(phd, M_PCB);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
LIST_REMOVE(inp, inp_list);
|
2003-11-08 23:02:36 +00:00
|
|
|
pcbinfo->ipi_count--;
|
1995-04-09 01:29:31 +00:00
|
|
|
}
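The port-head cleanup in in_pcbremlists() can be sketched the same way: when the last PCB on a local port goes away, the head for that port is unlinked from its bucket and freed. This is a sketch under assumptions. The toy_* names and helpers are hypothetical, only the port list is modeled (not the connection hash, the global PCB list, or the generation count), and resetting the PCB's fields after removal is an addition of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct toy_pcb;
LIST_HEAD(toy_pcbhead, toy_pcb);

struct toy_port {
	LIST_ENTRY(toy_port) tp_hash;	/* linkage in the port hash bucket */
	struct toy_pcbhead tp_pcblist;	/* every PCB bound to tp_port */
	uint16_t tp_port;
};
LIST_HEAD(toy_porthead, toy_port);

struct toy_pcb {
	LIST_ENTRY(toy_pcb) p_portlist;	/* linkage in tp_pcblist */
	struct toy_port *p_phd;		/* back pointer to the port head */
	uint16_t p_lport;		/* 0 means "not on any port list" */
};

/* Setup helper: allocate a head for lport and hang it on a bucket. */
struct toy_port *
toy_port_new(struct toy_porthead *bucket, uint16_t lport)
{
	struct toy_port *phd;

	phd = malloc(sizeof(*phd));
	if (phd == NULL)
		return (NULL);
	phd->tp_port = lport;
	LIST_INIT(&phd->tp_pcblist);
	LIST_INSERT_HEAD(bucket, phd, tp_hash);
	return (phd);
}

/* Setup helper: bind a PCB to an existing port head. */
void
toy_link(struct toy_port *phd, struct toy_pcb *pcb)
{
	pcb->p_lport = phd->tp_port;
	pcb->p_phd = phd;
	LIST_INSERT_HEAD(&phd->tp_pcblist, pcb, p_portlist);
}

/*
 * Unlink a PCB from its port list. Returns 1 if this was the last PCB on
 * the port and the head was freed, 0 otherwise.
 */
int
toy_remlists(struct toy_pcb *pcb)
{
	struct toy_port *phd;
	int freed = 0;

	if (pcb->p_lport == 0)
		return (0);		/* never bound, nothing to unlink */
	phd = pcb->p_phd;
	LIST_REMOVE(pcb, p_portlist);
	if (LIST_FIRST(&phd->tp_pcblist) == NULL) {
		LIST_REMOVE(phd, tp_hash);
		free(phd);
		freed = 1;
	}
	pcb->p_phd = NULL;		/* sketch only: avoid a dangling pointer */
	pcb->p_lport = 0;
	return (freed);
}
```

Freeing the head only when its PCB list becomes empty keeps the invariant that a port head exists exactly while at least one PCB is bound to that port.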
|
This implements the mumbled-about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
As far as I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison; this is in fact what
it was developed for: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
Mount a /proc in each jail; it will make ps more usable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no provisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
|
Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
|
|
|
/*
|
|
|
|
* A set label operation has occurred at the socket layer, propagate the
|
|
|
|
* label change into the in_pcb for the socket.
|
|
|
|
*/
|
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
in_pcbsosetlabel(struct socket *so)
|
2003-11-18 00:39:07 +00:00
|
|
|
{
|
|
|
|
#ifdef MAC
|
|
|
|
struct inpcb *inp;
|
|
|
|
|
2006-04-01 16:04:42 +00:00
|
|
|
inp = sotoinpcb(so);
|
|
|
|
KASSERT(inp != NULL, ("in_pcbsosetlabel: so->so_pcb == NULL"));
|
2006-04-22 19:15:20 +00:00
|
|
|
|
2003-11-18 00:39:07 +00:00
|
|
|
INP_LOCK(inp);
|
2004-06-13 02:50:07 +00:00
|
|
|
SOCK_LOCK(so);
|
2003-11-18 00:39:07 +00:00
|
|
|
mac_inpcb_sosetlabel(so, inp);
|
2004-06-13 02:50:07 +00:00
|
|
|
SOCK_UNLOCK(so);
|
2003-11-18 00:39:07 +00:00
|
|
|
INP_UNLOCK(inp);
|
|
|
|
#endif
|
|
|
|
}
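The lock nesting used by in_pcbsosetlabel() (pcb lock taken first, socket lock taken inside it, released in reverse order) can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: it uses POSIX mutexes in place of INP_LOCK()/SOCK_LOCK(), an int in place of a MAC label, and the names model_sock, model_pcb, and model_pcb_sosetlabel are hypothetical and do not exist in FreeBSD.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a socket: holds the authoritative label. */
struct model_sock {
	pthread_mutex_t lock;
	int label;
};

/* Hypothetical stand-in for an inpcb: holds a cached copy of the label. */
struct model_pcb {
	pthread_mutex_t lock;
	int cached_label;
};

/*
 * Refresh the cached label.  The pcb lock is acquired before the socket
 * lock and released after it, matching the nesting in in_pcbsosetlabel().
 */
static void
model_pcb_sosetlabel(struct model_sock *so, struct model_pcb *pcb)
{
	assert(so != NULL && pcb != NULL);

	pthread_mutex_lock(&pcb->lock);
	pthread_mutex_lock(&so->lock);
	pcb->cached_label = so->label;	/* models mac_inpcb_sosetlabel() */
	pthread_mutex_unlock(&so->lock);
	pthread_mutex_unlock(&pcb->lock);
}
```

A reader of the cached label would then need only the pcb lock, which is the point of keeping a cache: the delivery path does not have to reach through to the socket.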
|
2005-01-02 01:50:57 +00:00
|
|
|
|
|
|
|
/*
|
2006-06-02 08:18:27 +00:00
|
|
|
* ipport_tick runs once per second, determining if random port allocation
|
|
|
|
* should be continued. If more than ipport_randomcps ports have been
|
|
|
|
* allocated in the last second, then we return to sequential port
|
|
|
|
* allocation. We return to random allocation only once we drop below
|
|
|
|
* ipport_randomcps for at least ipport_randomtime seconds.
|
2005-01-02 01:50:57 +00:00
|
|
|
*/
|
|
|
|
void
|
2006-01-22 01:16:25 +00:00
|
|
|
ipport_tick(void *xtp)
|
2005-01-02 01:50:57 +00:00
|
|
|
{
|
2006-06-02 08:18:27 +00:00
|
|
|
|
|
|
|
if (ipport_tcpallocs <= ipport_tcplastcount + ipport_randomcps) {
|
2005-01-02 01:50:57 +00:00
|
|
|
if (ipport_stoprandom > 0)
|
|
|
|
ipport_stoprandom--;
|
2006-06-02 08:18:27 +00:00
|
|
|
} else
|
|
|
|
ipport_stoprandom = ipport_randomtime;
|
2005-01-02 01:50:57 +00:00
|
|
|
ipport_tcplastcount = ipport_tcpallocs;
|
|
|
|
callout_reset(&ipport_tick_callout, hz, ipport_tick, NULL);
|
|
|
|
}
|