1994-05-24 10:09:53 +00:00
|
|
|
/*
|
1995-09-21 17:55:49 +00:00
|
|
|
* Copyright (c) 1982, 1986, 1991, 1993, 1995
|
1994-05-24 10:09:53 +00:00
|
|
|
* The Regents of the University of California. All rights reserved.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 3. All advertising materials mentioning features or use of this software
|
|
|
|
* must display the following acknowledgement:
|
|
|
|
* This product includes software developed by the University of
|
|
|
|
* California, Berkeley and its contributors.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
1995-09-21 17:55:49 +00:00
|
|
|
* @(#)in_pcb.c 8.4 (Berkeley) 5/24/95
|
1999-08-28 01:08:13 +00:00
|
|
|
* $FreeBSD$
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
|
1999-12-22 19:13:38 +00:00
|
|
|
#include "opt_ipsec.h"
|
1999-12-07 17:39:16 +00:00
|
|
|
#include "opt_inet6.h"
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/param.h>
|
|
|
|
#include <sys/systm.h>
|
|
|
|
#include <sys/malloc.h>
|
|
|
|
#include <sys/mbuf.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/domain.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/protosw.h>
|
|
|
|
#include <sys/socket.h>
|
|
|
|
#include <sys/socketvar.h>
|
|
|
|
#include <sys/proc.h>
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
#include <sys/jail.h>
|
1996-01-19 08:00:58 +00:00
|
|
|
#include <sys/kernel.h>
|
|
|
|
#include <sys/sysctl.h>
|
1998-03-28 10:18:26 +00:00
|
|
|
|
1998-03-28 10:33:27 +00:00
|
|
|
#include <machine/limits.h>
|
|
|
|
|
2002-03-20 05:48:55 +00:00
|
|
|
#include <vm/uma.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
#include <net/if.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <net/if_types.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <net/route.h>
|
|
|
|
|
|
|
|
#include <netinet/in.h>
|
|
|
|
#include <netinet/in_pcb.h>
|
|
|
|
#include <netinet/in_var.h>
|
|
|
|
#include <netinet/ip_var.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
|
|
|
#include <netinet/ip6.h>
|
|
|
|
#include <netinet6/ip6_var.h>
|
|
|
|
#endif /* INET6 */
|
|
|
|
|
|
|
|
#ifdef IPSEC
|
|
|
|
#include <netinet6/ipsec.h>
|
|
|
|
#include <netkey/key.h>
|
|
|
|
#endif /* IPSEC */
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
struct in_addr zeroin_addr;
|
|
|
|
|
1996-01-19 08:00:58 +00:00
|
|
|
/*
|
|
|
|
* These configure the range of local port addresses assigned to
|
|
|
|
* "unspecified" outgoing connections/packets/whatever.
|
|
|
|
*/
|
1999-11-22 02:45:11 +00:00
|
|
|
int ipport_lowfirstauto = IPPORT_RESERVED - 1; /* 1023 */
|
|
|
|
int ipport_lowlastauto = IPPORT_RESERVEDSTART; /* 600 */
|
2002-03-22 03:28:11 +00:00
|
|
|
int ipport_firstauto = IPPORT_HIFIRSTAUTO; /* 49152 */
|
|
|
|
int ipport_lastauto = IPPORT_HILASTAUTO; /* 65535 */
|
1999-11-22 02:45:11 +00:00
|
|
|
int ipport_hifirstauto = IPPORT_HIFIRSTAUTO; /* 49152 */
|
|
|
|
int ipport_hilastauto = IPPORT_HILASTAUTO; /* 65535 */
|
1996-01-19 08:00:58 +00:00
|
|
|
|
1996-08-12 14:05:54 +00:00
|
|
|
#define RANGECHK(var, min, max) \
|
|
|
|
if ((var) < (min)) { (var) = (min); } \
|
|
|
|
else if ((var) > (max)) { (var) = (max); }
|
|
|
|
|
|
|
|
static int
|
2000-07-04 11:25:35 +00:00
|
|
|
sysctl_net_ipport_check(SYSCTL_HANDLER_ARGS)
|
1996-08-12 14:05:54 +00:00
|
|
|
{
|
|
|
|
int error = sysctl_handle_int(oidp,
|
|
|
|
oidp->oid_arg1, oidp->oid_arg2, req);
|
|
|
|
if (!error) {
|
|
|
|
RANGECHK(ipport_lowfirstauto, 1, IPPORT_RESERVED - 1);
|
|
|
|
RANGECHK(ipport_lowlastauto, 1, IPPORT_RESERVED - 1);
|
|
|
|
RANGECHK(ipport_firstauto, IPPORT_RESERVED, USHRT_MAX);
|
|
|
|
RANGECHK(ipport_lastauto, IPPORT_RESERVED, USHRT_MAX);
|
|
|
|
RANGECHK(ipport_hifirstauto, IPPORT_RESERVED, USHRT_MAX);
|
|
|
|
RANGECHK(ipport_hilastauto, IPPORT_RESERVED, USHRT_MAX);
|
|
|
|
}
|
|
|
|
return error;
|
|
|
|
}
|
1996-02-22 21:32:23 +00:00
|
|
|
|
1996-08-12 14:05:54 +00:00
|
|
|
#undef RANGECHK
|
1996-01-19 08:00:58 +00:00
|
|
|
|
1996-08-12 14:05:54 +00:00
|
|
|
SYSCTL_NODE(_net_inet_ip, IPPROTO_IP, portrange, CTLFLAG_RW, 0, "IP Ports");
|
|
|
|
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, lowfirst, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_lowfirstauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, lowlast, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_lowlastauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, first, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_firstauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, last, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_lastauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, hifirst, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_hifirstauto, 0, &sysctl_net_ipport_check, "I", "");
|
|
|
|
SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, hilast, CTLTYPE_INT|CTLFLAG_RW,
|
|
|
|
&ipport_hilastauto, 0, &sysctl_net_ipport_check, "I", "");
|
1995-11-14 20:34:56 +00:00
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* in_pcb.c: manage the Protocol Control Blocks.
|
|
|
|
*
|
|
|
|
* NOTE: It is assumed that most of these functions will be called at
|
|
|
|
* splnet(). XXX - There are, unfortunately, a few exceptions to this
|
|
|
|
* rule that should be fixed.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocate a PCB and associate it with the socket.
|
|
|
|
*/
|
1994-05-24 10:09:53 +00:00
|
|
|
int
|
2001-09-12 08:38:13 +00:00
|
|
|
in_pcballoc(so, pcbinfo, td)
|
1994-05-24 10:09:53 +00:00
|
|
|
struct socket *so;
|
1995-04-09 01:29:31 +00:00
|
|
|
struct inpcbinfo *pcbinfo;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td;
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
|
|
|
register struct inpcb *inp;
|
2001-07-26 19:19:49 +00:00
|
|
|
#ifdef IPSEC
|
|
|
|
int error;
|
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2002-03-20 05:48:55 +00:00
|
|
|
inp = uma_zalloc(pcbinfo->ipi_zone, M_WAITOK);
|
1994-05-24 10:09:53 +00:00
|
|
|
if (inp == NULL)
|
|
|
|
return (ENOBUFS);
|
|
|
|
bzero((caddr_t)inp, sizeof(*inp));
|
1998-03-24 18:06:34 +00:00
|
|
|
inp->inp_gencnt = ++pcbinfo->ipi_gencnt;
|
1995-04-09 01:29:31 +00:00
|
|
|
inp->inp_pcbinfo = pcbinfo;
|
1994-05-24 10:09:53 +00:00
|
|
|
inp->inp_socket = so;
|
2001-07-26 19:19:49 +00:00
|
|
|
#ifdef IPSEC
|
|
|
|
error = ipsec_init_policy(so, &inp->inp_sp);
|
|
|
|
if (error != 0) {
|
2002-03-20 05:48:55 +00:00
|
|
|
uma_zfree(pcbinfo->ipi_zone, inp);
|
2001-07-26 19:19:49 +00:00
|
|
|
return error;
|
|
|
|
}
|
|
|
|
#endif /*IPSEC*/
|
2000-04-02 03:49:25 +00:00
|
|
|
#if defined(INET6)
|
2001-06-11 12:39:29 +00:00
|
|
|
if (INP_SOCKAF(so) == AF_INET6 && !ip6_mapped_addr_on)
|
|
|
|
inp->inp_flags |= IN6P_IPV6_V6ONLY;
|
2000-04-02 03:49:25 +00:00
|
|
|
#endif
|
1995-04-09 01:29:31 +00:00
|
|
|
LIST_INSERT_HEAD(pcbinfo->listhead, inp, inp_list);
|
1998-03-24 18:06:34 +00:00
|
|
|
pcbinfo->ipi_count++;
|
1994-05-24 10:09:53 +00:00
|
|
|
so->so_pcb = (caddr_t)inp;
|
2001-06-11 12:39:29 +00:00
|
|
|
#ifdef INET6
|
|
|
|
if (ip6_auto_flowlabel)
|
|
|
|
inp->inp_flags |= IN6P_AUTOFLOWLABEL;
|
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
2001-09-12 08:38:13 +00:00
|
|
|
in_pcbbind(inp, nam, td)
|
1994-05-24 10:09:53 +00:00
|
|
|
register struct inpcb *inp;
|
1997-08-16 19:16:27 +00:00
|
|
|
struct sockaddr *nam;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td;
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2001-09-12 08:38:13 +00:00
|
|
|
struct proc *p = td->td_proc;
|
1994-05-24 10:09:53 +00:00
|
|
|
register struct socket *so = inp->inp_socket;
|
1996-10-30 06:13:10 +00:00
|
|
|
unsigned short *lastport;
|
1995-04-09 01:29:31 +00:00
|
|
|
struct sockaddr_in *sin;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
|
1994-05-24 10:09:53 +00:00
|
|
|
u_short lport = 0;
|
|
|
|
int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
int error, prison = 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1996-12-13 21:29:07 +00:00
|
|
|
if (TAILQ_EMPTY(&in_ifaddrhead)) /* XXX broken! */
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
|
|
|
if (inp->inp_lport || inp->inp_laddr.s_addr != INADDR_ANY)
|
|
|
|
return (EINVAL);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) == 0)
|
1996-10-07 19:06:12 +00:00
|
|
|
wild = 1;
|
1994-05-24 10:09:53 +00:00
|
|
|
if (nam) {
|
1997-08-16 19:16:27 +00:00
|
|
|
sin = (struct sockaddr_in *)nam;
|
|
|
|
if (nam->sa_len != sizeof (*sin))
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EINVAL);
|
|
|
|
#ifdef notdef
|
|
|
|
/*
|
|
|
|
* We should check the family, but old programs
|
|
|
|
* incorrectly fail to initialize it.
|
|
|
|
*/
|
|
|
|
if (sin->sin_family != AF_INET)
|
|
|
|
return (EAFNOSUPPORT);
|
|
|
|
#endif
|
2000-09-17 13:35:42 +00:00
|
|
|
if (sin->sin_addr.s_addr != INADDR_ANY)
|
2002-02-27 18:32:23 +00:00
|
|
|
if (prison_ip(td->td_ucred, 0, &sin->sin_addr.s_addr))
|
2000-09-17 13:35:42 +00:00
|
|
|
return(EINVAL);
|
1994-05-24 10:09:53 +00:00
|
|
|
lport = sin->sin_port;
|
|
|
|
if (IN_MULTICAST(ntohl(sin->sin_addr.s_addr))) {
|
|
|
|
/*
|
|
|
|
* Treat SO_REUSEADDR as SO_REUSEPORT for multicast;
|
|
|
|
* allow complete duplication of binding if
|
|
|
|
* SO_REUSEPORT is set, or if SO_REUSEADDR is set
|
|
|
|
* and a multicast address is bound on both
|
|
|
|
* new and duplicated sockets.
|
|
|
|
*/
|
|
|
|
if (so->so_options & SO_REUSEADDR)
|
|
|
|
reuseport = SO_REUSEADDR|SO_REUSEPORT;
|
|
|
|
} else if (sin->sin_addr.s_addr != INADDR_ANY) {
|
|
|
|
sin->sin_port = 0; /* yech... */
|
2001-11-06 00:48:01 +00:00
|
|
|
bzero(&sin->sin_zero, sizeof(sin->sin_zero));
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ifa_ifwithaddr((struct sockaddr *)sin) == 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
|
|
|
}
|
|
|
|
if (lport) {
|
|
|
|
struct inpcb *t;
|
|
|
|
/* GROSS */
|
1997-08-16 19:16:27 +00:00
|
|
|
if (ntohs(lport) < IPPORT_RESERVED && p &&
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
suser_xxx(0, p, PRISON_ROOT))
|
1995-09-21 17:55:49 +00:00
|
|
|
return (EACCES);
|
2002-02-27 18:32:23 +00:00
|
|
|
if (td && jailed(td->td_ucred))
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
prison = 1;
|
1999-09-19 02:17:02 +00:00
|
|
|
if (so->so_cred->cr_uid != 0 &&
|
1998-09-17 18:42:16 +00:00
|
|
|
!IN_MULTICAST(ntohl(sin->sin_addr.s_addr))) {
|
1998-03-01 19:39:29 +00:00
|
|
|
t = in_pcblookup_local(inp->inp_pcbinfo,
|
1999-11-22 02:45:11 +00:00
|
|
|
sin->sin_addr, lport,
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
prison ? 0 : INPLOOKUP_WILDCARD);
|
1998-09-17 18:42:16 +00:00
|
|
|
if (t &&
|
|
|
|
(ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
|
|
|
|
ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
|
|
|
|
(t->inp_socket->so_options &
|
|
|
|
SO_REUSEPORT) == 0) &&
|
1999-09-19 02:17:02 +00:00
|
|
|
(so->so_cred->cr_uid !=
|
1999-12-07 17:39:16 +00:00
|
|
|
t->inp_socket->so_cred->cr_uid)) {
|
|
|
|
#if defined(INET6)
|
2001-06-11 12:39:29 +00:00
|
|
|
if (ntohl(sin->sin_addr.s_addr) !=
|
1999-12-07 17:39:16 +00:00
|
|
|
INADDR_ANY ||
|
|
|
|
ntohl(t->inp_laddr.s_addr) !=
|
|
|
|
INADDR_ANY ||
|
|
|
|
INP_SOCKAF(so) ==
|
|
|
|
INP_SOCKAF(t->inp_socket))
|
|
|
|
#endif /* defined(INET6) */
|
1998-03-01 19:39:29 +00:00
|
|
|
return (EADDRINUSE);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
1998-03-01 19:39:29 +00:00
|
|
|
}
|
2001-02-28 09:38:48 +00:00
|
|
|
if (prison &&
|
2002-02-27 18:32:23 +00:00
|
|
|
prison_ip(td->td_ucred, 0, &sin->sin_addr.s_addr))
|
2001-02-28 09:38:48 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
t = in_pcblookup_local(pcbinfo, sin->sin_addr,
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
lport, prison ? 0 : wild);
|
1999-12-07 17:39:16 +00:00
|
|
|
if (t &&
|
|
|
|
(reuseport & t->inp_socket->so_options) == 0) {
|
|
|
|
#if defined(INET6)
|
2001-06-11 12:39:29 +00:00
|
|
|
if (ntohl(sin->sin_addr.s_addr) !=
|
1999-12-07 17:39:16 +00:00
|
|
|
INADDR_ANY ||
|
|
|
|
ntohl(t->inp_laddr.s_addr) !=
|
|
|
|
INADDR_ANY ||
|
|
|
|
INP_SOCKAF(so) ==
|
|
|
|
INP_SOCKAF(t->inp_socket))
|
|
|
|
#endif /* defined(INET6) */
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EADDRINUSE);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
inp->inp_laddr = sin->sin_addr;
|
|
|
|
}
|
1996-02-22 21:32:23 +00:00
|
|
|
if (lport == 0) {
|
|
|
|
ushort first, last;
|
|
|
|
int count;
|
|
|
|
|
2000-09-17 13:35:42 +00:00
|
|
|
if (inp->inp_laddr.s_addr != INADDR_ANY)
|
2002-02-27 18:32:23 +00:00
|
|
|
if (prison_ip(td->td_ucred, 0, &inp->inp_laddr.s_addr )) {
|
2001-03-12 21:53:23 +00:00
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
2000-09-17 13:35:42 +00:00
|
|
|
return (EINVAL);
|
2001-03-12 21:53:23 +00:00
|
|
|
}
|
1996-08-23 18:59:07 +00:00
|
|
|
inp->inp_flags |= INP_ANONPORT;
|
|
|
|
|
1996-02-22 21:32:23 +00:00
|
|
|
if (inp->inp_flags & INP_HIGHPORT) {
|
|
|
|
first = ipport_hifirstauto; /* sysctl */
|
|
|
|
last = ipport_hilastauto;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
lastport = &pcbinfo->lasthi;
|
1996-02-22 21:32:23 +00:00
|
|
|
} else if (inp->inp_flags & INP_LOWPORT) {
|
2001-03-12 21:53:23 +00:00
|
|
|
if (p && (error = suser_xxx(0, p, PRISON_ROOT))) {
|
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
1997-04-27 20:01:29 +00:00
|
|
|
return error;
|
2001-03-12 21:53:23 +00:00
|
|
|
}
|
1996-08-12 14:05:54 +00:00
|
|
|
first = ipport_lowfirstauto; /* 1023 */
|
|
|
|
last = ipport_lowlastauto; /* 600 */
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
lastport = &pcbinfo->lastlow;
|
1996-02-22 21:32:23 +00:00
|
|
|
} else {
|
|
|
|
first = ipport_firstauto; /* sysctl */
|
|
|
|
last = ipport_lastauto;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
lastport = &pcbinfo->lastport;
|
1996-02-22 21:32:23 +00:00
|
|
|
}
|
|
|
|
/*
|
|
|
|
* Simple check to ensure all ports are not used up causing
|
|
|
|
* a deadlock here.
|
|
|
|
*
|
|
|
|
* We split the two cases (up and down) so that the direction
|
|
|
|
* is not being tested on each round of the loop.
|
|
|
|
*/
|
|
|
|
if (first > last) {
|
|
|
|
/*
|
|
|
|
* counting down
|
|
|
|
*/
|
|
|
|
count = first - last;
|
|
|
|
|
|
|
|
do {
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (count-- < 0) { /* completely used? */
|
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
2001-01-23 07:27:56 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
}
|
1996-02-22 21:32:23 +00:00
|
|
|
--*lastport;
|
|
|
|
if (*lastport > first || *lastport < last)
|
|
|
|
*lastport = first;
|
|
|
|
lport = htons(*lastport);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
} while (in_pcblookup_local(pcbinfo,
|
|
|
|
inp->inp_laddr, lport, wild));
|
1996-02-22 21:32:23 +00:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* counting up
|
|
|
|
*/
|
|
|
|
count = last - first;
|
|
|
|
|
|
|
|
do {
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (count-- < 0) { /* completely used? */
|
|
|
|
/*
|
|
|
|
* Undo any address bind that may have
|
|
|
|
* occurred above.
|
|
|
|
*/
|
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
2001-01-23 07:27:56 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
}
|
1996-02-22 21:32:23 +00:00
|
|
|
++*lastport;
|
|
|
|
if (*lastport < first || *lastport > last)
|
|
|
|
*lastport = first;
|
|
|
|
lport = htons(*lastport);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
} while (in_pcblookup_local(pcbinfo,
|
|
|
|
inp->inp_laddr, lport, wild));
|
1996-02-22 21:32:23 +00:00
|
|
|
}
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
inp->inp_lport = lport;
|
2002-02-27 18:32:23 +00:00
|
|
|
if (prison_ip(td->td_ucred, 0, &inp->inp_laddr.s_addr)) {
|
2001-03-12 21:53:23 +00:00
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
|
|
|
inp->inp_lport = 0;
|
2001-12-13 04:01:23 +00:00
|
|
|
return (EINVAL);
|
2001-03-12 21:53:23 +00:00
|
|
|
}
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (in_pcbinshash(inp) != 0) {
|
|
|
|
inp->inp_laddr.s_addr = INADDR_ANY;
|
|
|
|
inp->inp_lport = 0;
|
|
|
|
return (EAGAIN);
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
1995-02-08 20:22:09 +00:00
|
|
|
/*
|
|
|
|
* Transform old in_pcbconnect() into an inner subroutine for new
|
|
|
|
* in_pcbconnect(): Do some validity-checking on the remote
|
|
|
|
* address (in mbuf 'nam') and then determine local host address
|
|
|
|
* (i.e., which interface) to use to access that remote host.
|
|
|
|
*
|
|
|
|
* This preserves definition of in_pcbconnect(), while supporting a
|
|
|
|
* slightly different version for T/TCP. (This is more than
|
|
|
|
* a bit of a kludge, but cleaning up the internal interfaces would
|
|
|
|
* have forced minor changes in every protocol).
|
|
|
|
*/
|
|
|
|
|
|
|
|
int
|
|
|
|
in_pcbladdr(inp, nam, plocal_sin)
|
|
|
|
register struct inpcb *inp;
|
1997-08-16 19:16:27 +00:00
|
|
|
struct sockaddr *nam;
|
1995-02-08 20:22:09 +00:00
|
|
|
struct sockaddr_in **plocal_sin;
|
|
|
|
{
|
1994-05-24 10:09:53 +00:00
|
|
|
struct in_ifaddr *ia;
|
1997-08-16 19:16:27 +00:00
|
|
|
register struct sockaddr_in *sin = (struct sockaddr_in *)nam;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1997-08-16 19:16:27 +00:00
|
|
|
if (nam->sa_len != sizeof (*sin))
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EINVAL);
|
|
|
|
if (sin->sin_family != AF_INET)
|
|
|
|
return (EAFNOSUPPORT);
|
|
|
|
if (sin->sin_port == 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
1996-12-13 21:29:07 +00:00
|
|
|
if (!TAILQ_EMPTY(&in_ifaddrhead)) {
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* If the destination address is INADDR_ANY,
|
|
|
|
* use the primary local address.
|
|
|
|
* If the supplied address is INADDR_BROADCAST,
|
|
|
|
* and the primary interface supports broadcast,
|
|
|
|
* choose the broadcast address for that interface.
|
|
|
|
*/
|
|
|
|
if (sin->sin_addr.s_addr == INADDR_ANY)
|
2001-02-04 13:13:25 +00:00
|
|
|
sin->sin_addr = IA_SIN(TAILQ_FIRST(&in_ifaddrhead))->sin_addr;
|
1994-05-24 10:09:53 +00:00
|
|
|
else if (sin->sin_addr.s_addr == (u_long)INADDR_BROADCAST &&
|
2001-02-04 13:13:25 +00:00
|
|
|
(TAILQ_FIRST(&in_ifaddrhead)->ia_ifp->if_flags & IFF_BROADCAST))
|
|
|
|
sin->sin_addr = satosin(&TAILQ_FIRST(&in_ifaddrhead)->ia_broadaddr)->sin_addr;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
if (inp->inp_laddr.s_addr == INADDR_ANY) {
|
|
|
|
register struct route *ro;
|
|
|
|
|
|
|
|
ia = (struct in_ifaddr *)0;
|
1995-05-30 08:16:23 +00:00
|
|
|
/*
|
1994-05-24 10:09:53 +00:00
|
|
|
* If route is known or can be allocated now,
|
|
|
|
* our src addr is taken from the i/f, else punt.
|
2002-01-21 20:04:22 +00:00
|
|
|
* Note that we should check the address family of the cached
|
|
|
|
* destination, in case of sharing the cache with IPv6.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
ro = &inp->inp_route;
|
|
|
|
if (ro->ro_rt &&
|
2002-01-21 20:04:22 +00:00
|
|
|
(ro->ro_dst.sa_family != AF_INET ||
|
|
|
|
satosin(&ro->ro_dst)->sin_addr.s_addr !=
|
|
|
|
sin->sin_addr.s_addr ||
|
|
|
|
inp->inp_socket->so_options & SO_DONTROUTE)) {
|
1994-05-24 10:09:53 +00:00
|
|
|
RTFREE(ro->ro_rt);
|
|
|
|
ro->ro_rt = (struct rtentry *)0;
|
|
|
|
}
|
|
|
|
if ((inp->inp_socket->so_options & SO_DONTROUTE) == 0 && /*XXX*/
|
|
|
|
(ro->ro_rt == (struct rtentry *)0 ||
|
|
|
|
ro->ro_rt->rt_ifp == (struct ifnet *)0)) {
|
|
|
|
/* No route yet, so try to acquire one */
|
2002-01-21 20:04:22 +00:00
|
|
|
bzero(&ro->ro_dst, sizeof(struct sockaddr_in));
|
1994-05-24 10:09:53 +00:00
|
|
|
ro->ro_dst.sa_family = AF_INET;
|
|
|
|
ro->ro_dst.sa_len = sizeof(struct sockaddr_in);
|
|
|
|
((struct sockaddr_in *) &ro->ro_dst)->sin_addr =
|
|
|
|
sin->sin_addr;
|
|
|
|
rtalloc(ro);
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If we found a route, use the address
|
|
|
|
* corresponding to the outgoing interface
|
|
|
|
* unless it is the loopback (in case a route
|
|
|
|
* to our address on another net goes to loopback).
|
|
|
|
*/
|
|
|
|
if (ro->ro_rt && !(ro->ro_rt->rt_ifp->if_flags & IFF_LOOPBACK))
|
|
|
|
ia = ifatoia(ro->ro_rt->rt_ifa);
|
|
|
|
if (ia == 0) {
|
|
|
|
u_short fport = sin->sin_port;
|
|
|
|
|
|
|
|
sin->sin_port = 0;
|
|
|
|
ia = ifatoia(ifa_ifwithdstaddr(sintosa(sin)));
|
|
|
|
if (ia == 0)
|
|
|
|
ia = ifatoia(ifa_ifwithnet(sintosa(sin)));
|
|
|
|
sin->sin_port = fport;
|
|
|
|
if (ia == 0)
|
2001-02-04 13:13:25 +00:00
|
|
|
ia = TAILQ_FIRST(&in_ifaddrhead);
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ia == 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If the destination address is multicast and an outgoing
|
|
|
|
* interface has been set as a multicast option, use the
|
|
|
|
* address of that interface as our source address.
|
|
|
|
*/
|
|
|
|
if (IN_MULTICAST(ntohl(sin->sin_addr.s_addr)) &&
|
|
|
|
inp->inp_moptions != NULL) {
|
|
|
|
struct ip_moptions *imo;
|
|
|
|
struct ifnet *ifp;
|
|
|
|
|
|
|
|
imo = inp->inp_moptions;
|
|
|
|
if (imo->imo_multicast_ifp != NULL) {
|
|
|
|
ifp = imo->imo_multicast_ifp;
|
2001-02-04 16:08:18 +00:00
|
|
|
TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link)
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ia->ia_ifp == ifp)
|
|
|
|
break;
|
|
|
|
if (ia == 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
|
|
|
}
|
|
|
|
}
|
1995-02-08 20:22:09 +00:00
|
|
|
/*
|
|
|
|
* Don't do pcblookup call here; return interface in plocal_sin
|
|
|
|
* and exit to caller, that will do the lookup.
|
|
|
|
*/
|
|
|
|
*plocal_sin = &ia->ia_addr;
|
1995-05-30 08:16:23 +00:00
|
|
|
|
1995-02-08 20:22:09 +00:00
|
|
|
}
|
|
|
|
return(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Outer subroutine:
|
|
|
|
* Connect from a socket to a specified address.
|
|
|
|
* Both address and port must be specified in argument sin.
|
|
|
|
* If don't have a local address for this socket yet,
|
|
|
|
* then pick one.
|
|
|
|
*/
|
|
|
|
int
|
2001-09-12 08:38:13 +00:00
|
|
|
in_pcbconnect(inp, nam, td)
|
1995-02-08 20:22:09 +00:00
|
|
|
register struct inpcb *inp;
|
1997-08-16 19:16:27 +00:00
|
|
|
struct sockaddr *nam;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td;
|
1995-02-08 20:22:09 +00:00
|
|
|
{
|
|
|
|
struct sockaddr_in *ifaddr;
|
2000-09-17 13:35:42 +00:00
|
|
|
struct sockaddr_in *sin = (struct sockaddr_in *)nam;
|
|
|
|
struct sockaddr_in sa;
|
2001-02-21 06:39:57 +00:00
|
|
|
struct ucred *cred;
|
1995-02-08 20:22:09 +00:00
|
|
|
int error;
|
|
|
|
|
2001-02-21 06:39:57 +00:00
|
|
|
cred = inp->inp_socket->so_cred;
|
|
|
|
if (inp->inp_laddr.s_addr == INADDR_ANY && jailed(cred)) {
|
2000-09-17 13:35:42 +00:00
|
|
|
bzero(&sa, sizeof (sa));
|
o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protection sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.
Reviewed by: jhb
2001-12-03 16:12:27 +00:00
|
|
|
sa.sin_addr.s_addr = htonl(prison_getip(cred));
|
2000-09-17 13:35:42 +00:00
|
|
|
sa.sin_len=sizeof (sa);
|
|
|
|
sa.sin_family = AF_INET;
|
2001-09-12 08:38:13 +00:00
|
|
|
error = in_pcbbind(inp, (struct sockaddr *)&sa, td);
|
2000-09-17 13:35:42 +00:00
|
|
|
if (error)
|
2001-12-13 04:01:23 +00:00
|
|
|
return (error);
|
2000-09-17 13:35:42 +00:00
|
|
|
}
|
1995-02-08 20:22:09 +00:00
|
|
|
/*
|
|
|
|
* Call inner routine, to assign local interface address.
|
|
|
|
*/
|
1999-01-27 22:42:27 +00:00
|
|
|
if ((error = in_pcbladdr(inp, nam, &ifaddr)) != 0)
|
1995-02-08 20:22:09 +00:00
|
|
|
return(error);
|
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (in_pcblookup_hash(inp->inp_pcbinfo, sin->sin_addr, sin->sin_port,
|
1994-05-24 10:09:53 +00:00
|
|
|
inp->inp_laddr.s_addr ? inp->inp_laddr : ifaddr->sin_addr,
|
1999-12-07 17:39:16 +00:00
|
|
|
inp->inp_lport, 0, NULL) != NULL) {
|
1994-05-24 10:09:53 +00:00
|
|
|
return (EADDRINUSE);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
if (inp->inp_laddr.s_addr == INADDR_ANY) {
|
1999-06-25 23:46:47 +00:00
|
|
|
if (inp->inp_lport == 0) {
|
2001-09-12 08:38:13 +00:00
|
|
|
error = in_pcbbind(inp, (struct sockaddr *)0, td);
|
1999-06-25 23:46:47 +00:00
|
|
|
if (error)
|
2001-03-16 19:36:23 +00:00
|
|
|
return (error);
|
1999-06-25 23:46:47 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
inp->inp_laddr = ifaddr->sin_addr;
|
|
|
|
}
|
|
|
|
inp->inp_faddr = sin->sin_addr;
|
|
|
|
inp->inp_fport = sin->sin_port;
|
1995-04-09 01:29:31 +00:00
|
|
|
in_pcbrehash(inp);
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
1994-05-24 10:09:53 +00:00
|
|
|
in_pcbdisconnect(inp)
|
|
|
|
struct inpcb *inp;
|
|
|
|
{
|
|
|
|
|
|
|
|
inp->inp_faddr.s_addr = INADDR_ANY;
|
|
|
|
inp->inp_fport = 0;
|
1995-04-09 01:29:31 +00:00
|
|
|
in_pcbrehash(inp);
|
1994-05-24 10:09:53 +00:00
|
|
|
if (inp->inp_socket->so_state & SS_NOFDREF)
|
|
|
|
in_pcbdetach(inp);
|
|
|
|
}
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
1994-05-24 10:09:53 +00:00
|
|
|
in_pcbdetach(inp)
|
|
|
|
struct inpcb *inp;
|
|
|
|
{
|
|
|
|
struct socket *so = inp->inp_socket;
|
1998-03-24 18:06:34 +00:00
|
|
|
struct inpcbinfo *ipi = inp->inp_pcbinfo;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef IPSEC
|
2000-07-04 16:35:15 +00:00
|
|
|
ipsec4_delete_pcbpolicy(inp);
|
1999-12-07 17:39:16 +00:00
|
|
|
#endif /*IPSEC*/
|
1998-03-24 18:06:34 +00:00
|
|
|
inp->inp_gencnt = ++ipi->ipi_gencnt;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
in_pcbremlists(inp);
|
1994-05-24 10:09:53 +00:00
|
|
|
so->so_pcb = 0;
|
2001-11-17 03:07:11 +00:00
|
|
|
sotryfree(so);
|
1994-05-24 10:09:53 +00:00
|
|
|
if (inp->inp_options)
|
|
|
|
(void)m_free(inp->inp_options);
|
2001-11-22 04:50:44 +00:00
|
|
|
if (inp->inp_route.ro_rt)
|
|
|
|
rtfree(inp->inp_route.ro_rt);
|
1994-05-24 10:09:53 +00:00
|
|
|
ip_freemoptions(inp->inp_moptions);
|
1999-12-07 17:39:16 +00:00
|
|
|
inp->inp_vflag = 0;
|
2002-03-20 05:48:55 +00:00
|
|
|
uma_zfree(ipi->ipi_zone, inp);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1997-02-18 20:46:36 +00:00
|
|
|
/*
|
|
|
|
* The calling convention of in_setsockaddr() and in_setpeeraddr() was
|
|
|
|
* modified to match the pru_sockaddr() and pru_peeraddr() entry points
|
|
|
|
* in struct pr_usrreqs, so that protocols can just reference then directly
|
|
|
|
* without the need for a wrapper function. The socket must have a valid
|
|
|
|
* (i.e., non-nil) PCB, but it should be impossible to get an invalid one
|
|
|
|
* except through a kernel programming error, so it is acceptable to panic
|
1997-08-16 19:16:27 +00:00
|
|
|
* (or in this case trap) if the PCB is invalid. (Actually, we don't trap
|
|
|
|
* because there actually /is/ a programming error somewhere... XXX)
|
1997-02-18 20:46:36 +00:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
in_setsockaddr(so, nam)
|
|
|
|
struct socket *so;
|
1997-08-16 19:16:27 +00:00
|
|
|
struct sockaddr **nam;
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
1997-05-19 01:28:39 +00:00
|
|
|
int s;
|
|
|
|
register struct inpcb *inp;
|
1994-05-24 10:09:53 +00:00
|
|
|
register struct sockaddr_in *sin;
|
1995-05-30 08:16:23 +00:00
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Do the malloc first in case it blocks.
|
|
|
|
*/
|
2000-12-08 21:51:06 +00:00
|
|
|
MALLOC(sin, struct sockaddr_in *, sizeof *sin, M_SONAME,
|
|
|
|
M_WAITOK | M_ZERO);
|
1997-12-25 06:57:36 +00:00
|
|
|
sin->sin_family = AF_INET;
|
|
|
|
sin->sin_len = sizeof(*sin);
|
|
|
|
|
1997-05-19 01:28:39 +00:00
|
|
|
s = splnet();
|
|
|
|
inp = sotoinpcb(so);
|
1997-05-19 00:18:30 +00:00
|
|
|
if (!inp) {
|
|
|
|
splx(s);
|
1997-12-25 06:57:36 +00:00
|
|
|
free(sin, M_SONAME);
|
2000-05-19 00:55:21 +00:00
|
|
|
return ECONNRESET;
|
1997-05-19 00:18:30 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
sin->sin_port = inp->inp_lport;
|
|
|
|
sin->sin_addr = inp->inp_laddr;
|
1997-05-19 00:18:30 +00:00
|
|
|
splx(s);
|
1997-12-25 06:57:36 +00:00
|
|
|
|
|
|
|
*nam = (struct sockaddr *)sin;
|
1997-02-18 20:46:36 +00:00
|
|
|
return 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1997-02-18 20:46:36 +00:00
|
|
|
int
|
|
|
|
in_setpeeraddr(so, nam)
|
|
|
|
struct socket *so;
|
1997-08-16 19:16:27 +00:00
|
|
|
struct sockaddr **nam;
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
1997-05-19 01:28:39 +00:00
|
|
|
int s;
|
|
|
|
struct inpcb *inp;
|
1994-05-24 10:09:53 +00:00
|
|
|
register struct sockaddr_in *sin;
|
1995-05-30 08:16:23 +00:00
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Do the malloc first in case it blocks.
|
|
|
|
*/
|
2000-12-08 21:51:06 +00:00
|
|
|
MALLOC(sin, struct sockaddr_in *, sizeof *sin, M_SONAME,
|
|
|
|
M_WAITOK | M_ZERO);
|
1997-12-25 06:57:36 +00:00
|
|
|
sin->sin_family = AF_INET;
|
|
|
|
sin->sin_len = sizeof(*sin);
|
|
|
|
|
1997-05-19 01:28:39 +00:00
|
|
|
s = splnet();
|
|
|
|
inp = sotoinpcb(so);
|
1997-05-19 00:18:30 +00:00
|
|
|
if (!inp) {
|
|
|
|
splx(s);
|
1997-12-25 06:57:36 +00:00
|
|
|
free(sin, M_SONAME);
|
2000-05-19 00:55:21 +00:00
|
|
|
return ECONNRESET;
|
1997-05-19 00:18:30 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
sin->sin_port = inp->inp_fport;
|
|
|
|
sin->sin_addr = inp->inp_faddr;
|
1997-05-19 00:18:30 +00:00
|
|
|
splx(s);
|
1997-12-25 06:57:36 +00:00
|
|
|
|
|
|
|
*nam = (struct sockaddr *)sin;
|
1997-02-18 20:46:36 +00:00
|
|
|
return 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
2001-02-26 21:19:47 +00:00
|
|
|
in_pcbnotifyall(head, faddr, errno, notify)
|
1995-04-09 01:29:31 +00:00
|
|
|
struct inpcbhead *head;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct in_addr faddr;
|
2001-02-26 21:19:47 +00:00
|
|
|
int errno;
|
2002-03-19 21:25:46 +00:00
|
|
|
void (*notify)(struct inpcb *, int);
|
2001-02-22 21:23:45 +00:00
|
|
|
{
|
2001-02-26 21:19:47 +00:00
|
|
|
struct inpcb *inp, *ninp;
|
|
|
|
int s;
|
2001-02-22 21:23:45 +00:00
|
|
|
|
|
|
|
s = splnet();
|
2001-02-26 21:19:47 +00:00
|
|
|
for (inp = LIST_FIRST(head); inp != NULL; inp = ninp) {
|
|
|
|
ninp = LIST_NEXT(inp, inp_list);
|
2001-02-22 21:23:45 +00:00
|
|
|
#ifdef INET6
|
2001-02-26 21:19:47 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
2001-02-22 21:23:45 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
|
|
|
if (inp->inp_faddr.s_addr != faddr.s_addr ||
|
2001-02-26 21:19:47 +00:00
|
|
|
inp->inp_socket == NULL)
|
2001-02-22 21:23:45 +00:00
|
|
|
continue;
|
2001-02-26 21:19:47 +00:00
|
|
|
(*notify)(inp, errno);
|
2001-02-22 21:23:45 +00:00
|
|
|
}
|
|
|
|
splx(s);
|
|
|
|
}
|
|
|
|
|
2001-08-04 17:10:14 +00:00
|
|
|
void
|
|
|
|
in_pcbpurgeif0(head, ifp)
|
|
|
|
struct inpcb *head;
|
|
|
|
struct ifnet *ifp;
|
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
|
|
|
struct ip_moptions *imo;
|
|
|
|
int i, gap;
|
|
|
|
|
|
|
|
for (inp = head; inp != NULL; inp = LIST_NEXT(inp, inp_list)) {
|
|
|
|
imo = inp->inp_moptions;
|
|
|
|
if ((inp->inp_vflag & INP_IPV4) &&
|
|
|
|
imo != NULL) {
|
|
|
|
/*
|
|
|
|
* Unselect the outgoing interface if it is being
|
|
|
|
* detached.
|
|
|
|
*/
|
|
|
|
if (imo->imo_multicast_ifp == ifp)
|
|
|
|
imo->imo_multicast_ifp = NULL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Drop multicast group membership if we joined
|
|
|
|
* through the interface being detached.
|
|
|
|
*/
|
|
|
|
for (i = 0, gap = 0; i < imo->imo_num_memberships;
|
|
|
|
i++) {
|
|
|
|
if (imo->imo_membership[i]->inm_ifp == ifp) {
|
|
|
|
in_delmulti(imo->imo_membership[i]);
|
|
|
|
gap++;
|
|
|
|
} else if (gap != 0)
|
|
|
|
imo->imo_membership[i - gap] =
|
|
|
|
imo->imo_membership[i];
|
|
|
|
}
|
|
|
|
imo->imo_num_memberships -= gap;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Check for alternatives when higher level complains
|
|
|
|
* about service problems. For now, invalidate cached
|
|
|
|
* routing information. If the route was created dynamically
|
|
|
|
* (by a redirect), time to try a default gateway again.
|
|
|
|
*/
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
1994-05-24 10:09:53 +00:00
|
|
|
in_losing(inp)
|
|
|
|
struct inpcb *inp;
|
|
|
|
{
|
|
|
|
register struct rtentry *rt;
|
|
|
|
struct rt_addrinfo info;
|
|
|
|
|
|
|
|
if ((rt = inp->inp_route.ro_rt)) {
|
|
|
|
bzero((caddr_t)&info, sizeof(info));
|
2001-10-17 18:07:05 +00:00
|
|
|
info.rti_flags = rt->rt_flags;
|
|
|
|
info.rti_info[RTAX_DST] = rt_key(rt);
|
1994-05-24 10:09:53 +00:00
|
|
|
info.rti_info[RTAX_GATEWAY] = rt->rt_gateway;
|
|
|
|
info.rti_info[RTAX_NETMASK] = rt_mask(rt);
|
|
|
|
rt_missmsg(RTM_LOSING, &info, rt->rt_flags, 0);
|
|
|
|
if (rt->rt_flags & RTF_DYNAMIC)
|
2001-10-17 18:07:05 +00:00
|
|
|
(void) rtrequest1(RTM_DELETE, &info, NULL);
|
|
|
|
inp->inp_route.ro_rt = NULL;
|
Backout CSRG revision 7.22 to this file (if in_losing notices an
RTF_DYNAMIC route, it got freed twice). I am not sure what was
the actual problem in 1992, but the current behavior is memory
leak if PCB holds a reference to a dynamically created/modified
routing table entry. (rt_refcnt>0 and we don't call rtfree().)
My test bed was:
1. Set net.inet.tcp.msl to a low value (for test purposes), e.g.,
5 seconds, to speed up the transition of TCP connection to a
"closed" state.
2. Add a network route which causes ICMP redirect from the gateway.
3. ping(8) host H that matches this route; this creates RTF_DYNAMIC
RTF_HOST route to H. (I was forced to use ICMP to cause gateway
to generate ICMP host redirect, because gateway in question is a
4.2-STABLE system vulnerable to a problem that was fixed later in
ip_icmp.c,v 1.39.2.6, and TCP packets with DF bit set were
triggering this bug.)
4. telnet(1) to H
5. Block access to H with ipfw(8)
6. Send something in telnet(1) session; this causes EPERM, followed
by an in_losing() call in a few seconds.
7. Delete ipfw(8) rule blocking access to H, and wait for TCP
connection moving to a CLOSED state; PCB is freed.
8. Delete host route to H.
9. Watch with netstat(1) that `rttrash' increased.
10. Repeat steps 3-9, and watch `rttrash' increases.
PR: kern/25421
MFC after: 2 weeks
2001-06-29 12:07:29 +00:00
|
|
|
rtfree(rt);
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* A new route can be allocated
|
|
|
|
* the next time output is attempted.
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* After a routing change, flush old routing
|
|
|
|
* and allocate a (hopefully) better one.
|
|
|
|
*/
|
2001-02-22 21:23:45 +00:00
|
|
|
void
|
1994-05-24 10:09:53 +00:00
|
|
|
in_rtchange(inp, errno)
|
|
|
|
register struct inpcb *inp;
|
|
|
|
int errno;
|
|
|
|
{
|
|
|
|
if (inp->inp_route.ro_rt) {
|
|
|
|
rtfree(inp->inp_route.ro_rt);
|
|
|
|
inp->inp_route.ro_rt = 0;
|
|
|
|
/*
|
|
|
|
* A new route can be allocated the next time
|
|
|
|
* output is attempted.
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Lookup a PCB based on the local address and port.
|
|
|
|
*/
|
1994-05-24 10:09:53 +00:00
|
|
|
struct inpcb *
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
in_pcblookup_local(pcbinfo, laddr, lport_arg, wild_okay)
|
1996-10-07 19:06:12 +00:00
|
|
|
struct inpcbinfo *pcbinfo;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
struct in_addr laddr;
|
|
|
|
u_int lport_arg;
|
1996-10-07 19:06:12 +00:00
|
|
|
int wild_okay;
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
1998-12-07 21:58:50 +00:00
|
|
|
register struct inpcb *inp;
|
1994-05-24 10:09:53 +00:00
|
|
|
int matchwild = 3, wildcard;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
u_short lport = lport_arg;
|
1995-04-10 08:52:45 +00:00
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (!wild_okay) {
|
|
|
|
struct inpcbhead *head;
|
|
|
|
/*
|
|
|
|
* Look for an unconnected (wildcard foreign addr) PCB that
|
|
|
|
* matches the local address and port we're looking for.
|
|
|
|
*/
|
|
|
|
head = &pcbinfo->hashbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbinfo->hashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(inp, head, inp_hash) {
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (inp->inp_faddr.s_addr == INADDR_ANY &&
|
|
|
|
inp->inp_laddr.s_addr == laddr.s_addr &&
|
|
|
|
inp->inp_lport == lport) {
|
|
|
|
/*
|
|
|
|
* Found.
|
|
|
|
*/
|
|
|
|
return (inp);
|
|
|
|
}
|
1995-04-09 01:29:31 +00:00
|
|
|
}
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Not found.
|
|
|
|
*/
|
|
|
|
return (NULL);
|
|
|
|
} else {
|
|
|
|
struct inpcbporthead *porthash;
|
|
|
|
struct inpcbport *phd;
|
|
|
|
struct inpcb *match = NULL;
|
|
|
|
/*
|
|
|
|
* Best fit PCB lookup.
|
|
|
|
*
|
|
|
|
* First see if this local port is in use by looking on the
|
|
|
|
* port hash list.
|
|
|
|
*/
|
|
|
|
porthash = &pcbinfo->porthashbase[INP_PCBPORTHASH(lport,
|
|
|
|
pcbinfo->porthashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(phd, porthash, phd_hash) {
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (phd->phd_port == lport)
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
}
|
|
|
|
if (phd != NULL) {
|
|
|
|
/*
|
|
|
|
* Port is in use by one or more PCBs. Look for best
|
|
|
|
* fit.
|
|
|
|
*/
|
2001-02-04 16:08:18 +00:00
|
|
|
LIST_FOREACH(inp, &phd->phd_pcblist, inp_portlist) {
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
wildcard = 0;
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (inp->inp_faddr.s_addr != INADDR_ANY)
|
|
|
|
wildcard++;
|
|
|
|
if (inp->inp_laddr.s_addr != INADDR_ANY) {
|
|
|
|
if (laddr.s_addr == INADDR_ANY)
|
|
|
|
wildcard++;
|
|
|
|
else if (inp->inp_laddr.s_addr != laddr.s_addr)
|
|
|
|
continue;
|
|
|
|
} else {
|
|
|
|
if (laddr.s_addr != INADDR_ANY)
|
|
|
|
wildcard++;
|
|
|
|
}
|
|
|
|
if (wildcard < matchwild) {
|
|
|
|
match = inp;
|
|
|
|
matchwild = wildcard;
|
|
|
|
if (matchwild == 0) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
1995-03-02 19:29:42 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
return (match);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
}
|
1995-04-09 01:29:31 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Lookup PCB in hash list.
|
|
|
|
*/
|
|
|
|
struct inpcb *
|
1999-12-07 17:39:16 +00:00
|
|
|
in_pcblookup_hash(pcbinfo, faddr, fport_arg, laddr, lport_arg, wildcard,
|
|
|
|
ifp)
|
1995-04-09 01:29:31 +00:00
|
|
|
struct inpcbinfo *pcbinfo;
|
|
|
|
struct in_addr faddr, laddr;
|
|
|
|
u_int fport_arg, lport_arg;
|
1996-10-07 19:06:12 +00:00
|
|
|
int wildcard;
|
1999-12-07 17:39:16 +00:00
|
|
|
struct ifnet *ifp;
|
1995-04-09 01:29:31 +00:00
|
|
|
{
|
|
|
|
struct inpcbhead *head;
|
|
|
|
register struct inpcb *inp;
|
|
|
|
u_short fport = fport_arg, lport = lport_arg;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* First look for an exact match.
|
|
|
|
*/
|
1997-03-03 09:23:37 +00:00
|
|
|
head = &pcbinfo->hashbase[INP_PCBHASH(faddr.s_addr, lport, fport, pcbinfo->hashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(inp, head, inp_hash) {
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
1996-10-07 19:06:12 +00:00
|
|
|
if (inp->inp_faddr.s_addr == faddr.s_addr &&
|
1997-04-03 05:14:45 +00:00
|
|
|
inp->inp_laddr.s_addr == laddr.s_addr &&
|
|
|
|
inp->inp_fport == fport &&
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
inp->inp_lport == lport) {
|
|
|
|
/*
|
|
|
|
* Found.
|
|
|
|
*/
|
|
|
|
return (inp);
|
|
|
|
}
|
1996-10-07 19:06:12 +00:00
|
|
|
}
|
|
|
|
if (wildcard) {
|
|
|
|
struct inpcb *local_wild = NULL;
|
1999-12-07 17:39:16 +00:00
|
|
|
#if defined(INET6)
|
|
|
|
struct inpcb *local_wild_mapped = NULL;
|
|
|
|
#endif /* defined(INET6) */
|
1996-10-07 19:06:12 +00:00
|
|
|
|
1997-03-03 09:23:37 +00:00
|
|
|
head = &pcbinfo->hashbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbinfo->hashmask)];
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(inp, head, inp_hash) {
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
1999-12-21 11:14:12 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV4) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
#endif
|
1996-10-07 19:06:12 +00:00
|
|
|
if (inp->inp_faddr.s_addr == INADDR_ANY &&
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
inp->inp_lport == lport) {
|
1999-12-07 17:39:16 +00:00
|
|
|
if (ifp && ifp->if_type == IFT_FAITH &&
|
|
|
|
(inp->inp_flags & INP_FAITH) == 0)
|
|
|
|
continue;
|
1996-10-07 19:06:12 +00:00
|
|
|
if (inp->inp_laddr.s_addr == laddr.s_addr)
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
return (inp);
|
1999-12-07 17:39:16 +00:00
|
|
|
else if (inp->inp_laddr.s_addr == INADDR_ANY) {
|
|
|
|
#if defined(INET6)
|
|
|
|
if (INP_CHECK_SOCKAF(inp->inp_socket,
|
|
|
|
AF_INET6))
|
|
|
|
local_wild_mapped = inp;
|
|
|
|
else
|
|
|
|
#endif /* defined(INET6) */
|
1996-10-07 19:06:12 +00:00
|
|
|
local_wild = inp;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
1996-10-07 19:06:12 +00:00
|
|
|
}
|
|
|
|
}
|
1999-12-07 17:39:16 +00:00
|
|
|
#if defined(INET6)
|
|
|
|
if (local_wild == NULL)
|
|
|
|
return (local_wild_mapped);
|
|
|
|
#endif /* defined(INET6) */
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
return (local_wild);
|
1996-10-07 19:06:12 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
* Not found.
|
1996-10-07 19:06:12 +00:00
|
|
|
*/
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
return (NULL);
|
1995-04-09 01:29:31 +00:00
|
|
|
}
|
|
|
|
|
1995-04-10 08:52:45 +00:00
|
|
|
/*
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
* Insert PCB onto various hash lists.
|
1995-04-10 08:52:45 +00:00
|
|
|
*/
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
int
|
1995-04-09 01:29:31 +00:00
|
|
|
in_pcbinshash(inp)
|
|
|
|
struct inpcb *inp;
|
|
|
|
{
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
struct inpcbhead *pcbhash;
|
|
|
|
struct inpcbporthead *pcbporthash;
|
|
|
|
struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
|
|
|
|
struct inpcbport *phd;
|
1999-12-07 17:39:16 +00:00
|
|
|
u_int32_t hashkey_faddr;
|
1995-04-09 01:29:31 +00:00
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
#ifdef INET6
|
|
|
|
if (inp->inp_vflag & INP_IPV6)
|
|
|
|
hashkey_faddr = inp->in6p_faddr.s6_addr32[3] /* XXX */;
|
|
|
|
else
|
|
|
|
#endif /* INET6 */
|
|
|
|
hashkey_faddr = inp->inp_faddr.s_addr;
|
|
|
|
|
|
|
|
pcbhash = &pcbinfo->hashbase[INP_PCBHASH(hashkey_faddr,
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
inp->inp_lport, inp->inp_fport, pcbinfo->hashmask)];
|
1995-04-09 01:29:31 +00:00
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
pcbporthash = &pcbinfo->porthashbase[INP_PCBPORTHASH(inp->inp_lport,
|
|
|
|
pcbinfo->porthashmask)];
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Go through port list and look for a head for this lport.
|
|
|
|
*/
|
2001-02-04 13:13:25 +00:00
|
|
|
LIST_FOREACH(phd, pcbporthash, phd_hash) {
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (phd->phd_port == inp->inp_lport)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If none exists, malloc one and tack it on.
|
|
|
|
*/
|
|
|
|
if (phd == NULL) {
|
|
|
|
MALLOC(phd, struct inpcbport *, sizeof(struct inpcbport), M_PCB, M_NOWAIT);
|
|
|
|
if (phd == NULL) {
|
|
|
|
return (ENOBUFS); /* XXX */
|
|
|
|
}
|
|
|
|
phd->phd_port = inp->inp_lport;
|
|
|
|
LIST_INIT(&phd->phd_pcblist);
|
|
|
|
LIST_INSERT_HEAD(pcbporthash, phd, phd_hash);
|
|
|
|
}
|
|
|
|
inp->inp_phd = phd;
|
|
|
|
LIST_INSERT_HEAD(&phd->phd_pcblist, inp, inp_portlist);
|
|
|
|
LIST_INSERT_HEAD(pcbhash, inp, inp_hash);
|
|
|
|
return (0);
|
1995-04-09 01:29:31 +00:00
|
|
|
}
|
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
/*
|
|
|
|
* Move PCB to the proper hash bucket when { faddr, fport } have been
|
|
|
|
* changed. NOTE: This does not handle the case of the lport changing (the
|
|
|
|
* hashed port list would have to be updated as well), so the lport must
|
|
|
|
* not change after in_pcbinshash() has been called.
|
|
|
|
*/
|
1995-04-09 01:29:31 +00:00
|
|
|
void
|
|
|
|
in_pcbrehash(inp)
|
|
|
|
struct inpcb *inp;
|
|
|
|
{
|
|
|
|
struct inpcbhead *head;
|
1999-12-07 17:39:16 +00:00
|
|
|
u_int32_t hashkey_faddr;
|
|
|
|
|
|
|
|
#ifdef INET6
|
|
|
|
if (inp->inp_vflag & INP_IPV6)
|
|
|
|
hashkey_faddr = inp->in6p_faddr.s6_addr32[3] /* XXX */;
|
|
|
|
else
|
|
|
|
#endif /* INET6 */
|
|
|
|
hashkey_faddr = inp->inp_faddr.s_addr;
|
1995-04-09 01:29:31 +00:00
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
head = &inp->inp_pcbinfo->hashbase[INP_PCBHASH(hashkey_faddr,
|
1997-03-03 09:23:37 +00:00
|
|
|
inp->inp_lport, inp->inp_fport, inp->inp_pcbinfo->hashmask)];
|
1995-04-09 01:29:31 +00:00
|
|
|
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
LIST_REMOVE(inp, inp_hash);
|
1995-04-09 01:29:31 +00:00
|
|
|
LIST_INSERT_HEAD(head, inp, inp_hash);
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Remove PCB from various lists.
|
|
|
|
*/
|
1999-11-05 14:41:39 +00:00
|
|
|
void
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
in_pcbremlists(inp)
|
|
|
|
struct inpcb *inp;
|
|
|
|
{
|
1998-05-15 20:11:40 +00:00
|
|
|
inp->inp_gencnt = ++inp->inp_pcbinfo->ipi_gencnt;
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
if (inp->inp_lport) {
|
|
|
|
struct inpcbport *phd = inp->inp_phd;
|
|
|
|
|
|
|
|
LIST_REMOVE(inp, inp_hash);
|
|
|
|
LIST_REMOVE(inp, inp_portlist);
|
2001-02-04 13:13:25 +00:00
|
|
|
if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
|
Improved connection establishment performance by doing local port lookups via
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
1998-01-27 09:15:13 +00:00
|
|
|
LIST_REMOVE(phd, phd_hash);
|
|
|
|
free(phd, M_PCB);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
LIST_REMOVE(inp, inp_list);
|
1998-03-24 18:06:34 +00:00
|
|
|
inp->inp_pcbinfo->ipi_count--;
|
1995-04-09 01:29:31 +00:00
|
|
|
}
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
|
|
|
|
int
|
|
|
|
prison_xinpcb(struct proc *p, struct inpcb *inp)
|
|
|
|
{
|
2001-02-21 06:39:57 +00:00
|
|
|
if (!jailed(p->p_ucred))
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
return (0);
|
o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protection sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.
Reviewed by: jhb
2001-12-03 16:12:27 +00:00
|
|
|
if (ntohl(inp->inp_laddr.s_addr) == prison_getip(p->p_ucred))
|
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
|
|
|
return (0);
|
|
|
|
return (1);
|
|
|
|
}
|