Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.
Convert PF_LOCAL PCB structures to be type-stable and add a version number.
Define an external format for infomation about socket structures and use
it in several places.
Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time. This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.
It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
its own zone; this is used particularly by TCP which allocates both inpcb and
tcpcb in a single allocation. (Some hackery ensures that the tcpcb is
reasonably aligned.) Also keep track of the number of pcbs of each type
allocated, and keep a generation count (instance version number) for future
use.
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
This will not make any of object files that LINT create change; there
might be differences with INET disabled, but hardly anything compiled
before without INET anyway. Now the 'obvious' things will give a
proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The
only thing that _should_ work (but can't be made to compile reasonably
easily) is sppp :-(
This commit move struct arpcom from <netinet/if_ether.h> to
<net/if_arp.h>.
these are quite extensive additions to the ipfw code.
they include a change to the API because the old method was
broken, but the user view is kept the same.
The new code allows a particular match to skip forward to a particular
line number, so that blocks of rules can be
used without checking all the intervening rules.
There are also many more ways of rejecting
connections especially TCP related, and
many many more ...
see the man page for a complete description.
switch. I needed 'LINT' to compile for other reasons so I kinda got the
blood on my hands. Note: I don't know how to test this, I don't know if
it works correctly.
cache lines. Removed the struct ip proto since only a couple of chars
were actually being used in it. Changed the order of compares in the
PCB hash lookup to take advantage of partial cache line fills (on PPro).
Discussed-with: wollman
the quality of the hash distribution. This does not fix a problem dealing
with poor distribution when using lots of IP aliases and listening
on the same port on every one of them...some other day perhaps; fixing
that requires significant code changes.
The use of xor was inspired by David S. Miller <davem@jenolan.rutgers.edu>
set it in the first place, independent of whether sin->sin_port
is set.
The result is that diverted packets that are being forwarded
will be diverted once and only once on the way in (ip_input())
and again, once and only once on the way out (ip_output()) -
twice in total. ICMP packets that don't contain a port will
now also be diverted.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).
This stuff should not be too destructive if the IPDIVERT is not compiled in..
be aware that this changes the size of the ip_fw struct
so ipfw needs to be recompiled to use it.. more changes coming to clean this up.