Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.
Convert PF_LOCAL PCB structures to be type-stable and add a version number.
Define an external format for infomation about socket structures and use
it in several places.
Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time. This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.
It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.
* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.
* kill mono_time.
* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)
* nanosleep, select & poll takes long sleeps one day at a time
Reviewed by: bde
Tested by: ache and others
declaration macros so that a semicolon can be added when the macros
are invoked without giving a (pedantic) syntax error. Invocations
need to be followed by a semicolon so that programs like indent and
gtags don't get confused.
Fixed the one invocation that wasn't followed by a trailing semicolon.
since that might cause in_pcballoc to call MALLOC with M_WAITOK during
a software interrupt.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
switch. I needed 'LINT' to compile for other reasons so I kinda got the
blood on my hands. Note: I don't know how to test this, I don't know if
it works correctly.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.
2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
to removing the connection from the queue. The problem here is that
falloc() may block and this would allow another process to accept the
connection instead. If this happens to leave the queue empty, then the
system will panic with an "accept: nothing queued".
Also changed a wakeup() to a wakeup_one() to avoid the "thundering herd"
problem on new connections in Apache (or any other application that has
multiple processes blocked in accept() for the same socket).
all of the configurables and instrumentation related to
inter-process communication mechanisms. Some variables,
like mbuf statistics, are instrumented here for the first
time.
For mbuf statistics: also keep track of m_copym() and
m_pullup() failures, and provide for the user's inspection
the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.
sb_max * MCLBYTES / (MSIZE + MCLBYTES)
used in sbreserve() to overflow, causing all socket creation attempts
to fail. Force the calculation to use u_quad_t's, which makes overflow
less likely.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
using a sockaddr_dl.
Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR)
to work for multicast UDP and raw sockets as well. (They previously only
worked for unicast UDP).
(yes I had tested the hell out of this).
I've also temporarily disabled the code so that it behaves as it previously
did (tail drop's the syns) pending discussion with fenner about some socket
state flags that I don't fully understand.
Submitted by: fenner
half way through the range rather than possibly colliding with
fixed elements. Increase the size of the arrays to take this into account..
remember that each element in the array is now only 1 ponter so this
isn't that much..
also note a possible bug in debugging code in uipc_socket2.c (add XXX)
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.
the high kernel calls into a protocol stack to perform requests on the
user's behalf. We replace the pr_usrreq() entry in struct protosw with a
pointer to a structure containing pointers to functions which implement
the various reuqests; each function is declared with the correct type and
number of arguments. (This is unlike the current scheme in which a quarter
of the requests take arguments of type other than (struct mbuf *) and the
difference is papered over with casts.) There are a few benefits to this
new scheme:
1) Arguments are passed with their correct types, and null-pointer dummies
are no longer necessary.
2) There should be slightly better caching effects from eliminating
the prximity to extraneous code and th switch in pr_usrreq().
3) It becomes much easier to change the types of the arguments to something
other than `struct mbuf *' (e.g.,pushing the work of sosend() into
the protocol as advocated by Van Jacobson).
There is one principal drawback: existing protocol stacks need to
be modified. This is alleviated by compatibility code in
uipc_socket2.c and uipc_domain.c which emulates the new interface
in terms of the old and vice versa.
This idea is not original to me. I read about what Jacobson did
in one of his papers and have tried to implement the first steps
towards something like that here. Much work remains to be done.
the obsolete soqinsque and soqremque functions as well as collapsing
so_q0len and so_qlen into a single queue length of unaccepted connections.
Now the queue of unaccepted & complete connections is checked directly
for queued sockets. The new code should be functionally equivilent to
the old while being substantially faster - especially in cases where
large numbers of connections are often queued for accept (e.g. http).
the range [210:260] by sweeping the problem under the rug. This change
has the following effects:
1) A new MIB variable in the kern branch is defined to allow modification
of the socket buffer layer's ``wastage factor'' (which determines how
much unused-but-allocated space in mbufs and mbuf clusters is allowed
in a socket buffer).
2) The default value of the wastage factor is changed from 2 to 8.
The original value was chosen when MINCLSIZE was 7*MLEN (!), and is not
appropriate for an environment where MINCLSIZE is much less.
The real solution to this problem is to scrap both mbufs and sockbufs
and completely redesign the buffering mechanism used at both levels.
(maximum size of a socket buffer) tunable.
Permit callers of listen(2) to specify a negative backlog, which
is translated into somaxconn. Previously, a negative backlog was
silently translated into 0.