families.
The protosw[] array of any particular protocol family ("domain") is of fixed size
defined at compile time. This made it impossible to dynamically add or remove any
protocols to or from it. We work around this by introducing so called SPACER's
which are embedded into the protosw[] array at compile time. The SPACER's have
a special protocol number (32767) to indicate the fact that they are SPACER's but
are otherwise NULL. Only as many protocols can be dynamically loaded as SPACER's
are provided in the protosw[] structure.
The pr_usrreqs structure is treated more special and contains pointers to dummy
functions only returning EOPNOTSUPP. This is needed because the use of those
functions pointers is usually not checked within the kernel because until now it
was assumed to be a valid function pointer. Instead of fixing all potential
callers we just return a proper error code.
Two new functions provide a clean API to register and unregister a protocol. The
register function expects a pointer to a valid and complete struct protosw including
a pointer to struct pru_usrreqs provided by the caller. Upon successful registration
the pr_init() function will be called to finish initialization of the protocol. The
unregister function restores the SPACER in place of the protocol again. It is the
responseability of the caller to ensure proper closing of all sockets and freeing
of memory allocation by the unloading protocol.
sys/protosw.h
o Define generic PROTO_SPACER to be 32767
o Prototypes for all pru_*_notsupp() functions
o Prototypes for pf_proto_[un]register() functions
kern/uipc_domain.c
o Global struct pr_usrreqs nousrreqs containing valid pointers to the
pru_*_notsupp() functions
o New functions pf_proto_[un]register()
kern/uipc_socket2.c
o New functions bodies for all pru_*_notsupp() functions
o remove irrlevant spl
Notes:
1. We don't lock domain list traversals as this is safe until we start
removing domains.
2. The calculation of max_datalen in net_init_domain appears safe as
noone depends on max_hdr and max_datalen having consistent values.
3. Giant is still held for fast and slow timeouts; this must stay until
each timeout routine is properly locked (coming soon).
Sponsored by: FreeBSD Fondation
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.
o Determine the lock strategy for each members in struct socket.
o Lock down the following members:
- so_count
- so_options
- so_linger
- so_state
o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:
- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()
Reviewed by: alfred
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.
TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.
Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks
Get rid of the spl wrapper kludge, it doesn't seem to be needed between
init calls since all that's running is the domain/protocol timers and they
are safe since domain list modifications are splnet() protected (which
blocks the timers)
Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.
Convert PF_LOCAL PCB structures to be type-stable and add a version number.
Define an external format for infomation about socket structures and use
it in several places.
Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time. This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.
It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.
2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.
the high kernel calls into a protocol stack to perform requests on the
user's behalf. We replace the pr_usrreq() entry in struct protosw with a
pointer to a structure containing pointers to functions which implement
the various reuqests; each function is declared with the correct type and
number of arguments. (This is unlike the current scheme in which a quarter
of the requests take arguments of type other than (struct mbuf *) and the
difference is papered over with casts.) There are a few benefits to this
new scheme:
1) Arguments are passed with their correct types, and null-pointer dummies
are no longer necessary.
2) There should be slightly better caching effects from eliminating
the prximity to extraneous code and th switch in pr_usrreq().
3) It becomes much easier to change the types of the arguments to something
other than `struct mbuf *' (e.g.,pushing the work of sosend() into
the protocol as advocated by Van Jacobson).
There is one principal drawback: existing protocol stacks need to
be modified. This is alleviated by compatibility code in
uipc_socket2.c and uipc_domain.c which emulates the new interface
in terms of the old and vice versa.
This idea is not original to me. I read about what Jacobson did
in one of his papers and have tried to implement the first steps
towards something like that here. Much work remains to be done.
*' instead of caddr_t and it isn't optional (it never was). Most of the
netipx (and netns) pr_ctlinput functions abuse the second arg instead of
using the third arg but fixing this is beyond the scope of this round
of changes.
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..
NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..
certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)
The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.