
84 Commits

Author SHA1 Message Date
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
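Commit 8b615593fc introduces context-setting macros that, for now, resolve to no-ops, and verifies that the change has no functional impact. A minimal user-space sketch of that pattern follows. All names in it (`CURVNET_SET_SKETCH()`, `CURVNET_RESTORE_SKETCH()`, `raw_packets_seen`) are hypothetical stand-ins, not the FreeBSD definitions, and only the non-virtualized build described in the commit is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global standing in for a network-stack variable. */
static int raw_packets_seen;

/*
 * Hypothetical non-virtualized definitions: the macros expand to
 * statements that do nothing, so code written against them behaves
 * exactly like code written without them.
 */
#define CURVNET_SET_SKETCH(vnet)	do { (void)(vnet); } while (0)
#define CURVNET_RESTORE_SKETCH()	do { } while (0)

/* Input path written against the future context-setting API. */
static void
raw_input_with_macros(void *vnet)
{
	CURVNET_SET_SKETCH(vnet);
	raw_packets_seen++;
	CURVNET_RESTORE_SKETCH();
}

/* The same path as it was written before the change. */
static void
raw_input_plain(void)
{
	raw_packets_seen++;
}
```

Both functions update the same counter in the same way, which is the property the commit checks by comparing object files.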
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
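The V_ prefix convention from commit 603724d3ab can be sketched as follows. The variable name and its value are hypothetical; the point is only that code now refers to `V_name` while a macro maps it straight back to the existing global, so the change is a no-op until a later step points the macro at per-stack storage.

```c
#include <assert.h>

/* Hypothetical global (illustrative value) that a later step would move
   into per-virtual-stack state. */
static unsigned long rip6_sendspace_sketch = 8192;

/* Step 1: every use goes through the V_ name, which for now is only an
   alias for the global. */
#define V_rip6_sendspace_sketch	rip6_sendspace_sketch

static unsigned long
rip6_get_sendspace_sketch(void)
{
	return (V_rip6_sendspace_sketch);
}

static void
rip6_set_sendspace_sketch(unsigned long v)
{
	assert(v > 0);
	V_rip6_sendspace_sketch = v;
}
```

Writes through the V_ name land in the original global, so existing code that still uses the plain name keeps working.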
Robert Watson
2209e8f159 Adopt the slightly weaker consistency locking approach used in IPv4 raw
sockets for IPv6 raw sockets: separately lock the inpcb for determining
the destination address for a connect()'d raw socket at the rip6_send()
layer, and then re-acquire the inpcb lock in the rip6_output() layer to
query other options on the socket.  Previously, the global raw IP socket
lock was used, which, while correct and marginally more consistent, could
add significantly to global raw IP socket lock contention.

MFC after:	1 week
2008-07-30 09:26:27 +00:00
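The locking split in commit 2209e8f159 can be illustrated in user space with POSIX mutexes. Everything here is hypothetical and heavily simplified (`raw6_pcb_sketch`, `raw6_send_sketch()` and `raw6_output_sketch()` are not the kernel's structures or functions); the sketch only shows the shape of the approach: take the per-socket lock briefly to pick the destination, drop it, then re-take it in the output layer to read other options, with no global lock involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for an IPv6 raw inpcb. */
struct raw6_pcb_sketch {
	pthread_mutex_t	lock;		/* per-socket lock, no global lock */
	int		connected;	/* set by a connect()-like call */
	unsigned char	faddr[16];	/* connect()'d destination */
	int		hoplimit;	/* an option read again at output time */
};

/*
 * Output layer: re-acquire the per-socket lock to read other options.
 * Returns the hop limit it would use, only so the sketch is testable.
 */
static int
raw6_output_sketch(struct raw6_pcb_sketch *pcb, const unsigned char dst[16])
{
	int hlim;

	(void)dst;	/* a real stack would build and send the packet here */
	pthread_mutex_lock(&pcb->lock);
	hlim = pcb->hoplimit;
	pthread_mutex_unlock(&pcb->lock);
	return (hlim);
}

/*
 * Send layer: lock only this socket long enough to pick the destination,
 * then drop the lock before calling the output layer.
 * Returns -1 if there is no destination.
 */
static int
raw6_send_sketch(struct raw6_pcb_sketch *pcb, const unsigned char *dst_arg)
{
	unsigned char dst[16];

	assert(pcb != NULL);
	pthread_mutex_lock(&pcb->lock);
	if (pcb->connected)
		memcpy(dst, pcb->faddr, sizeof(dst));
	else if (dst_arg != NULL)
		memcpy(dst, dst_arg, sizeof(dst));
	else {
		pthread_mutex_unlock(&pcb->lock);
		return (-1);
	}
	pthread_mutex_unlock(&pcb->lock);
	return (raw6_output_sketch(pcb, dst));
}
```

The gap between the two lock acquisitions is the weaker consistency the commit accepts: an option changed in that window is seen by the output layer, while the destination is the one captured earlier.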
Robert Watson
2f1ff0cd80 Since we fail IPv6 raw socket allocation if inp->in6p_icmp6filt can't
be allocated, there's no need to make its use and freeing conditional
later.

MFC after:	1 week
2008-07-29 18:09:46 +00:00
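Commit 2f1ff0cd80 relies on the invariant that a raw socket only exists if its ICMPv6 filter was allocated; compare 8a9d54df38 further down this log, which checks for malloc failure and moves the malloc up to simplify error recovery. A hypothetical sketch of that pattern, with illustrative names and a fault-injecting allocator so the error paths can be exercised: the fallible filter allocation happens first, and detach can then free the filter without a NULL check.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical fault injection: the allocation whose index (counted from
   the last reset of alloc_count, 0 = first) equals fail_alloc_at fails;
   -1 disables injection. */
static int fail_alloc_at = -1;
static int alloc_count;

static void *
sketch_alloc(size_t n)
{
	int idx = alloc_count++;

	if (idx == fail_alloc_at)
		return (NULL);
	return (malloc(n));
}

/* Illustrative filter: one bit per ICMPv6 type (256 types). */
struct icmp6_filter_sketch {
	unsigned char bits[256 / 8];
};

struct raw6_pcb_alloc_sketch {
	struct icmp6_filter_sketch *filter;	/* never NULL once attached */
};

/*
 * Attach: allocate the filter first, so failing there needs no cleanup;
 * if the pcb allocation fails afterwards, only the filter is undone.
 */
static int
raw6_attach_sketch(struct raw6_pcb_alloc_sketch **pcbp)
{
	struct icmp6_filter_sketch *filter;
	struct raw6_pcb_alloc_sketch *pcb;

	filter = sketch_alloc(sizeof(*filter));
	if (filter == NULL)
		return (ENOMEM);
	pcb = sketch_alloc(sizeof(*pcb));
	if (pcb == NULL) {
		free(filter);
		return (ENOMEM);
	}
	pcb->filter = filter;
	*pcbp = pcb;
	return (0);
}

/* Detach: the attach invariant means no conditional is needed. */
static void
raw6_detach_sketch(struct raw6_pcb_alloc_sketch *pcb)
{
	assert(pcb->filter != NULL);
	free(pcb->filter);
	free(pcb);
}
```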
Alexander Motin
6c5bbf5ce1 Move the inpcb lock acquisition higher to protect reads of some non-binding fields.
This fixes nothing at this time, but is more correct.
2008-07-28 19:32:18 +00:00
Alexander Motin
b11e21ae80 According to in_pcb.h, protocol binding information is double-locked.
This allows it to be accessed while traversing the list holding only the global pcbinfo lock.
2008-07-27 20:30:34 +00:00
Bjoern A. Zeeb
f2f877d38c Change the parameters to in6_selectsrc():
- pass in the inp instead of both in6p_moptions and laddr.
- pass in cred for upcoming prison checks.

Reviewed by:	rwatson
2008-07-08 18:41:36 +00:00
Robert Watson
0ae76120da Improve approximation of style(9) in raw socket code.
2008-07-05 18:03:39 +00:00
Robert Watson
9ad11dd8a4 With IPv4 raw sockets, read lock rather than write lock the inpcb when
receiving or transmitting.

With IPv6 raw sockets, read lock rather than write lock the inpcb when
receiving.  Unfortunately, IPv6 source address selection appears to
require a write lock on the inpcb for the time being.

MFC after:	3 months
2008-04-21 12:06:41 +00:00
Robert Watson
8501a69cc9 Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock remain exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should (ideally) introduce little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after:	3 months
Tested by:	kris (superset of committed patch)
2008-04-17 21:38:18 +00:00
Bjoern A. Zeeb
79ba395267 Replace the last suser calls in netinet6/ with privilege checks.
Introduce a new privilege that allows setting certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by:	rwatson (older version, before addressing his comments)
2008-01-24 08:25:59 +00:00
David E. O'Brien
9233d8f3ad un-__P()
2008-01-08 19:08:58 +00:00
David E. O'Brien
b48287a32a Clean up VCS Ids.
2007-12-10 16:03:40 +00:00
Xin LI
2a463222be Space cleanup
Approved by:	re (rwatson)
2007-07-05 16:29:40 +00:00
Xin LI
1272577e22 ANSIfy[1] plus some style cleanup nearby.
Discussed with:	gnn, rwatson
Submitted by:	Karl Sjödahl - dunceor <dunceor gmail com> [1]
Approved by:	re (rwatson)
2007-07-05 16:23:49 +00:00
George V. Neville-Neil
b2630c2934 Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing
2007-07-03 12:13:45 +00:00
George V. Neville-Neil
2cb64cb272 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
Robert Watson
c2259ba44f Include priv.h to pick up suser(9) definitions, missed in an earlier
commit.

Warnings spotted by:	kris
2007-06-13 22:42:43 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Robert Watson
54d642bbe5 Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
2007-05-11 10:20:51 +00:00
Bruce M Simpson
1291e2a0eb Fix tinderbox. ip6_mrouter should be defined in raw_ip6.c as it is
tested to determine if the userland socket is open; this, in turn, is
used to determine if the module has been loaded.

Tested with:	LINT
2007-02-24 21:09:35 +00:00
Bruce M Simpson
6be2e366d6 Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
2007-02-24 11:38:47 +00:00
Robert Watson
a152f8a361 Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket.  pru_abort is now a
notification of close also, and no longer detaches.  pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket.  This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach if the protocol retained a reference.

This facilitates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree().  With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by:	gnn
2006-07-21 17:11:15 +00:00
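The close/detach split in commit a152f8a361 can be modelled with a reference count. In this hypothetical sketch (none of the names are the kernel's), `pr_close` is only a notification, and `pr_detach` runs exactly once, from the free routine, when the last reference goes away; after that the protocol state is gone unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counters so the call sequence can be observed. */
static int close_calls, detach_calls;

/* Hypothetical, reduced protocol switch. */
struct proto_sketch {
	void	(*pr_close)(void *pcb);		/* notification only */
	void	(*pr_detach)(void *pcb);	/* final, unconditional teardown */
};

struct socket_sketch {
	int				 refs;
	void				*pcb;
	const struct proto_sketch	*proto;
};

static void
raw_close_sketch(void *pcb)
{
	(void)pcb;		/* the pcb stays attached */
	close_calls++;
}

static void
raw_detach_sketch(void *pcb)
{
	free(pcb);		/* nothing in the socket layer uses it afterwards */
	detach_calls++;
}

static const struct proto_sketch raw_proto_sketch = {
	.pr_close = raw_close_sketch,
	.pr_detach = raw_detach_sketch,
};

/* Release one reference; only the last release detaches the protocol. */
static void
sofree_sketch(struct socket_sketch *so)
{
	assert(so->refs > 0);
	if (--so->refs > 0)
		return;
	so->proto->pr_detach(so->pcb);
	so->pcb = NULL;
}

/* close(): notify the protocol, then drop the descriptor's reference. */
static void
soclose_sketch(struct socket_sketch *so)
{
	so->proto->pr_close(so->pcb);
	sofree_sketch(so);
}
```

With two references outstanding, closing only notifies the protocol; the detach happens when the second reference is released.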
Stephan Uphoff
d915b28015 Fix race conditions on enumerating pcb lists by moving the initialization
(and, where appropriate, the destruction) of the pcb mutex to the init/finit
functions of the pcb zones.
This allows locking of the pcb entries and race-condition-free comparison
of the generation count.
Rearrange locking a bit to avoid an extra locking operation to update the generation
count in in_pcballoc() (in_pcballoc() now returns the pcb locked).

I am planning to convert pcb list handling from a type-safe to a reference-count
model soon (as this allows really freeing the PCBs).

Reviewed by:	rwatson@, mohans@
MFC after:	1 week
2006-07-18 22:34:27 +00:00
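Commit d915b28015 moves mutex setup into the zone init/finit hooks, which run when memory enters or leaves the zone rather than on every allocation. The following single-threaded sketch illustrates why that helps; the free-list cache, `tpcb_sketch` and its generation counter are hypothetical and are not UMA or the kernel's inpcb. Because the mutex survives a free, a holder of a stale pointer can still lock the object safely, and a changed generation count reveals that it was reused. As the commit describes for in_pcballoc(), allocation returns the object locked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical type-stable cache.  "init" (mutex setup) runs only when
 * memory first enters the cache; alloc/free in between leave the mutex
 * intact.  A "fini" would destroy it if memory were ever released; this
 * sketch never releases memory.  Single-threaded: the free list itself
 * is not locked.
 */
struct tpcb_sketch {
	pthread_mutex_t		 lock;		/* survives tpcb_free_sketch() */
	unsigned long		 gencnt;	/* changes on every allocation */
	struct tpcb_sketch	*free_next;
};

static struct tpcb_sketch *free_list;
static unsigned long global_gencnt;
static int init_calls;

/* Returns the object locked. */
static struct tpcb_sketch *
tpcb_alloc_sketch(void)
{
	struct tpcb_sketch *p = free_list;

	if (p != NULL)
		free_list = p->free_next;
	else {
		p = malloc(sizeof(*p));
		if (p == NULL)
			return (NULL);
		pthread_mutex_init(&p->lock, NULL);	/* "init" hook */
		init_calls++;
	}
	pthread_mutex_lock(&p->lock);
	p->gencnt = ++global_gencnt;
	return (p);
}

/* Caller holds p->lock; the memory returns to the cache, mutex intact. */
static void
tpcb_free_sketch(struct tpcb_sketch *p)
{
	assert(p != NULL);
	p->free_next = free_list;
	free_list = p;
	pthread_mutex_unlock(&p->lock);
}
```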
Robert Watson
1e0acb6801 Use suser_cred() instead of a direct comparison of cr_uid with 0 in
rip6_output().

MFC after:	1 week
2006-06-25 13:54:59 +00:00
Robert Watson
ff7425ced0 Don't use spl around call to in_pcballoc() in IPv6 raw socket support;
all necessary synchronization appears present.

MFC after:	3 months
2006-04-12 03:07:22 +00:00
Robert Watson
14ba8add01 Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
  never NULL, converting dozens of unnecessary NULL checks into
  assertions, and eliminating dozens of unnecessary error handling
  cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
  longer required to ensure so_pcb != NULL.  For example, in protocol
  shutdown methods, and in raw IP send.

- Abort and detach protocol switch methods no longer return failures,
  nor attempt to free sockets, as the socket layer does this.

- Invoke in_pcbfree() after in_pcbdetach() in order to free the
  detached in_pcb structure for a socket.

MFC after:	3 months
2006-04-01 16:20:54 +00:00
Robert Watson
bc725eafc7 Change protocol switch method pru_detach() so that it returns void
rather than an error.  Detaches do not "fail", they either occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF.  so_pcb is now entirely owned and
managed by the protocol code.  Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit.  In their current state they may leak
memory or panic.

MFC after:	3 months
2006-04-01 15:42:02 +00:00
Robert Watson
ac45e92ff2 Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful.  Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they are not fully
updated by this commit.  This will be corrected shortly in followup
commits to these components.

MFC after:      3 months
2006-04-01 15:15:05 +00:00
SUZUKI Shinsuke
4350fcab1b Raw IPv6 checksum must use the protocol number of the last header, instead of the first next-header value.
Obtained from: KAME
MFC after: 1 day
2005-10-19 01:21:49 +00:00
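Commit 4350fcab1b concerns the protocol number placed in the checksum pseudo-header: it must be the upper-layer protocol found at the end of the extension header chain, not the Next Header value of the fixed IPv6 header. One way to find that value is to walk the chain, as in this sketch. It only handles Hop-by-Hop, Routing and Destination Options headers, the function name is hypothetical, and the kernel change itself may obtain the protocol number differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Next Header values of the extension headers this sketch can skip.
   Other extension headers, e.g. Fragment (44), are not handled. */
#define NH_HOPOPTS	0
#define NH_ROUTING	43
#define NH_DSTOPTS	60

/*
 * Walk the extension headers that follow a fixed IPv6 header.  'first_nh'
 * is the fixed header's Next Header field and 'p' points just past it.
 * Returns the upper-layer protocol (the value the pseudo-header needs) and
 * stores the upper-layer header offset in *off, or returns -1 if the
 * chain is truncated.
 */
static int
ip6_last_proto_sketch(uint8_t first_nh, const uint8_t *p, size_t len,
    size_t *off)
{
	uint8_t nh = first_nh;
	size_t pos = 0;

	assert(off != NULL);
	while (nh == NH_HOPOPTS || nh == NH_ROUTING || nh == NH_DSTOPTS) {
		size_t hlen;

		if (len - pos < 2)
			return (-1);
		/* Hdr Ext Len counts 8-octet units, excluding the first 8. */
		hlen = ((size_t)p[pos + 1] + 1) * 8;
		if (len - pos < hlen)
			return (-1);
		nh = p[pos];
		pos += hlen;
	}
	*off = pos;
	return (nh);
}
```

For a packet whose fixed header says Next Header 0 (Hop-by-Hop), followed by a single 8-byte Hop-by-Hop header whose Next Header is 58, the sketch returns 58 (ICMPv6) at offset 8; 0 is the "first next-header value" the commit says must not be used.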
SUZUKI Shinsuke
971b154cd3 added a missing unlock
Submitted by: JINMEI Tatuya
MFC After: 1 day
2005-10-15 08:49:49 +00:00
SUZUKI Shinsuke
2af9b91993 added a missing unlock (just do the same thing as in netinet/raw_ip.c)
Obtained from: KAME
MFC after: 3 days
2005-08-18 11:11:27 +00:00
Hajimu UMEMOTO
a1f7e5f8ee scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
  scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
  scoped addresses as a special case.
- scope boundary check will be stricter.  For example, the current
  *BSD code allows a packet with src=::1 and dst=(some global IPv6
  address) to be sent outside of the node, if the application does:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single
  node, but the current implementation of the *BSD kernel cannot
  reject this attempt.

Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from:	KAME
2005-07-25 12:31:43 +00:00
Hajimu UMEMOTO
885adbfa81 always copy ip6_pktopt. remove needcopy and needfree
argument/structure member accordingly.

Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-21 16:39:23 +00:00
Hajimu UMEMOTO
e07db7aa57 simplified udp6_output() and rip6_output(): do not override
in6p_outputopts at the entrance of the functions.  this trick was
necessary when we passed an in6 pcb to in6_embedscope(), within which
the in6p_outputopts member was used, but we do not use this kind of
interface any more.

Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-21 16:32:50 +00:00
Hajimu UMEMOTO
d5e3406d06 be consistent on naming advanced API functions; use ip6_XXXpktopt(s).
Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-21 15:06:32 +00:00
Sam Leffler
8a9d54df38 check for malloc failure (also move malloc up to simplify error recovery)
Noticed by:	Coverity Prevent analysis tool
Reviewed by:	gnn
2005-03-29 01:26:27 +00:00
Robert Watson
8760934124 Remove a comment from the raw IPv6 output function regarding
M_TRYWAIT allocations: M_PREPEND() now uses M_DONTWAIT.

MFC after:	3 days
2005-02-06 21:43:55 +00:00
Warner Losh
caf43b0208 /* -> /*- for license, minor formatting changes, separate for KAME
2005-01-07 02:30:35 +00:00
Poul-Henning Kamp
756d52a195 Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.
2004-11-08 14:44:54 +00:00
Robert Watson
0b7851fa03 Unlock rather than lock the ripcbinfo lock at the end of rip6_input().
RELENG_5 candidate.

Foot provided by:	Patrick Guelat <pg at imp dot ch>
2004-09-02 20:18:02 +00:00
Robert Watson
8a0c4da871 When allocating the IPv6 header to stick in front of a raw packet being
sent via a raw IPv6 socket, use M_DONTWAIT not M_TRYWAIT, as we're
holding the raw pcb mutex.

Reported, tested by:	kuriyama
2004-08-12 18:31:36 +00:00
Robert Watson
f31f65a708 Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead
structures, allowing in6_pcbnotify() to lock the pcbinfo and each
inpcb that it notifies of ICMPv6 events.  This prevents inpcb
assertions from firing when IPv6 generates and delivers event
notifications for inpcbs.

Reported by:	kuriyama
Tested by:	kuriyama
2004-08-06 03:45:45 +00:00
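Commit f31f65a708 passes the pcbinfo rather than a bare list head so that the notify routine can lock both the list and each pcb it touches. A hypothetical user-space sketch of that locking order follows. The names are illustrative, and the address and port matching that a real notify routine performs is omitted here, so every pcb is notified.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical simplified pcb and pcbinfo: a list lock plus per-pcb locks. */
struct npcb_sketch {
	pthread_mutex_t		 lock;
	int			 so_error;	/* where a notification lands */
	struct npcb_sketch	*next;
};

struct npcbinfo_sketch {
	pthread_mutex_t		 lock;		/* protects the list itself */
	struct npcb_sketch	*head;
};

/*
 * Deliver an error (e.g. derived from an ICMPv6 message) to every pcb.
 * Holding the pcbinfo gives access to the list lock for the walk, and
 * each pcb's own lock is taken while that pcb is updated.
 * Returns the number of pcbs notified.
 */
static int
pcbnotify_sketch(struct npcbinfo_sketch *info, int error)
{
	struct npcb_sketch *pcb;
	int n = 0;

	assert(info != NULL);
	pthread_mutex_lock(&info->lock);
	for (pcb = info->head; pcb != NULL; pcb = pcb->next) {
		pthread_mutex_lock(&pcb->lock);
		pcb->so_error = error;
		pthread_mutex_unlock(&pcb->lock);
		n++;
	}
	pthread_mutex_unlock(&info->lock);
	return (n);
}
```

The list lock is always taken before any pcb lock, which keeps the lock order consistent for a lock-order checker.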
Robert Watson
07385abd73 Commit a first pass at in6pcb and pcbinfo locking for IPv6,
synchronizing IPv6 protocol control blocks and lists.  These changes
are modeled on the inpcb locking for IPv4, submitted by Jennifer Yang,
and committed by Jeffrey Hsu.  With these locking changes, IPv6 use of
inpcbs is now substantially more MPSAFE, and permits IPv4 inpcb locking
assertions to be run in the presence of IPv6 compiled into the kernel.
2004-07-27 23:44:03 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Pawel Jakub Dawidek
6823b82399 Remove unused argument.
Reviewed by:	ume
2004-03-27 20:41:32 +00:00
Hajimu UMEMOTO
da0f40995d IPSEC and FAST_IPSEC have the same internal API now;
so merge these (IPSEC has an extra ipsecstat)

Submitted by:	"Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
2004-02-17 14:02:37 +00:00
Hajimu UMEMOTO
efddf5c64d supported IPV6_RECVPATHMTU socket option.
Obtained from:	KAME
2004-02-13 14:50:01 +00:00
Hajimu UMEMOTO
f073c60f73 pass pcb rather than so. it is expected that per socket policy
works again.
2004-02-03 18:20:55 +00:00
Sam Leffler
5bd311a566 Split the "inp" mutex class into separate classes for each of divert,
raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness
complaints.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2003-11-26 01:40:44 +00:00