Commit Graph

24 Commits

Author SHA1 Message Date
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
Marko Zec
1ed81b739e First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et. al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
Marko Zec
44e33a0758 Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instatiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Bjoern A. Zeeb
eaa9325f48 In addition to the ipsec_osdep.h removal a week ago, now also eliminate
IPSEC_SPLASSERT_SOFTNET which has been 'unused' since FreeBSD 5.0.
2008-05-24 15:32:46 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
George V. Neville-Neil
2cb64cb272 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
Pawel Jakub Dawidek
80e35494cc - The authsize field from auth_hash structure was removed.
- Define that we want to receive only 96 bits of HMAC.
- Names of the structues have no longer _96 suffix.

Reviewed by:	sam
2006-05-17 18:30:28 +00:00
Pawel Jakub Dawidek
6131838b7c Hide net.inet.ipsec.test_{replay,integrity} sysctls under #ifdef REGRESSION.
Requested by:	sam, rwatson
2006-04-10 15:04:36 +00:00
Pawel Jakub Dawidek
dfa9422b4a Introduce two new sysctls:
net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
	the same sequence number. This allows to verify if the other side
	has proper replay attacks detection.

net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
	corrupted HMAC. This allows to verify if the other side properly
	detects modified packets.

I used the first one to discover that we don't have proper replay attacks
detection in ESP (in fast_ipsec(4)).
2006-04-09 19:11:45 +00:00
George V. Neville-Neil
a0196c3c89 First steps towards IPSec cleanup.
Make the kernel side of FAST_IPSEC not depend on the shared
structures defined in /usr/include/net/pfkeyv2.h  The kernel now
defines all the necessary in kernel structures in sys/netipsec/keydb.h
and does the proper massaging when moving messages around.

Sponsored By: Secure Computing
2006-03-25 13:38:52 +00:00
Pawel Jakub Dawidek
ec31427d3f Allow to use fast_ipsec(4) on debug.mpsafenet=0 and INVARIANTS-enabled
systems. Without the change it will panic on assertions.

MFC after:	2 weeks
2006-03-23 23:26:34 +00:00
Sam Leffler
47e2996e8b promote fast ipsec's m_clone routine for public use; it is renamed
m_unshare and the caller can now control how mbufs are allocated

Reviewed by:	andre, luigi, mlaier
MFC after:	1 week
2006-03-15 21:11:11 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Sam Leffler
9ffa96777e MFp4: portability work, general cleanup, locking fixes
change 38496
o add ipsec_osdep.h that holds os-specific definitions for portability
o s/KASSERT/IPSEC_ASSERT/ for portability
o s/SPLASSERT/IPSEC_SPLASSERT/ for portability
o remove function names from ASSERT strings since line#+file pinpints
  the location
o use __func__ uniformly to reduce string storage
o convert some random #ifdef DIAGNOSTIC code to assertions
o remove some debuggging assertions no longer needed

change 38498
o replace numerous bogus panic's with equally bogus assertions
  that at least go away on a production system

change 38502 + 38530
o change explicit mtx operations to #defines to simplify
  future changes to a different lock type

change 38531
o hookup ipv4 ctlinput paths to a noop routine; we should be
  handling path mtu changes at least
o correct potential null pointer deref in ipsec4_common_input_cb

chnage 38685
o fix locking for bundled SA's and for when key exchange is required

change 38770
o eliminate recursion on the SAHTREE lock

change 38804
o cleanup some types: long -> time_t
o remove refrence to dead #define

change 38805
o correct some types: long -> time_t
o add scan generation # to secpolicy to deal with locking issues

change 38806
o use LIST_FOREACH_SAFE instead of handrolled code
o change key_flush_spd to drop the sptree lock before purging
  an entry to avoid lock recursion and to avoid holding the lock
  over a long-running operation
o misc cleanups of tangled and twisty code

There is still much to do here but for now things look to be
working again.

Supported by:	FreeBSD Foundation
2003-09-29 22:57:43 +00:00
Sam Leffler
6464079f10 Locking and misc cleanups; most of which I've been running for >4 months:
o add locking
o strip irrelevant spl's
o split malloc types to better account for memory use
o remove unused IPSEC_NONBLOCK_ACQUIRE code
o remove dead code

Sponsored by:	FreeBSD Foundation
2003-09-01 05:35:55 +00:00
Sam Leffler
d8409aaf6e consolidate callback optimization check in one location by adding a flag
for crypto operations that indicates the crypto code should do the check
in crypto_done

MFC after:	1 day
2003-06-30 05:09:32 +00:00
Sam Leffler
1bb98f3b7b Check crypto driver capabilities and if the driver operates synchronously
mark crypto requests with ``callback immediately'' to avoid doing a context
switch to return crypto results.  This completes the work to eliminate
context switches for using software crypto via the crypto subsystem (with
symmetric crypto ops).
2003-06-27 20:10:03 +00:00
Sam Leffler
eb73a605cd o add a CRYPTO_F_CBIMM flag to symmetric ops to indicate the callback
should be done in crypto_done rather than in the callback thread
o use this flag to mark operations from /dev/crypto since the callback
  routine just does a wakeup; this eliminates the last unneeded ctx switch
o change CRYPTO_F_NODELAY to CRYPTO_F_BATCH with an inverted meaning
  so "0" becomes the default/desired setting (needed for user-mode
  compatibility with openbsd)
o change crypto_dispatch to honor CRYPTO_F_BATCH instead of always
  dispatching immediately
o remove uses of CRYPTO_F_NODELAY
o define COP_F_BATCH for ops submitted through /dev/crypto and pass
  this on to the op that is submitted

Similar changes and more eventually coming for asymmetric ops.

MFC if re gives approval.
2003-02-23 07:25:48 +00:00
Sam Leffler
88768458d2 "Fast IPsec": this is an experimental IPsec implementation that is derived
from the KAME IPsec implementation, but with heavy borrowing and influence
of openbsd.  A key feature of this implementation is that it uses the kernel
crypto framework to do all crypto work so when h/w crypto support is present
IPsec operation is automatically accelerated.  Otherwise the protocol
implementations are rather differet while the SADB and policy management
code is very similar to KAME (for the moment).

Note that this implementation is enabled with a FAST_IPSEC option.  With this
you get all protocols; i.e. there is no FAST_IPSEC_ESP option.

FAST_IPSEC and IPSEC are mutually exclusive; you cannot build both into a
single system.

This software is well tested with IPv4 but should be considered very
experimental (i.e. do not deploy in production environments).  This software
does NOT currently support IPv6.  In fact do not configure FAST_IPSEC and
INET6 in the same system.

Obtained from:	KAME + openbsd
Supported by:	Vernier Networks
2002-10-16 02:10:08 +00:00