Commit Graph

256 Commits

Author SHA1 Message Date
eri
e8f045c8eb Fix a logic error in ipsec code that extracts
information from the packets.

Reviewed by:	bz, mlaier
Approved by:	mlaier(mentor)
MFC after:	1 month
2010-04-02 18:15:23 +00:00
bz
cb7afff0b8 When tearing down IPsec as part of a (virtual) network stack,
do not try to free the same list twice but free both the
acquiring list and the security policy acquiring list.

Reviewed by:	anchie
MFC after:	3 days
2010-03-28 06:51:50 +00:00
pjd
c08e2cb276 Correct typo in comment. 2010-02-18 22:34:29 +00:00
bz
68cb475233 Enable IPcomp by default.
PR:		kern/123587
MFC after:	5 days
2009-11-29 20:47:43 +00:00
bz
21f84f921a Add more statistics variables for IPcomp.
Try to version the struct in a backward compatible way.
People asked for the versioning of the stats structs in general before.

MFC after:	5 days
2009-11-29 20:37:30 +00:00
bz
a8256824b8 Assimilate very similar input and output code paths
(no real functional change).

MFC after:	5 days
2009-11-29 17:47:49 +00:00
bz
89a2398493 Only add the IPcomp header if crypto reported success and we have a lower
payload size.  Before we had always added the header, no matter if we
actually send out compressed data or not.

With this, after the opencrypto/deflate changes, IPcomp starts to work
apart from edge cases.  Leave it disabled by default until those are
fixed as well.

PR:		kern/123587
MFC after:	5 days
2009-11-29 10:53:34 +00:00
bz
90cf239f86 Remove whitespace.
MFC after:	6 days
2009-11-28 21:42:39 +00:00
bz
8bebf9e60f Directly send data uncompressed if the packet payload size is lower than
the compression algorithm threshold.

MFC after:	6 days
2009-11-28 21:40:57 +00:00
bz
b1928221d2 Correct a typo.
MFC after:	6 days
2009-11-28 21:01:26 +00:00
vanhu
7b642517df fixed two race conditions when inserting/removing SAs via PFKey,
which can both lead to a kernel panic when adding/removing quickly
a lot of SAs.

Obtained from:	NETASQ
MFC after:	2w (MFC on 8 before 8.0 release ???)
2009-11-17 16:00:41 +00:00
vanhu
4f56708582 Changed an IPSEC_ASSERT to a simple test, as such invalid packets
may come from outside without being discarded before.

Submitted by:	aurelien.ansel@netasq.com
Reviewed by:	bz (secteam)
Obtained from:	NETASQ
MFC after:	1m
2009-10-01 15:33:53 +00:00
vanhu
550a925d5c When checking traffic endpoint's adresses families in key_spdadd(),
compare them together instead of comparing each one with respective
tunnel endpoint.

PR:	kern/138439
Submitted by:	aurelien.ansel@netasq.com
Obtained from:	NETASQ
MFC after:	1 m
2009-09-16 11:56:44 +00:00
pjd
dfd5ed3d44 Silent gcc? Yeah, you wish. What I ment was to silence gcc.
Spotted by:	julian
2009-09-06 19:05:03 +00:00
pjd
fe152cceaf Initialize state_valid and arraysize variable so gcc won't complain.
Reported by:	bz
2009-09-06 18:09:25 +00:00
pjd
aa12b3e9c9 Improve code a bit by eliminating goto and having one unlock per lock. 2009-09-06 07:32:16 +00:00
pjd
78cb45a022 Correct typo in comment. 2009-09-06 07:30:21 +00:00
rwatson
ef8d755d4d Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
rwatson
b3be1c6e3b Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
rwatson
d32e9e9901 Garbage collect vnet module registrations that have neither constructors
nor destructors, as there's no actual work to do.

In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.

Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-07-20 13:55:33 +00:00
rwatson
6955067932 Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use).  Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions.  Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 14:20:53 +00:00
rwatson
88f8de4d40 Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
rwatson
57ca4583e7 Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
rwatson
bd6eb7be79 Add address list locking for in6_ifaddrhead/ia_link: as with locking
for in_ifaddrhead, we stick with an rwlock for the time being, which
we will revisit in the future with a possible move to rmlocks.

Some pieces of code require significant further reworking to be
safe from all classes of writer-writer races.

Reviewed by:	bz
MFC after:	6 weeks
2009-06-25 16:35:28 +00:00
rwatson
ea70a3542d Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
rwatson
9c4380a8ee Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead).  Adopt
the code styles and conventions present in netinet where possible.

Reviewed by:	gnn, bz
MFC after:	6 weeks (possibly not MFCable?)
2009-06-24 21:00:25 +00:00
bz
55f6868044 Move setting of ports from NAT-T below key_getsah() and actually
below key_setsaval().
Without that, the lookup for the SA had failed as we were looking for
a SA with the new, updated port numbers instead of the old ones and
were comparing the ports in key_cmpsaidx().
This makes updating the remote -> local SA on the initiator work again.

Problem introduced with:	p4 changeset 152114
2009-06-19 21:01:55 +00:00
bz
b017972f11 Add the explicit include of vimage.h to another five .c files still
missing it.

Remove the "hidden" kernel only include of vimage.h from ip_var.h added
with the very first Vimage commit r181803 to avoid further kernel poisoning.
2009-06-17 12:44:11 +00:00
vanhu
16c1346b9a Added support for NAT-Traversal (RFC 3948) in IPsec stack.
Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry
Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele
(julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense
team, and all people who used / tried the NAT-T patch for years and
reported bugs, patches, etc...

X-MFC: never

Reviewed by:	bz
Approved by:	gnn(mentor)
Obtained from:	NETASQ
2009-06-12 15:44:35 +00:00
bz
86d0c6ee3d Properly hide IPv4 only variables and functions under #ifdef INET. 2009-06-10 19:25:46 +00:00
bz
b7ff2bdc20 After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
zec
8b1f38241a Introduce an infrastructure for dismantling vnet instances.
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state.  The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers.  Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels.  Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by:	bz, julian
Approved by:	rwatson, kib (re), julian (mentor)
2009-06-08 17:15:40 +00:00
rwatson
2bab695560 Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
  workstream, or set of per-protocol queues.  Threads may be bound
  to specific CPUs, or allowed to migrate, based on a global policy.

  In the future it would be desirable to support topology-centric
  policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
  currently be one of:

  NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
    an implicit or explicit source (such as an interface or socket).

  NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
    as well as allowing protocols to provide a flow generation function
    for mbufs without flow identifers (m2flow).  Falls back on
    NETISR_POLICY_SOURCE if now flow ID is available.

  NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
    each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
  being used, as well as a mapping function from workstream to CPU ID,
  which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
  get and clear drop counters, which query data or apply changes across
  all workstreams.

- Add a more extensible netisr registration interface, in which
  protocols declare 'struct netisr_handler' structures for each
  registered NETISR_ type.  These include name, handler function,
  optional mbuf to flow ID function, optional mbuf to CPU ID function,
  queue limit, and ordering policy.  Padding is present to allow these
  to be expanded in the future.  If no queue limit is declared, then
  a default is used.

- Queue limits are now per-workstream, and raised from the previous
  IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
  with the exception of netnatm, default queue limits.  Most protocols
  register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
  NETISR_POLICY_FLOW, and will therefore take advantage of driver-
  generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
  the netisr, rather than having polling pretend to be two protocols.
  Provide two explicit hooks in the netisr worker for start and end
  events for runs: netisr_poll() and netisr_pollmore(), as well as a
  function, netisr_sched_poll(), to allow the polling code to schedule
  netisr execution.  DEVICE_POLLING still embeds single-netisr
  assumptions in its implementation, so for now if it is compiled into
  the kernel, a single and un-bound netisr thread is enforced
  regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking.  It will be enabled once further rmlock
optimization has taken place.  However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by:	bz
2009-06-01 10:41:38 +00:00
vanhu
48cef84e5f Lock SPTREE before parsing it in key_spddump()
Approved by:	gnn(mentor)
Obtained from:	NETASQ
MFC after:	2 weeks
2009-05-27 09:44:14 +00:00
vanhu
6e1cb07c00 Only decrease refcnt once when flushing SPD entries, to
avoid flushing entries which are still used.

Approved by:	gnn(mentor)
Obtained from:	NETASQ
MFC after:	1 month
2009-05-27 09:31:50 +00:00
bz
9642ff6e28 Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL
kernel option.
This also permits tuning of the option per virtual network stack, as
well as separately per inet, inet6.

The kernel option is left for a transition period, marked deprecated,
and will be removed soon.

Initially requested by:	phk (1 year 1 day ago)
MFC after:		4 weeks
2009-05-23 16:42:38 +00:00
zec
d78a1b1a82 Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
zec
7a56f17240 Make indentation more uniform accross vnet container structs.
This is a purely cosmetic / NOP change.

Reviewed by:	bz
Approved by:	julian (mentor)
Verified by:	svn diff -x -w producing no output
2009-05-02 08:16:26 +00:00
zec
39b6dc8ba2 Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
bms
0915b81c76 Stub out IN6_LOOKUP_MULTI() for GETSPI requests, for now.
This has the effect that IPv6 multicast traffic won't trigger
an SPI allocation when IPSEC is in use, however, this obviously
needs to stomp on locks, and IN6_LOOKUP_MULTI() is about to go away.

This definitely needs to be revisited before 8.x is branched as
a release branch.
2009-04-29 11:15:58 +00:00
bz
a12cc82f1a key_gettunnel() has been unsued with FAST_IPSEC (now IPSEC).
KAME had explicit checks at one point using it, so just hide it behind
#if 0 for now until we are sure if we can completely dump it or not.

MFC after:	1 month
2009-04-27 21:04:16 +00:00
zec
b39b54e6de Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
zec
c85551e0bc First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et. al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
vanhu
7e0f7398ba Fixed comments so it stays in 80 chars by line
with hard tabs of 8 chars....

Approved by:	gnn(mentor)
2009-03-23 16:20:39 +00:00
vanhu
21967caaf2 Spelling fix in a comment
Approved by:	gnn(mentor)
2009-03-20 09:12:01 +00:00
vanhu
cea6d30cdc Fixed style for some comments
Approved by:	gnn(mentor)
2009-03-19 15:50:45 +00:00
vanhu
72aca0d947 Fixed style for some comments
Approved by:	gnn(mentor)
2009-03-19 15:44:13 +00:00
vanhu
e33d6fbff6 Fixed deletion of sav entries in key_delsah()
Approved by:	gnn(mentor)
Obtained from:	NETASQ
MFC after:	1 month
2009-03-18 14:01:41 +00:00
vanhu
a5f4a55744 SAs are valid (but dying) when they reached soft lifetime,
even if they have never been used.

Approved by:	gnn(mentor)
MFC after:	2 weeks
2009-03-05 16:22:32 +00:00
bz
4321e2a8f4 Add size-guards evaluated at compile-time to the main struct vnet_*
which are not in a module of their own like gif.

Single kernel compiles and universe will fail if the size of the struct
changes. Th expected values are given in sys/vimage.h.
See the comments where how to handle this.

Requested by:	peter
2009-03-01 11:01:00 +00:00
bz
df2be82cec For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
bz
710220924b Shuffle the vimage.h includes or add where missing. 2009-02-27 13:22:26 +00:00
rdivacky
e5bfcba080 Change the functions to ANSI in those cases where it breaks promotion
to int rule. See ISO C Standard: SS6.7.5.3:15.

Approved by:	kib (mentor)
Reviewed by:	warner
Tested by:	silence on -current
2009-02-24 18:09:31 +00:00
bz
8d30abae87 Try to remove/assimilate as much of formerly IPv4/6 specific
(duplicate) code in sys/netipsec/ipsec.c and fold it into
common, INET/6 independent functions.

The file local functions ipsec4_setspidx_inpcb() and
ipsec6_setspidx_inpcb() were 1:1 identical after the change
in r186528. Rename to ipsec_setspidx_inpcb() and remove the
duplicate.

Public functions ipsec[46]_get_policy() were 1:1 identical.
Remove one copy and merge in the factored out code from
ipsec_get_policy() into the other. The public function left
is now called ipsec_get_policy() and callers were adapted.

Public functions ipsec[46]_set_policy() were 1:1 identical.
Rename file local ipsec_set_policy() function to
ipsec_set_policy_internal().
Remove one copy of the public functions, rename the other
to ipsec_set_policy() and adapt callers.

Public functions ipsec[46]_hdrsiz() were logically identical
(ignoring one questionable assert in the v6 version).
Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(),
the public function to ipsec_hdrsiz(), remove the duplicate
copy and adapt the callers.
The v6 version had been unused anyway. Cleanup comments.

Public functions ipsec[46]_in_reject() were logically identical
apart from statistics. Move the common code into a file local
ipsec46_in_reject() leaving vimage+statistics in small AF specific
wrapper functions. Note: unfortunately we already have a public
ipsec_in_reject().

Reviewed by:	sam
Discussed with:	rwatson (renaming to *_internal)
MFC after:	26 days
X-MFC:		keep wrapper functions for public symbols?
2009-02-08 09:27:07 +00:00
bz
133ba226c9 Use NULL rather than 0 when comparing pointers.
MFC after:	2 weeks
2009-01-30 20:17:08 +00:00
vanhu
1f1b367cb7 Remove remain <= MHLEN restriction in m_makespace(),
which caused assert with big packets

PR: kern/124609
Submitted by: fabien.thomas@netasq.com
Approved by:	gnn(mentor)
Obtained from:	NetBSD
MFC after:	1 month
2009-01-28 10:41:10 +00:00
bz
086c4b5b79 Switch the last protosw* structs to C99 initializers.
Reviewed by:	ed, julian, Christoph Mallon <christoph.mallon@gmx.de>
MFC after:	2 weeks
2009-01-05 20:29:01 +00:00
rwatson
1cfb2a06d0 Fix non-C99 initialization for protosw initializing pr_ousrreq. 2009-01-04 22:15:15 +00:00
rwatson
e259848db5 Unlike with struct protosw, several instances of struct ip6protosw
did not use C99-style sparse structure initialization, so remove
NULL assignments for now-removed pr_usrreq function pointers.

Reported by:	Chris Ruiz <yr.retarded at gmail.com>
2009-01-04 21:53:42 +00:00
bz
2207460cee Like in the rest of the file and the network stack use inp as
variable name for the inpcb.
For consistency with the other *_hdrsiz functions use 'size'
instead of 'siz' as variable name.

No functional change.

MFC after:	4 weeks
2008-12-27 23:24:59 +00:00
bz
14a7b2c5e1 Non-functional (style) changes:
- Always use round brackets with return ().
- Add empty line to beginning of functions without local variables.
- Comments start with a capital letter and end in a '.'.
  While there adapt a few comments.

Reviewed by:	rwatson
MFC after:	4 weeks
2008-12-27 22:58:16 +00:00
bz
2b6604ad9e Convert function definitions to constantly use ANSI-style
parameter declarations.

Reviewed by:	rwatson
MFC after:	4 weeks
2008-12-27 21:20:34 +00:00
bz
9d1d3ec9fe Rewrite ipsec6_setspidx_inpcb() to match the logic in the
(now) equivalent IPv4 counterpart.

MFC after:	4 weeks
2008-12-27 20:37:53 +00:00
bz
95679d0335 For consistency with ipsec4_setspidx_inpcb() rename file local function
ipsec6_setspidx_in6pcb() to ipsec6_setspidx_inpcb().

MFC after:	4 weeks
2008-12-27 19:42:59 +00:00
bz
f460f8ea26 Change the in6p variable names to inp to be able to diff
the v4 to the v6 implementations.

MFC after:	4 weeks
2008-12-27 19:37:46 +00:00
bz
52b35d1d21 Make ipsec_getpolicybysock() static and no longer export it. It has not
been used outside this file since about the FAST_IPSEC -> IPSEC change.

MFC after:	4 weeks
2008-12-27 09:36:22 +00:00
bz
46b52a2d38 Remove long unused netinet/ipprotosw.h (basically since r82884).
Discussed with:		rwatson
MFC after:		4 weeks
2008-12-23 16:52:03 +00:00
bz
03f6bb9dc9 Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().

Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.

Discussed with:	rwatson
Reviewed by:	rwatson (version before review requested changes)
MFC after:	4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
bz
98e7fe0e6a Second round of putting global variables, which were virtualized
but formerly missed under VIMAGE_GLOBAL.

Put the extern declarations of the  virtualized globals
under VIMAGE_GLOBAL as the globals themsevles are already.
This will help by the time when we are going to remove the globals
entirely.

Sponsored by:	The FreeBSD Foundation
2008-12-13 19:13:03 +00:00
zec
7b573d1496 Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out.  Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs.  This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps.  PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw.  Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-12-10 23:12:39 +00:00
bz
604d89458a Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
zec
7ecd715d48 Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-28 23:30:51 +00:00
bz
9ef49d8b6f Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy.
Ignoring different names because of macros (in6pcb, in6p_sp) and
inp vs. in6p variable name both functions were entirely identical.

Reviewed by:	rwatson (as part of a larger changeset)
MFC after:	6 weeks (*)
(*) possibly need to leave a stub wrappers in 7 to keep the symbols.
2008-11-27 10:43:08 +00:00
zec
95a15f5c84 Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by:  bz, julian
Approved by:  julian (mentor)
Obtained from:        //depot/projects/vimage-commit2/...
X-MFC after:  never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-11-26 22:32:07 +00:00
bz
5de6e674b3 Unbreak the build without INET6. 2008-11-25 09:49:05 +00:00
zec
815d52c5df Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instatiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
des
66f807ed8b Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
zec
8797d4caec Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
bz
1021d43b56 Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
vanhu
72791f9bc1 Increase statistic counters for enc0 interface when enabled
and processing IPSec traffic.

Approved by:	gnn (mentor)
MFC after:	1 week
2008-08-12 09:05:01 +00:00
vanhu
3a946f98dc Add lifetime informations to generated SPD entries when SPDDUMP
Approved by: gnn (mentor)
MFC after:	4 weeks
2008-08-05 15:36:50 +00:00
trhodes
56ab14a8ae Fill in a few sysctl descriptions.
Approved by:	rwatson
2008-07-26 00:55:35 +00:00
trhodes
b3b4a48308 Document a few sysctls. While here, remove dead code
related to ip4_esp_randpad.

Reviewed by:	gnn, bz (older version)
Approved by:	gnn
Tested with:	make universe
2008-07-20 17:51:58 +00:00
rwatson
754034c5cf Remove unused support for local and foreign addresses in generic raw
socket support.  These utility routines are used only for routing and
pfkey sockets, neither of which have a notion of address, so were
required to mock up fake socket addresses to avoid connection
requirements for applications that did not specify their own fake
addresses (most of them).

Quite a bit of the removed code is #ifdef notdef, since raw sockets
don't support bind() or connect() in practice.  Removing this
simplifies the raw socket implementation, and removes two (commented
out) uses of dtom(9).

Fake addresses passed to sendto(2) by applications are ignored for
compatibility reasons, but this is now done in a more consistent way
(and with a comment).  Possibly, EINVAL could be returned here in
the future if it is determined that no applications depend on the
semantic inconsistency of specifying a destination address for a
protocol without address support, but this will require some amount
of careful surveying.

NB: This does not affect netinet, netinet6, or other wire protocol
raw sockets, which provide their own independent infrastructure with
control block address support specific to the protocol.

MFC after:	3 weeks
Reviewed by:	bz
2008-07-09 15:48:16 +00:00
julian
4dcc97b12c Enter the 1990s. Use real function declaration. 2008-06-29 00:49:50 +00:00
bz
db8afa9bc3 In addition to the ipsec_osdep.h removal a week ago, now also eliminate
IPSEC_SPLASSERT_SOFTNET which has been 'unused' since FreeBSD 5.0.
2008-05-24 15:32:46 +00:00
gnn
5e9c239f57 Remove last bits of OS adaptation code from the IPSec code.
Reviewed By: bz
2008-05-17 04:00:11 +00:00
bz
e1cf25141c Fix a bug that when getting/dumping the soft lifetime we reported
the hard lifetime instead.

MFC after:	3 days
2008-03-24 15:01:20 +00:00
bz
42fbad307b Import change from KAME, rev. 1.362 kame/kame/sys/netkey/key.c
In case of "new SA", we must check the hard lifetime of the old SA
to find out if it is not permanent and we can delete it.

Submitted by:	sakane via gnn
MFC after:	3 days
2008-03-24 14:55:09 +00:00
bz
418e4a564c Add ';' missed with the SYSINIT changes.
Not noticed by tb as TCP_SIGNATURE is not in LINT.

MFC after:	1 month
2008-03-21 18:31:42 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
bz
33dfb1706b Correct IPsec behaviour with a 'use' level in SP but no SA available.
In that case return an continue processing the packet without IPsec.

PR:		121384
MFC after:	5 days
Reported by:	Cyrus Rahman (crahman gmail.com)
Tested by:	Cyrus Rahman (crahman gmail.com) [slightly older version]
2008-03-14 16:38:11 +00:00
bz
ee90b5b6c8 Remove the "Fast " from the
"Fast IPsec: Initialized Security Association Processing." printf.
People kept asking questions about this after the IPsec shuffle.

This still is the Fast IPsec implementation so no worries that it would
be any slower now. There are no functional changes.

Discussed with:	sam
MFC after:	4 days
2008-03-14 16:25:40 +00:00
bz
767a2621f0 Fix bugs when allocating and passing information of current lifetime and
soft lifetime [1] introduced in rev. 1.21 of key.c.

Along with that, fix a related problem in key_debug
printing the correct data.
While there replace a printf by panic in a sanity check.

PR:		120751
Submitted by:	Kazuaki ODA (kazuaki aliceblue.jp) [1]
MFC after:	5 days
2008-03-02 17:12:28 +00:00
bz
cfb85f0c07 Rather than passing around a cached 'priv', pass in an ucred to
ipsec*_set_policy and do the privilege check only if needed.

Try to assimilate both ip*_ctloutput code blocks calling ipsec*_set_policy.

Reviewed by:	rwatson
2008-02-02 14:11:31 +00:00
bz
05fda2a0bf Add sysctls to if_enc(4) to control whether the firewalls or
bpf will see inner and outer headers or just inner or outer
headers for incoming and outgoing IPsec packets.

This is useful in bpf to not have over long lines for debugging
or selcting packets based on the inner headers.
It also properly defines the behavior of what the firewalls see.

Last but not least it gives you if_enc(4) for IPv6 as well.

[ As some auxiliary state was not available in the later
  input path we save it in the tdbi. That way tcpdump can give a
  consistent view of either of (authentic,confidential) for both
  before and after states. ]

Discussed with:	thompsa (2007-04-25, basic idea of unifying paths)
Reviewed by:	thompsa, gnn
2007-11-28 22:33:53 +00:00
bz
0e9e73cbd0 Adjust a comment that suggest that we might consider a panic.
Make clear that this is not a good idea when called from
tcp_output()->ipsec_hdrsiz_tcp()->ipsec4_hdrsize_tcp()
as we do not know if IPsec processing is needed at that point.
2007-11-28 21:48:21 +00:00
bz
a7318bd80c Move the priv check before the malloc call for so_pcb.
In case attach fails because of the priv check we leaked the
memory and left so_pcb as fodder for invariants.

Reported  by:	Pawel Worach
Reviewed by:	rwatson
2007-11-16 22:35:33 +00:00
bz
5c6a60df9f Add a missing priv check in key_attach to prevent non-su users
from messing with the spdb and sadb.

Problem sneaked in with the fast_ipsec+v6->ipsec merger by no
longer going via raw_usrreqs.pr_attach.

Reported by:	Pawel Worach
Identified by:	rwatson
Reviewed by:	rwatson
MFC after:	3 days
2007-11-12 23:47:48 +00:00