Commit Graph

62 Commits

Author SHA1 Message Date
John Baldwin
0f70218343 Make the warning intervals for deprecated crypto algorithms tunable.
New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings.  The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI

Reviewed by:	cem
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20555
2019-06-11 23:00:55 +00:00
John Baldwin
c2fd516f3a Add deprecation warnings for IPsec algorithms deprecated in RFC 8221.
All of these algorithms are either explicitly marked MUST NOT, or they
are implicitly MUST NOTs by virtue of not being included in IETF's
list of protocols at all despite having assignments from IANA.

Specifically, this adds warnings for the following ciphers:
- des-cbc
- blowfish-cbc
- cast128-cbc
- des-deriv
- des-32iv
- camellia-cbc

Warnings for the following authentication algorithms are also added:
- hmac-md5
- keyed-md5
- keyed-sha1
- hmac-ripemd160

Reviewed by:	cem, gnn
MFC after:	3 days
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20340
2019-05-23 22:06:57 +00:00
Conrad Meyer
1b0909d51a OpenCrypto: Convert sessions to opaque handles instead of integers
Track session objects in the framework, and pass handles between the
framework (OCF), consumers, and drivers.  Avoid redundancy and complexity in
individual drivers by allocating session memory in the framework and
providing it to drivers in ::newsession().

Session handles are no longer integers with information encoded in various
high bits.  Use of the CRYPTO_SESID2FOO() macros should be replaced with the
appropriate crypto_ses2foo() function on the opaque session handle.

Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to
the opaque handle interface.  Discard existing session tracking as much as
possible (quick pass).  There may be additional code ripe for deletion.

Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style
interface.  The conversion is largely mechnical.

The change is documented in crypto.9.

Inspired by
https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html .

No objection from:	ae (ipsec portion)
Reported by:	jhb
2018-07-18 00:56:25 +00:00
Conrad Meyer
2e08e39ff5 OCF: Add a typedef for session identifiers
No functional change.

This should ease the transition from an integer session identifier model to
an opaque pointer model.
2018-07-13 23:46:07 +00:00
Conrad Meyer
ede2f7731d Correctly handle the padding for IPv6-AH, as specified by RFC4302
The RFC specifies that under IPv6 the complete AH header must be 64 bit
aligned, and under IPv4, 32 bit aligned. Prior to this change, we (along
with other BSDs and MacOS) had violated this requirement.

This makes it possible to set up IPv6-AH between Linux and BSD, and also
probably between Windows and BSD.

PR:		222684
Reported and tested by:	Jason Mader <jasonmader AT gmail.com>
Obtained from:	NetBSD xform_ah.c 1.105
		(b939fe2483972eb43d71bf990cfb7f26dece7839 NetBSD/src on GH)
		by Maxime Villard
MFC after:	35.2731 hours
Relnotes:	probably (breaks ipv6 compat with older FreeBSD/NetBSD/MacOS)
Sponsored by:	Dell EMC Isilon
2018-06-04 18:51:06 +00:00
John Baldwin
fd40ecf3d4 Set the proper vnet in IPsec callback functions.
When using hardware crypto engines, the callback functions used to handle
an IPsec packet after it has been encrypted or decrypted can be invoked
asynchronously from a worker thread that is not associated with a vnet.
Extend 'struct xform_data' to include a vnet pointer and save the current
vnet in this new member when queueing crypto requests in IPsec.  In the
IPsec callback routines, use the new member to set the current vnet while
processing the modified packet.

This fixes a panic when using hardware offload such as ccr(4) with IPsec
after VIMAGE was enabled in GENERIC.

Reported by:	Sony Arpita Das and Harsh Jain @ Chelsio
Reviewed by:	bz
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14763
2018-03-20 17:05:23 +00:00
Andrey V. Elsukov
6ca39da354 Check packet length to do not make out of bounds access. Also save ah_nxt
value to use it later, since ah pointer can become invalid.

Reported by:	Maxime Villard <max at m00nbsd dot net>
MFC after:	5 days
2018-02-19 11:14:38 +00:00
Andrey V. Elsukov
055679e67e Adopt revision 1.76 and 1.77 from NetBSD:
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
  crash the kernel with a single packet.

  In this loop we need to increment 'ad' by two, because the length field
  of the option header does not count the size of the option header itself.

  If the length is zero, then 'count' is incremented by zero, and there's
  an infinite loop. Beyond that, this code was written with the assumption
  that since the IPv6 packet already went through the generic IPv6 option
  parser, several fields are guaranteed to be valid; but this assumption
  does not hold because of the missing '+2', and there's as a result a
  triggerable buffer overflow (write zeros after the end of the mbuf,
  potentially to the next mbuf in memory since it's a pool).

  Add the missing '+2', this place will be reinforced in separate commits.

Reported by:	Maxime Villard <maxv at NetBSD.org>
MFC after:	1 week
2018-01-24 19:48:25 +00:00
Andrey V. Elsukov
b12a7532e3 Merge revision 1.35 from NetBSD:
fix pointer/offset mistakes in handling of IPv4 options

Reported by:	Maxime Villard <maxv at NetBSD.org>
MFC after:	1 week
2018-01-24 19:06:44 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Fabien Thomas
39bbca6ffd crypto(9) is called from ipsec in CRYPTO_F_CBIFSYNC mode. This is working
fine when a lot of different flows to be ciphered/deciphered are involved.

However, when a software crypto driver is used, there are
situations where we could benefit from making crypto(9) multi threaded:
- a single flow is to be ciphered: only one thread is used to cipher it,
- a single ESP flow is to be deciphered: only one thread is used to
decipher it.

The idea here is to call crypto(9) using a new mode (CRYPTO_F_ASYNC) to
dispatch the crypto jobs on multiple threads, if the underlying crypto
driver is working in synchronous mode.

Another flag is added (CRYPTO_F_ASYNC_KEEPORDER) to make crypto(9)
dispatch the crypto jobs in the order they are received (an additional
queue/thread is used), so that the packets are reinjected in the network
using the same order they were posted.

A new sysctl net.inet.ipsec.async_crypto can be used to activate
this new behavior (disabled by default).

Submitted by:	Emeric Poupon <emeric.poupon@stormshield.eu>
Reviewed by:	ae, jmg, jhb
Differential Revision:    https://reviews.freebsd.org/D10680
Sponsored by:	Stormshield
2017-11-03 10:27:22 +00:00
Conrad Meyer
3693b18840 opencrypto: Loosen restriction on HMAC key sizes
Theoretically, HMACs do not actually have any limit on key sizes.
Transforms should compact input keys larger than the HMAC block size by
using the transform (hash) on the input key.

(Short input keys are padded out with zeros to the HMAC block size.)

Still, not all FreeBSD crypto drivers that provide HMAC functionality
handle longer-than-blocksize keys appropriately, so enforce a "maximum" key
length in the crypto API for auth_hashes that previously expressed a
requirement.  (The "maximum" is the size of a single HMAC block for the
given transform.)  Unconstrained auth_hashes are left as-is.

I believe the previous hardcoded sizes were committed in the original
import of opencrypto from OpenBSD and are due to specific protocol
details of IPSec.  Note that none of the previous sizes actually matched
the appropriate HMAC block size.

The previous hardcoded sizes made the SHA tests in cryptotest.py
useless for testing FreeBSD crypto drivers; none of the NIST-KAT example
inputs had keys sized to the previous expectations.

The following drivers were audited to check that they handled keys up to
the block size of the HMAC safely:

  Software HMAC:
    * padlock(4)
    * cesa
    * glxsb
    * safe(4)
    * ubsec(4)

  Hardware accelerated HMAC:
    * ccr(4)
    * hifn(4)
    * sec(4) (Only supports up to 64 byte keys despite claiming to
      support SHA2 HMACs, but validates input key sizes)
    * cryptocteon (MIPS)
    * nlmsec (MIPS)
    * rmisec (MIPS) (Amusingly, does not appear to use key material at
      all -- presumed broken)

Reviewed by:	jhb (previous version), rlibby (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12437
2017-09-26 16:18:10 +00:00
Andrey V. Elsukov
7f1f65918b Disable IPsec debugging code by default when IPSEC_DEBUG kernel option
is not specified.

Due to the long call chain IPsec code can produce the kernel stack
exhaustion on the i386 architecture. The debugging code usually is not
used, but it requires a lot of stack space to keep buffers for strings
formatting. This patch conditionally defines macros to disable building
of IPsec debugging code.

IPsec currently has two sysctl variables to configure debug output:
 * net.key.debug variable is used to enable debug output for PF_KEY
   protocol. Such debug messages are produced by KEYDBG() macro and
   usually they can be interesting for developers.
 * net.inet.ipsec.debug variable is used to enable debug output for
   DPRINTF() macro and ipseclog() function. DPRINTF() macro usually
   is used for development debugging. ipseclog() function is used for
   debugging by administrator.

The patch disables KEYDBG() and DPRINTF() macros, and formatting buffers
declarations when IPSEC_DEBUG is not present in kernel config. This reduces
stack requirement for up to several hundreds of bytes.
The net.inet.ipsec.debug variable still can be used to enable ipseclog()
messages by administrator.

PR:		219476
Reported by:	eugen
No objection from:	#network
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10869
2017-05-29 09:30:38 +00:00
Andrey V. Elsukov
3aee70991d Fix possible double releasing for SA and SP references.
There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For outbound packets the direct call chain is the following:
 IPSEC_OUTPUT() method -> ipsec[46]_common_output() ->
 -> ipsec[46]_perform_request() -> xform_output() ->
 -> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
 -> xform_output_cb() -> ipsec_process_done() -> ip[6]_output().

The SA and SP references are held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did references releasing in
xform_output_cb(). But when the crypto callback called directly, in case of
error the error handling code in ipsec[46]_perform_request() also did
references releasing.

To fix this, remove error handling from ipsec[46]_perform_request() and do it
in xform_output() before crypto_dispatch().

MFC after:	10 days
2017-05-23 09:32:26 +00:00
Andrey V. Elsukov
5f7c516f21 Fix possible double releasing for SA reference.
There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For inbound packets the direct call chain is the following:
 IPSEC_INPUT() method -> ipsec_common_input() -> xform_input() ->
 -> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
 -> xform_input_cb() -> ipsec[46]_common_input_cb() -> netisr_queue().

The SA reference is held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did SA reference releasing in
xform_input_cb(). But when the crypto callback called directly, in case of
error (e.g. data authentification failed) the error handling in
ipsec_common_input() also did SA reference releasing.

To fix this, remove error handling from ipsec_common_input() and do it
in xform_input() before crypto_dispatch().

PR:		219356
MFC after:	10 days
2017-05-23 09:01:48 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  build as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. The only one header file <netipsec/ipsec_support.h>
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups in the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
  check for full history of applied IPsec transforms.
o References counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
Fabien Thomas
bf4356266d IPsec RFC6479 support for replay window sizes up to 2^32 - 32 packets.
Since the previous algorithm, based on bit shifting, does not scale
with large replay windows, the algorithm used here is based on
RFC 6479: IPsec Anti-Replay Algorithm without Bit Shifting.
The replay window will be fast to be updated, but will cost as many bits
in RAM as its size.

The previous implementation did not provide a lock on the replay window,
which may lead to replay issues.

Reviewed by:	ae
Obtained from:	emeric.poupon@stormshield.eu
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D8468
2016-11-25 14:44:49 +00:00
Andrey V. Elsukov
f367798498 Take extra reference to security policy before calling crypto_dispatch().
Currently we perform crypto requests for IPSEC synchronous for most of
crypto providers (software, aesni) and only VIA padlock calls crypto
callback asynchronous. In synchronous mode it is possible, that security
policy will be removed during the processing crypto request. And crypto
callback will release the last reference to SP. Then upon return into
ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called to already
freed request. To prevent this we will take extra reference to SP.

PR:		201876
Sponsored by:	Yandex LLC
2015-09-30 08:16:33 +00:00
John-Mark Gurney
42e5fcbf2b these are comparing authenticators and need to be constant time...
This could be a side channel attack...  Now that we have a function
for this, use it...

jmgurney/ipsecgcm:	24d704cc and 7f37a14
2015-07-31 00:31:52 +00:00
John-Mark Gurney
a09a7146a7 RFC4868 section 2.3 requires that the output be half... This fixes
problems that was introduced in r285336...  I have verified that
HMAC-SHA2-256 both ah only and w/ AES-CBC interoperate w/ a NetBSD
6.1.5 vm...

Reviewed by:	gnn
2015-07-29 07:15:16 +00:00
George V. Neville-Neil
16de9ac1b5 Add support for AES modes to IPSec. These modes work both in software only
mode and with hardware support on systems that have AESNI instructions.

Differential Revision:	D2936
Reviewed by:	jmg, eri, cognet
Sponsored by:	Rubicon Communications (Netgate)
2015-07-09 18:16:35 +00:00
Andrey V. Elsukov
3d80e82d60 Fix possible use after free due to security policy deletion.
When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(),
we hold one reference to security policy and release it just after return
from this function. But IPSec processing can be deffered and when we release
reference to security policy after ipsec[46]_process_packet(), user can
delete this security policy from SPDB. And when IPSec processing will be
done, xform's callback function will do access to already freed memory.

To fix this move KEY_FREESP() into callback function. Now IPSec code will
release reference to SP after processing will be finished.

Differential Revision:	https://reviews.freebsd.org/D2324
No objections from:	#network
Sponsored by:	Yandex LLC
2015-04-27 00:55:56 +00:00
Andrey V. Elsukov
962ac6c727 Change ipsec_address() and ipsec_logsastr() functions to take two
additional arguments - buffer and size of this buffer.

ipsec_address() is used to convert sockaddr structure to presentation
format. The IPv6 part of this function returns pointer to the on-stack
buffer and at the moment when it will be used by caller, it becames
invalid. IPv4 version uses 4 static buffers and returns pointer to
new buffer each time when it called. But anyway it is still possible
to get corrupted data when several threads will use this function.

ipsec_logsastr() is used to format string about SA entry. It also
uses static buffer and has the same problem with concurrent threads.

To fix these problems add the buffer pointer and size of this
buffer to arguments. Now each caller will pass buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).

And now ipsec_address() uses inet_ntop() function from libkern.

PR:		185996
Differential Revision:	https://reviews.freebsd.org/D2321
Reviewed by:	gnn
Sponsored by:	Yandex LLC
2015-04-18 16:58:33 +00:00
Andrey V. Elsukov
f0514a8b8a Remove now unused mtag argument from ipsec*_common_input_cb.
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 17:14:49 +00:00
Andrey V. Elsukov
08537f4526 Remove code related to PACKET_TAG_IPSEC_IN_CRYPTO_DONE mbuf tag.
It isn't used in FreeBSD.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 17:07:21 +00:00
Andrey V. Elsukov
2d957916ef Remove route chaching support from ipsec code. It isn't used for some time.
* remove sa_route_union declaration and route_cache member from struct secashead;
* remove key_sa_routechange() call from ICMP and ICMPv6 code;
* simplify ip_ipsec_mtu();
* remove #include <net/route.h>;

Sponsored by:	Yandex LLC
2014-12-02 04:20:50 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Andrey V. Elsukov
db8c087944 Migrate structs ahstat, espstat, ipcompstat, ipipstat, pfkeystat,
ipsec4stat, ipsec6stat to PCPU counters.
2013-07-09 10:08:13 +00:00
Andrey V. Elsukov
a04d64d875 Use corresponding macros to update statistics for AH, ESP, IPIP, IPCOMP,
PFKEY.

MFC after:	2 weeks
2013-06-20 11:44:16 +00:00
Gleb Smirnoff
8ad458a471 Do not reduce ip_len by size of IP header in the ip_input()
before passing a packet to protocol input routines.
  For several protocols this mean that now protocol needs to
do subtraction itself, and for another half this means that
we do not need to add header length back to the packet.

  Make ip_stripoptions() to adjust ip_len, since now we enter
this function with a packet header whose ip_len does represent
length of entire packet, not payload only.
2012-10-23 08:33:13 +00:00
Gleb Smirnoff
20472bce45 Couple of changes missed from r241913, which converted
IPv4 stack to network byte order.
2012-10-22 22:42:28 +00:00
Pawel Jakub Dawidek
0e4fb1db44 Eliminate 'err' variable and just use existing 'error'. 2011-11-26 23:15:28 +00:00
Pawel Jakub Dawidek
0a95a08ecb Simplify code a bit. 2011-11-26 23:13:30 +00:00
Bjoern A. Zeeb
db178eb816 Make IPsec compile without INET adding appropriate #ifdef checks.
Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c
to not need three different versions depending on INET, INET6 or both.

Mark two places preparing for not yet supported functionality with IPv6.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:28:42 +00:00
Fabien Thomas
73171433c5 Optimisation in IPSEC(4):
- Remove contention on ISR during the crypto operation by using rwlock(9).
- Remove a second lookup of the SA in the callback.

Gain on 6 cores CPU with SHA1/AES128 can be up to 30%.

Reviewed by:	vanhu
MFC after:	1 month
2011-03-31 15:23:32 +00:00
Fabien Thomas
11d2f4df50 Fix two SA refcount:
- AH does not release the SA like in ESP/IPCOMP when handling EAGAIN
- ipsec_process_done incorrectly release the SA.

Reviewed by:	vanhu
MFC after:	1 week
2011-03-31 13:14:24 +00:00
VANHULLEBUS Yvan
442da28aeb Fixed IPsec's HMAC_SHA256-512 support to be RFC4868 compliant.
This will break interoperability with all older versions of
FreeBSD for those algorithms.

Reviewed by:	bz, gnn
Obtained from:	NETASQ
MFC after:	1w
2011-02-18 09:40:13 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
Marko Zec
1ed81b739e First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et. al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
Marko Zec
44e33a0758 Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instatiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Bjoern A. Zeeb
eaa9325f48 In addition to the ipsec_osdep.h removal a week ago, now also eliminate
IPSEC_SPLASSERT_SOFTNET which has been 'unused' since FreeBSD 5.0.
2008-05-24 15:32:46 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
George V. Neville-Neil
2cb64cb272 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
Pawel Jakub Dawidek
80e35494cc - The authsize field from auth_hash structure was removed.
- Define that we want to receive only 96 bits of HMAC.
- Names of the structues have no longer _96 suffix.

Reviewed by:	sam
2006-05-17 18:30:28 +00:00