This mutex is a significant point of contention in the ipsec code, and
can be relatively trivially replaced by a read-mostly lock.
It does require a separate lock for the replay protection, which we do
here by adding a separate mutex.
This improves throughput (without replay protection) by 10-15%.
MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D35763
If an encapsulated frame is going to have DF bit set check its desitnitions'
PMTU and if it won't fit drop it and:
Generate ICMP 3/4 message if the packet was to be forwarded.
Return EMSGSIZE error otherwise.
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D30993
As RFC 4304 describes there is anti-replay algorithm responsibility
to provide appropriate value of Extended Sequence Number.
This patch introduces anti-replay algorithm with ESN support based on
RFC 4304, however to avoid performance regressions window implementation
was based on RFC 6479, which was already implemented in FreeBSD.
To keep things clean and improve code readability, implementation of window
is kept in seperate functions.
Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D22367
Obtained from: Semihalf
Sponsored by: Stormshield
Examples of depecrated algorithms in manual pages and sample configs
are updated where relevant. I removed the one example of combining
ESP and AH (vs using a cipher and auth in ESP) as RFC 8221 says this
combination is NOT RECOMMENDED.
Specifically, this removes support for the following ciphers:
- des-cbc
- 3des-cbc
- blowfish-cbc
- cast128-cbc
- des-deriv
- des-32iv
- camellia-cbc
This also removes support for the following authentication algorithms:
- hmac-md5
- keyed-md5
- keyed-sha1
- hmac-ripemd160
Reviewed by: cem, gnn (older verisons)
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24342
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings. The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI
Reviewed by: cem
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20555
Track session objects in the framework, and pass handles between the
framework (OCF), consumers, and drivers. Avoid redundancy and complexity in
individual drivers by allocating session memory in the framework and
providing it to drivers in ::newsession().
Session handles are no longer integers with information encoded in various
high bits. Use of the CRYPTO_SESID2FOO() macros should be replaced with the
appropriate crypto_ses2foo() function on the opaque session handle.
Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to
the opaque handle interface. Discard existing session tracking as much as
possible (quick pass). There may be additional code ripe for deletion.
Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style
interface. The conversion is largely mechnical.
The change is documented in crypto.9.
Inspired by
https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html .
No objection from: ae (ipsec portion)
Reported by: jhb
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
fine when a lot of different flows to be ciphered/deciphered are involved.
However, when a software crypto driver is used, there are
situations where we could benefit from making crypto(9) multi threaded:
- a single flow is to be ciphered: only one thread is used to cipher it,
- a single ESP flow is to be deciphered: only one thread is used to
decipher it.
The idea here is to call crypto(9) using a new mode (CRYPTO_F_ASYNC) to
dispatch the crypto jobs on multiple threads, if the underlying crypto
driver is working in synchronous mode.
Another flag is added (CRYPTO_F_ASYNC_KEEPORDER) to make crypto(9)
dispatch the crypto jobs in the order they are received (an additional
queue/thread is used), so that the packets are reinjected in the network
using the same order they were posted.
A new sysctl net.inet.ipsec.async_crypto can be used to activate
this new behavior (disabled by default).
Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
Reviewed by: ae, jmg, jhb
Differential Revision: https://reviews.freebsd.org/D10680
Sponsored by: Stormshield
When a security policy should match TCP connection with specific ports,
the SYN+ACK segment send by syncache_respond() is considered as forwarded
packet, because at this moment TCP connection does not have PCB structure,
and ip_output() is called without inpcb pointer. In this case SPIDX filled
for SP lookup will not contain TCP ports and security policy will not
be found. This can lead to unencrypted SYN+ACK on the wire.
This patch restores the old behavior, when ports will not be filled only
for forwarded packets.
Reported by: Dewayne Geraghty <dewayne.geraghty at heuristicsystems.com.au>
MFC after: 1 week
Small summary
-------------
o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
option IPSEC_SUPPORT added. It enables support for loading
and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
support was removed. Added TCP/UDP checksum handling for
inbound packets that were decapsulated by transport mode SAs.
setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
build as part of ipsec.ko module (or with IPSEC kernel).
It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
methods. The only one header file <netipsec/ipsec_support.h>
should be included to declare all the needed things to work
with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
- now all security associations stored in the single SPI namespace,
and all SAs MUST have unique SPI.
- several hash tables added to speed up lookups in SADB.
- SADB now uses rmlock to protect access, and concurrent threads
can do SA lookups in the same time.
- many PF_KEY message handlers were reworked to reflect changes
in SADB.
- SADB_UPDATE message was extended to support new PF_KEY headers:
SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
avoid locking protection for ipsecrequest. Now we support
only limited number (4) of bundled SAs, but they are supported
for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
check for full history of applied IPsec transforms.
o References counting rules for security policies and security
associations were changed. The proper SA locking added into xform
code.
o xform code was also changed. Now it is possible to unregister xforms.
tdb_xxx structures were changed and renamed to reflect changes in
SADB/SPDB, and changed rules for locking and refcounting.
Reviewed by: gnn, wblock
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9352
This function is used only by ipsec_getpolicybysock() to fill security
policy index selector for locally generated packets (that have INPCB).
The function incorrectly assumes that spidx is the same for both directions.
Fix this by using new direction argument to specify correct INPCB security
policy - sp_in or sp_out. There is no need to fill both policy indeces,
because they are overwritten for each packet.
This fixes security policy matching for outbound packets when user has
specified TCP/UDP ports in the security policy upperspec.
PR: 213869
MFC after: 1 week
Since the previous algorithm, based on bit shifting, does not scale
with large replay windows, the algorithm used here is based on
RFC 6479: IPsec Anti-Replay Algorithm without Bit Shifting.
The replay window will be fast to be updated, but will cost as many bits
in RAM as its size.
The previous implementation did not provide a lock on the replay window,
which may lead to replay issues.
Reviewed by: ae
Obtained from: emeric.poupon@stormshield.eu
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D8468
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.
Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).
degradation (7%) for host host TCP connections over 10Gbps links,
even when there were no secuirty policies in place. There is no
change in performance on 1Gbps network links. Testing GENERIC vs.
GENERIC-NOIPSEC vs. GENERIC with this change shows that the new
code removes any overhead introduced by having IPSEC always in the
kernel.
Differential Revision: D3993
MFC after: 1 month
Sponsored by: Rubicon Communications (Netgate)
When IPSEC is enabled on the kernel the forwarding path has an optimization to not enter the code paths
for checking security policies but first checks if there is any security policy active at all.
The patch introduces the same optimization but for traffic generated from the host itself.
This reduces the overhead by 50% on my tests for generated host traffic without and SP active.
Differential Revision: https://reviews.freebsd.org/D2980
Reviewed by: ae, gnn
Approved by: gnn(mentor)
additional arguments - buffer and size of this buffer.
ipsec_address() is used to convert sockaddr structure to presentation
format. The IPv6 part of this function returns pointer to the on-stack
buffer and at the moment when it will be used by caller, it becames
invalid. IPv4 version uses 4 static buffers and returns pointer to
new buffer each time when it called. But anyway it is still possible
to get corrupted data when several threads will use this function.
ipsec_logsastr() is used to format string about SA entry. It also
uses static buffer and has the same problem with concurrent threads.
To fix these problems add the buffer pointer and size of this
buffer to arguments. Now each caller will pass buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).
And now ipsec_address() uses inet_ntop() function from libkern.
PR: 185996
Differential Revision: https://reviews.freebsd.org/D2321
Reviewed by: gnn
Sponsored by: Yandex LLC
IPv6. Initialize it only once in def_policy_init(). Remove its
initialization from key_init() and make it static.
Remove several fields from struct secpolicy:
* lock - it isn't so useful having mutex in the structure, but the only
thing we do with it is initialization and destroying.
* state - it has only two values - DEAD and ALIVE. Instead of take a lock
and change the state to DEAD, then take lock again in GC function and
delete policy from the chain - keep in the chain only ALIVE policies.
* scangen - it was used in GC function to protect from sending several
SADB_SPDEXPIRE messages for one SPD entry. Now we don't keep DEAD entries
in the chain and there is no need to have scangen variable.
Use TAILQ to implement SPD entries chain. Use rmlock to protect access
to SPD entries chain. Protect all SP lookup with RLOCK, and use WLOCK
when we are inserting (or removing) SP entry in the chain.
Instead of using pattern "LOCK(); refcnt++; UNLOCK();", use refcount(9)
API to implement refcounting in SPD. Merge code from key_delsp() and
_key_delsp() into _key_freesp(). And use KEY_FREESP() macro in all cases
when we want to release reference or just delete SP entry.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
ipsec_getpolicybyaddr()
ipsec4_checkpolicy()
ip_ipsec_output()
ip6_ipsec_output()
The only flag used here was IP_FORWARDING.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.
After this change a packet processed by the stack isn't
modified at all[2] except for TTL.
After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.
[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.
[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.
Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
feature_present(3) checks.
This will help to run-time detect and conditionally handle specific
optionas of either feature in user space (i.e. in libipsec).
Descriptions read by: rwatson
MFC after: 2 weeks
"Whitspace" churn after the VIMAGE/VNET whirls.
Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.
Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.
This also removes some header file pollution for putatively
static global variables.
Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.
Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)
network stacks, VNET_SYSINIT:
- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.
Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)
nor destructors, as there's no actual work to do.
In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.
Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.
Reviewed by: bz
Approved by: re (vimage blanket)
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.
While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.
Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.
Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)