Commit Graph

74 Commits

Author SHA1 Message Date
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Justin Hibbits
3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Gleb Smirnoff
e68b379244 tcp: embed inpcb into tcpcb
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most import one is the inpcb itself.  With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa.  Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that.  However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort.  The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone.  Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision:	https://reviews.freebsd.org/D37127
2022-12-07 09:00:48 -08:00
Gleb Smirnoff
fcb3f813f3 netinet*: remove PRC_ constants and streamline ICMP processing
In the original design of the network stack from the protocol control
input method pr_ctlinput was used notify the protocols about two very
different kinds of events: internal system events and receival of an
ICMP messages from outside.  These events were coded with PRC_ codes.
Today these methods are removed from the protosw(9) and are isolated
to IPv4 and IPv6 stacks and are called only from icmp*_input().  The
PRC_ codes now just create a shim layer between ICMP codes and errors
or actions taken by protocols.

- Change ipproto_ctlinput_t to pass just pointer to ICMP header.  This
  allows protocols to not deduct it from the internal IP header.
- Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer.
  It has all the information needed to the protocols.  In the structure,
  change ip6c_finaldst fields to sockaddr_in6.  The reason is that
  icmp6_input() already has this address wrapped in sockaddr, and the
  protocols want this address as sockaddr.
- For UDP tunneling control input, as well as for IPSEC control input,
  change the prototypes to accept a transparent union of either ICMP
  header pointer or struct ip6ctlparam pointer.
- In icmp_input() and icmp6_input() do only validation of ICMP header and
  count bad packets.  The translation of ICMP codes to errors/actions is
  done by protocols.
- Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap,
  inet6ctlerrmap arrays.
- In protocol ctlinput methods either trust what icmp_errmap() recommend,
  or do our own logic based on the ICMP header.

Differential revision:	https://reviews.freebsd.org/D36731
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
809fef2913 netipsec: move specific ipsecmethods declarations to ipsec_support.h
where struct ipsec_methods is defined.  Not a functional change.
Allows further modification of method prototypes without breaking
compilation of other ipsec compilation units.

Differential revision:	https://reviews.freebsd.org/D36730
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
46ddeb6be8 netinet6: retire ip6protosw.h
The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with
the former removed in f0ffb944d2 in 2001 and the latter survived until
today.  It has been reduced down to only one useful declaration that
moves to ip6_var.h

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36726
2022-10-03 20:53:04 -07:00
Gleb Smirnoff
78b1fc05b2 protosw: separate pr_input and pr_ctlinput out of protosw
The protosw KPI historically has implemented two quite orthogonal
things: protocols that implement a certain kind of socket, and
protocols that are IPv4/IPv6 protocol.  These two things do not
make one-to-one correspondence. The pr_input and pr_ctlinput methods
were utilized only in IP protocols.  This strange duality required
IP protocols that doesn't have a socket to declare protosw, e.g.
carp(4).  On the other hand developers of socket protocols thought
that they need to define pr_input/pr_ctlinput always, which lead to
strange dead code, e.g. div_input() or sdp_ctlinput().

With this change pr_input and pr_ctlinput as part of protosw disappear
and IPv4/IPv6 get their private single level protocol switch table
ip_protox[] and ip6_protox[] respectively, pointing at array of
ipproto_input_t functions.  The pr_ctlinput that was used for
control input coming from the network (ICMP, ICMPv6) is now represented
by ip_ctlprotox[] and ip6_ctlprotox[].

ipproto_register() becomes the only official way to register in the
table.  Those protocols that were always static and unlikely anybody
is interested in making them loadable, are now registered by ip_init(),
ip6_init().  An IP protocol that considers itself unloadable shall
register itself within its own private SYSINIT().

Reviewed by:		tuexen, melifaro
Differential revision:	https://reviews.freebsd.org/D36157
2022-08-17 11:50:31 -07:00
Gleb Smirnoff
489482e276 ipsec: isolate knowledge about protocols that are last header
Retire PR_LASTHDR protosw flag.

Reviewed by:		ae
Differential revision:	https://reviews.freebsd.org/D36155
2022-08-17 08:24:28 -07:00
Kornel Dulęba
863871d369 ipsec: Improve validation of PMTU
Currently there is no upper bound on the PMTU value that is accepted.
Update hostcache only if the new pmtu is smaller than the current entry
and the link MTU.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Differential Revision: https://reviews.freebsd.org/D35872
2022-07-27 16:12:34 +02:00
Mateusz Guzik
590d0715b3 ipsec: enter epoch before calling into ipsec_run_hhooks
pfil_run_hooks which eventually can get called asserts on it.

Reviewed by:	ae
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32007
2021-09-21 17:02:41 +00:00
Mark Johnston
10eb2a2bde ipsec: Validate the protocol identifier in ipsec4_ctlinput()
key_allocsa() expects to handle only IPSec protocols and has an
assertion to this effect.  However, ipsec4_ctlinput() has to handle
messages from ICMP unreachable packets and was not validating the
protocol number.  In practice such a packet would simply fail to match
any SADB entries and would thus be ignored.

Reported by:	syzbot+6a9ef6fcfadb9f3877fe@syzkaller.appspotmail.com
Reviewed by:	ae
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31890
2021-09-10 09:09:00 -04:00
Wojciech Macek
d9d59bb1af ipsec: Handle ICMP NEEDFRAG message.
It will be needed for upcoming PMTU implementation in ipsec.
    For now simply create/update an entry in tcp hostcache when needed.
    The code is based on https://people.freebsd.org/~ae/ipsec_transport_mode_ctlinput.diff

Authored by: 		Kornel Duleba <mindal@semihalf.com>
Differential revision: 	https://reviews.freebsd.org/D30992
Reviewed by: 		tuxen
Sponsored by: 		Stormshield
Obtained from:	 	Semihalf
2021-08-09 12:01:46 +02:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
John Baldwin
f82eb2a6f0 Enter and exit the network epoch for async IPsec callbacks.
When an IPsec packet has been encrypted or decrypted, the next step in
the packet's traversal through the network stack is invoked from a
crypto worker thread, not from the original calling thread.  These
threads need to enter the network epoch before passing packets down to
IP output routines or up to transport protocols.

Reviewed by:	ae
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25444
2020-06-25 23:57:30 +00:00
Andrey V. Elsukov
1a01e0e7ac Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
from enc_hhook().

This should solve the problem when pf is used with if_enc(4) interface,
and outbound packet with existing PCB checked by pf, and this leads to
deadlock due to pf does its own PCB lookup and tries to take rlock when
wlock is already held.

Now we pass PCB pointer if it is known to the pfil hook, this helps to
avoid extra PCB lookup and thus rlock acquiring is not needed.
For inbound packets it is safe to pass NULL, because we do not held any
PCB locks yet.

PR:		220217
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2017-07-31 11:04:35 +00:00
Andrey V. Elsukov
7f1f65918b Disable IPsec debugging code by default when IPSEC_DEBUG kernel option
is not specified.

Due to the long call chain IPsec code can produce the kernel stack
exhaustion on the i386 architecture. The debugging code usually is not
used, but it requires a lot of stack space to keep buffers for strings
formatting. This patch conditionally defines macros to disable building
of IPsec debugging code.

IPsec currently has two sysctl variables to configure debug output:
 * net.key.debug variable is used to enable debug output for PF_KEY
   protocol. Such debug messages are produced by KEYDBG() macro and
   usually they can be interesting for developers.
 * net.inet.ipsec.debug variable is used to enable debug output for
   DPRINTF() macro and ipseclog() function. DPRINTF() macro usually
   is used for development debugging. ipseclog() function is used for
   debugging by administrator.

The patch disables KEYDBG() and DPRINTF() macros, and formatting buffers
declarations when IPSEC_DEBUG is not present in kernel config. This reduces
stack requirement for up to several hundreds of bytes.
The net.inet.ipsec.debug variable still can be used to enable ipseclog()
messages by administrator.

PR:		219476
Reported by:	eugen
No objection from:	#network
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10869
2017-05-29 09:30:38 +00:00
Andrey V. Elsukov
5f7c516f21 Fix possible double releasing for SA reference.
There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For inbound packets the direct call chain is the following:
 IPSEC_INPUT() method -> ipsec_common_input() -> xform_input() ->
 -> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
 -> xform_input_cb() -> ipsec[46]_common_input_cb() -> netisr_queue().

The SA reference is held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did SA reference releasing in
xform_input_cb(). But when the crypto callback called directly, in case of
error (e.g. data authentification failed) the error handling in
ipsec_common_input() also did SA reference releasing.

To fix this, remove error handling from ipsec_common_input() and do it
in xform_input() before crypto_dispatch().

PR:		219356
MFC after:	10 days
2017-05-23 09:01:48 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  build as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. The only one header file <netipsec/ipsec_support.h>
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups in the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
  check for full history of applied IPsec transforms.
o References counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
Andrey V. Elsukov
0c127808dd Remove redundant sanity checks from ipsec[46]_common_input_cb().
This check already has been done in the each protocol callback.
2016-08-31 11:51:52 +00:00
Andrey V. Elsukov
ef91a9765d Overhaul if_enc(4) and make it loadable in run-time.
Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).
2015-11-25 07:31:59 +00:00
Ermal Luçi
705f4d9c6a IPSEC, remove variable argument function its already due.
Differential Revision:		https://reviews.freebsd.org/D3080
Reviewed by:	gnn, ae
Approved by:	gnn(mentor)
2015-07-21 21:46:24 +00:00
Andrey V. Elsukov
574fde00be Since PFIL can change mbuf pointer, we should update pointers after
calling ipsec_filter().

Sponsored by:	Yandex LLC
2015-04-28 09:29:28 +00:00
Andrey V. Elsukov
962ac6c727 Change ipsec_address() and ipsec_logsastr() functions to take two
additional arguments - buffer and size of this buffer.

ipsec_address() is used to convert sockaddr structure to presentation
format. The IPv6 part of this function returns pointer to the on-stack
buffer and at the moment when it will be used by caller, it becames
invalid. IPv4 version uses 4 static buffers and returns pointer to
new buffer each time when it called. But anyway it is still possible
to get corrupted data when several threads will use this function.

ipsec_logsastr() is used to format string about SA entry. It also
uses static buffer and has the same problem with concurrent threads.

To fix these problems add the buffer pointer and size of this
buffer to arguments. Now each caller will pass buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).

And now ipsec_address() uses inet_ntop() function from libkern.

PR:		185996
Differential Revision:	https://reviews.freebsd.org/D2321
Reviewed by:	gnn
Sponsored by:	Yandex LLC
2015-04-18 16:58:33 +00:00
Andrey V. Elsukov
1d3b268c04 Requeue mbuf via netisr when we use IPSec tunnel mode and IPv6.
ipsec6_common_input_cb() uses partial copy of ip6_input() to parse
headers. But this isn't correct, when we use tunnel mode IPSec.

When we stripped outer IPv6 header from the decrypted packet, it
can become IPv4 packet and should be handled by ip_input. Also when
we use tunnel mode IPSec with IPv6 traffic, we should pass decrypted
packet with inner IPv6 header to ip6_input, it will correctly handle
it and also can decide to forward it.

The "skip" variable points to offset where payload starts. In tunnel
mode we reset it to zero after stripping the outer header. So, when
it is zero, we should requeue mbuf via netisr.

Differential Revision:	https://reviews.freebsd.org/D2306
Reviewed by:	adrian, gnn
Sponsored by:	Yandex LLC
2015-04-18 16:51:24 +00:00
Andrey V. Elsukov
1ae800e7a6 Fix handling of scoped IPv6 addresses in IPSec code.
* in ipsec_encap() embed scope zone ids into link-local addresses
  in the new IPv6 header, this helps ip6_output() disambiguate the
  scope;
* teach key_ismyaddr6() use in6_localip(). in6_localip() is less
  strict than key_sockaddrcmp(). It doesn't compare all fileds of
  struct sockaddr_in6, but it is faster and it should be safe,
  because all SA's data was checked for correctness. Also, since
  IPv6 link-local addresses in the &V_in6_ifaddrhead are stored in
  kernel-internal form, we need to embed scope zone id from SA into
  the address before calling in6_localip.
* in ipsec_common_input() take scope zone id embedded in the address
  and use it to initialize sin6_scope_id, then use this sockaddr
  structure to lookup SA, because we keep addresses in the SADB without
  embedded scope zone id.

Differential Revision:	https://reviews.freebsd.org/D2304
Reviewed by:	gnn
Sponsored by:	Yandex LLC
2015-04-18 16:46:31 +00:00
Andrey V. Elsukov
f0514a8b8a Remove now unused mtag argument from ipsec*_common_input_cb.
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 17:14:49 +00:00
Andrey V. Elsukov
2d957916ef Remove route chaching support from ipsec code. It isn't used for some time.
* remove sa_route_union declaration and route_cache member from struct secashead;
* remove key_sa_routechange() call from ICMP and ICMPv6 code;
* simplify ip_ipsec_mtu();
* remove #include <net/route.h>;

Sponsored by:	Yandex LLC
2014-12-02 04:20:50 +00:00
Andrey V. Elsukov
612faae7a2 Strip IP header only when we act in tunnel mode.
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-11-13 10:48:59 +00:00
Andrey V. Elsukov
b6e1ad3a3a Pass mbuf to pfil processing before stripping outer IP header as it
is described in if_enc(4).

MFC after:	2 week
Sponsored by:	Yandex LLC
2014-11-07 12:05:20 +00:00
Andrey V. Elsukov
1f194d8ae1 When mode isn't explicitly specified (wildcard) and inner protocol isn't
IPv4 or IPv6, assume it is the transport mode.

Reported by:	jmg
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-11-06 20:23:57 +00:00
Andrey V. Elsukov
a28b277a9f Do not strip outer header when operating in transport mode.
Instead requeue mbuf back to IPv4 protocol handler. If there is one extra IP-IP
encapsulation, it will be handled with tunneling interface. And thus proper
interface will be exposed into mbuf's rcvif. Also, tcpdump that listens on tunneling
interface will see packets in both directions.

Sponsored by:	Yandex LLC
2014-10-02 02:00:21 +00:00
Gleb Smirnoff
6ff8af1ca5 Mechanically convert to if_inc_counter(). 2014-09-19 10:18:14 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
VANHULLEBUS Yvan
aaf2cfc0d6 Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels.
For IPv6-in-IPv4, you may need to do the following command
on the tunnel interface if it is configured as IPv4 only:
ifconfig <interface> inet6 -ifdisabled

Code logic inspired from NetBSD.

PR: kern/169438
Submitted by: emeric.poupon@netasq.com
Reviewed by: fabient, ae
Obtained from: NETASQ
2014-05-28 12:45:27 +00:00
Andrey V. Elsukov
00a689c438 Initialize prot variable.
PR:		177417
MFC after:	1 week
2013-11-11 13:19:55 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Andrey V. Elsukov
a04d64d875 Use corresponding macros to update statistics for AH, ESP, IPIP, IPCOMP,
PFKEY.

MFC after:	2 weeks
2013-06-20 11:44:16 +00:00
Andrey V. Elsukov
9cb8d207af Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.
MFC after:	1 week
2013-04-09 07:11:22 +00:00
Gleb Smirnoff
d2bffb140e - Fix one more miss from r241913.
- Add XXX comment about necessity of the entire block,
  that "fixes up" the IP header.
2012-10-23 08:22:01 +00:00
Gleb Smirnoff
d6d3f01e0a Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

 o Fine grained locking, thus much better performance.
 o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

  Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by:	Florian Smeets <flo freebsd.org>
Tested by:	Chekaluk Vitaly <artemrts ukr.net>
Tested by:	Ben Wilber <ben desync.com>
Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-09-08 06:41:54 +00:00
Bjoern A. Zeeb
e0bfbfce79 Update packet filter (pf) code to OpenBSD 4.5.
You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by:	mlaier
Submitted by:	eri
2011-06-28 11:57:25 +00:00
Bjoern A. Zeeb
db178eb816 Make IPsec compile without INET adding appropriate #ifdef checks.
Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c
to not need three different versions depending on INET, INET6 or both.

Mark two places preparing for not yet supported functionality with IPv6.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:28:42 +00:00
Thomas Quinot
94294cada5 Fix typo in comment. 2010-10-25 16:11:37 +00:00
Bjoern A. Zeeb
3abaa08643 MFp4 @178283:
Improve IPsec flow distribution for better netisr parallelism.
Instead of using the pointer that would have the last bits masked in a %
statement in netisr_select_cpuid() to select the queue, use the SPI.

Reviewed by:	rwatson
MFC after:	4 weeks
2010-05-24 16:27:47 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
VANHULLEBUS Yvan
7b495c4494 Added support for NAT-Traversal (RFC 3948) in IPsec stack.
Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry
Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele
(julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense
team, and all people who used / tried the NAT-T patch for years and
reported bugs, patches, etc...

X-MFC: never

Reviewed by:	bz
Approved by:	gnn(mentor)
Obtained from:	NETASQ
2009-06-12 15:44:35 +00:00
Bjoern A. Zeeb
fc228fbf49 Properly hide IPv4 only variables and functions under #ifdef INET. 2009-06-10 19:25:46 +00:00
Robert Watson
d4b5cae49b Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
  workstream, or set of per-protocol queues.  Threads may be bound
  to specific CPUs, or allowed to migrate, based on a global policy.

  In the future it would be desirable to support topology-centric
  policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
  currently be one of:

  NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
    an implicit or explicit source (such as an interface or socket).

  NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
    as well as allowing protocols to provide a flow generation function
    for mbufs without flow identifers (m2flow).  Falls back on
    NETISR_POLICY_SOURCE if now flow ID is available.

  NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
    each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
  being used, as well as a mapping function from workstream to CPU ID,
  which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
  get and clear drop counters, which query data or apply changes across
  all workstreams.

- Add a more extensible netisr registration interface, in which
  protocols declare 'struct netisr_handler' structures for each
  registered NETISR_ type.  These include name, handler function,
  optional mbuf to flow ID function, optional mbuf to CPU ID function,
  queue limit, and ordering policy.  Padding is present to allow these
  to be expanded in the future.  If no queue limit is declared, then
  a default is used.

- Queue limits are now per-workstream, and raised from the previous
  IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
  with the exception of netnatm, default queue limits.  Most protocols
  register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
  NETISR_POLICY_FLOW, and will therefore take advantage of driver-
  generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
  the netisr, rather than having polling pretend to be two protocols.
  Provide two explicit hooks in the netisr worker for start and end
  events for runs: netisr_poll() and netisr_pollmore(), as well as a
  function, netisr_sched_poll(), to allow the polling code to schedule
  netisr execution.  DEVICE_POLLING still embeds single-netisr
  assumptions in its implementation, so for now if it is compiled into
  the kernel, a single and un-bound netisr thread is enforced
  regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking.  It will be enabled once further rmlock
optimization has taken place.  However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by:	bz
2009-06-01 10:41:38 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00