Commit Graph

117 Commits

Author SHA1 Message Date
Jonathan T. Looney
5d6e356cb0 Avoid calling protocol drain routines more than once per reclamation event.
mb_reclaim() calls the protocol drain routines for each protocol in each
domain. Some protocols exist in more than one domain and share drain
routines. In the case of SCTP, it also uses the same drain routine for
its SOCK_SEQPACKET and SOCK_STREAM entries in the same domain.

On systems with INET, INET6, and SCTP all defined, mb_reclaim() calls
sctp_drain() four times. On systems with INET and INET6 defined,
mb_reclaim() calls tcp_drain() twice. mb_reclaim() is the only in-tree
caller of the pr_drain protocol entry.

Eliminate this duplication by ensuring that each pr_drain routine is only
specified for one protocol entry in one domain.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24418
2020-04-16 20:17:24 +00:00
Alexander V. Chernikov
4684d3cbcb Remove per-AF radix_mpath initializtion functions.
Split their functionality by moving random seed allocation
 to SYSINIT and calling (new) generic multipath function from
 standard IPv4/IPv5 RIB init handlers.

Differential Revision:	https://reviews.freebsd.org/D24356
2020-04-11 07:37:08 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Alexander V. Chernikov
34a5582c47 Bring back redirect route expiration.
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces net.inet.icmp.redirtimeout sysctl, allowing to set
 an expiration time for redirected routes. It defaults to 10 minutes,
 analogues with net.inet6.icmp6.redirtimeout.

Implementation uses separate file, route_temporal.c, as route.c is already
 bloated with tons of different functions.
Internally, expiration is implemented as an per-rnh callout scheduled when
 route with non-zero rt_expire time is added or rt_expire is changed.
 It does not add any overhead when no temporal routes are present.

Callout traverses entire routing tree under wlock, scheduling expired routes
 for deletion and calculating the next time it needs to be run. The rationale
 for such implemention is the following: typically workloads requiring large
 amount of routes have redirects turned off already, while the systems with
 small amount of routes will not inhibit large overhead during tree traversal.

This changes also fixes netstat -rn display of route expiration time, which
 has been broken since the conversion from kread() to sysctl.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23075
2020-01-22 13:53:18 +00:00
Alexander V. Chernikov
ead85fe415 Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
 is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23047
2020-01-09 17:21:00 +00:00
Bjoern A. Zeeb
757cb678e5 frag6.c: move variables and sysctls into local file
Move the sysctls and the related variables only used in frag6.c
into the file and out of in6_proto.c.  That way everything belonging
together is in one place.

Sort the variables into global and per-vnet scopes and make
them static.  No longer export the (helper) function
frag6_set_bucketsize() now also file-local only.

Should be no functional changes, only reduced public KPI/KBI surface.

MFC after:		3 months
Sponsored by:		Netflix
2019-08-02 10:29:53 +00:00
Bjoern A. Zeeb
21231a7aa6 Update for IETF draft-ietf-6man-ipv6only-flag.
All changes are hidden behind the EXPERIMENTAL option and are not compiled
in by default.

Add ND6_IFF_IPV6_ONLY_MANUAL to be able to set the interface into no-IPv4-mode
manually without router advertisement options.  This will allow developers to
test software for the appropriate behaviour even on dual-stack networks or
IPv6-Only networks without the option being set in RA messages.
Update ifconfig to allow setting and displaying the flag.

Update the checks for the filters to check for either the automatic or the manual
flag to be set.  Add REVARP to the list of filtered IPv4-related protocols and add
an input filter similar to the output filter.

Add a check, when receiving the IPv6-Only RA flag to see if the receiving
interface has any IPv4 configured.  If it does, ignore the IPv6-Only flag.

Add a per-VNET global sysctl, which is on by default, to not process the automatic
RA IPv6-Only flag.  This way an administrator (if this is compiled in) has control
over the behaviour in case the node still relies on IPv4.
2019-03-06 23:31:42 +00:00
Jonathan T. Looney
1e9f3b734e Implement a limit on on the number of IPv6 reassembly queues per bucket.
There is a hashing algorithm which should distribute IPv6 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv6 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.

Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet6.ip6.maxfragbucketsize).

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:27:41 +00:00
Jonathan T. Looney
03c99d7662 Add a limit of the number of fragments per IPv6 packet.
The IPv4 fragment reassembly code supports a limit on the number of
fragments per packet. The default limit is currently 17 fragments.
Among other things, this limit serves to limit the number of fragments
the code must parse when trying to reassembly a packet.

Add a limit to the IPv6 reassembly code. By default, limit a packet
to 65 fragments (64 on the queue, plus one final fragment to complete
the packet). This allows an average fragment size of 1,008 bytes, which
should be sufficient to hold a fragment. (Recall that the IPv6 minimum
MTU is 1280 bytes. Therefore, this configuration allows a full-size
IPv6 packet to be fragmented on a link with the minimum MTU and still
carry approximately 272 bytes of headers before the fragmented portion
of the packet.)

Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket
sysctl.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:26:07 +00:00
Jonathan T. Looney
2adfd64f35 Make the IPv6 fragment limits be global, rather than per-VNET, limits.
The IPv6 reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
on enough VNETs), it is possible that the sum of all the VNET fragment
limits will exceed the number of mbuf clusters available in the system.

Given the fact that the fragment limits are intended (at least in part) to
regulate access to a global resource, the IPv6 fragment limit should
be applied on a global basis.

Note that it is still possible to disable fragmentation for a particular
VNET by setting the net.inet6.ip6.maxfragpackets sysctl to 0 for that
VNET. In addition, it is now possible to disable fragmentation globally
by setting the net.inet6.ip6.maxfrags sysctl to 0.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:24:26 +00:00
Michael Tuexen
888973f5ae Allow implicit TCP connection setup for TCP/IPv6.
TCP/IPv4 allows an implicit connection setup using sendto(), which
is used for TTCP and TCP fast open. This patch adds support for
TCP/IPv6.
While there, improve some tests for detecting multicast addresses,
which are mapped.

Reviewed by:		bz@, kbowling@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16458
2018-07-30 21:27:26 +00:00
Andrey V. Elsukov
12e7376216 Remove empty encap_init() function.
MFC after:	2 weeks
2018-05-29 12:32:08 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Andrey V. Elsukov
627c036f65 Remove IPsec related PCB code from SCTP.
The inpcb structure has inp_sp pointer that is initialized by
ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec
security policies associated with a specific socket.
An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket
options to configure these security policies. Then ip[6]_output()
uses inpcb pointer to specify that an outgoing packet is associated
with some socket. And IPSEC_OUTPUT() method can use a security policy
stored in the inp_sp. For inbound packet the protocol-specific input
routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms
to inbound security policy configured in the inpcb.

SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends
packets. Thus IPSEC_OUTPUT() method does not consider such packets as
associated with some socket and can not apply security policies
from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY()
method is called from protocol-specific input routine, it can specify
inpcb pointer and associated with socket inbound policy will be
checked. But there are two problems:
1. Such check is asymmetric, becasue we can not apply security policy
from inpcb for outgoing packet.
2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and
access to inp_sp is protected. But for SCTP this is not correct,
becasue SCTP uses own locks to protect inpcb.

To fix these problems remove IPsec related PCB code from SCTP.
This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options
will be not applicable to SCTP sockets. To be able correctly check
inbound security policies for SCTP, mark its protocol header with
the PR_LASTHDR flag.

Reported by:	tuexen
Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D9538
2017-02-13 11:37:52 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  build as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. The only one header file <netipsec/ipsec_support.h>
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups in the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
  check for full history of applied IPsec transforms.
o References counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
Mark Johnston
762d16d9e4 Improve some of the sysctl descriptions added in r299827.
Submitted by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
		(original version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5336
2017-01-16 19:35:19 +00:00
Bjoern A. Zeeb
3f58662dd9 The pr_destroy field does not allow us to run the teardown code in a
specific order.  VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by:		gnn, tuexen (sctp), jhb (kernel.h)
Obtained from:		projects/vnet
MFC after:		2 weeks
X-MFC:			do not remove pr_destroy
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6652
2016-06-01 10:14:04 +00:00
Michael Tuexen
3d48d25be7 Add PR_CONNREQUIRED for SOCK_STREAM sockets using SCTP.
This is required to signal connetion setup on non-blocking sockets
via becoming writable. This still allows for implicit connection
setup.

MFC after:	1 week
2016-05-30 18:24:23 +00:00
Mark Johnston
82366c228b Add sysctl descriptions for net.inet6.ip6 and net.inet6.icmp6.
icmp6.redirtimeout, icmp6.nd6_maxnudhint and ip6.rr_prune are left
undocumented as they appear to have no effect. Some existing sysctl
descriptions were modified for consistency and style, and the
ip6.tempvltime and ip6.temppltime handlers were rewritten to be a bit
simpler and to avoid setting the sysctl value before validating it.

MFC after:	3 weeks
2016-05-15 03:18:03 +00:00
Pedro F. Giffuni
63b6b7a74a Indentation issues.
Contract some lines leftover from r298310.

Mea culpa.
2016-04-20 16:19:44 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Gleb Smirnoff
8ec07310fa These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h
2016-02-01 17:41:21 +00:00
Alexander V. Chernikov
603eaf792b Renove faith(4) and faithd(8) from base. It looks like industry
have chosen different (and more traditional) stateless/statuful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.

No objections from:	net@
2014-11-09 21:33:01 +00:00
Andrey V. Elsukov
f325335caf Overhaul if_gre(4).
Split it into two modules: if_gre(4) for GRE encapsulation and
if_me(4) for minimal encapsulation within IP.

gre(4) changes:
* convert to if_transmit;
* rework locking: protect access to softc with rmlock,
  protect from concurrent ioctls with sx lock;
* correct interface accounting for outgoing datagramms (count only payload size);
* implement generic support for using IPv6 as delivery header;
* make implementation conform to the RFC 2784 and partially to RFC 2890;
* add support for GRE checksums - calculate for outgoing datagramms and check
  for inconming datagramms;
* add support for sending sequence number in GRE header;
* remove support of cached routes. This fixes problem, when gre(4) doesn't
  work at system startup. But this also removes support for having tunnels with
  the same addresses for inner and outer header.
* deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD.
  Use our standard ioctls for tunnels.

me(4):
* implementation conform to RFC 2004;
* use if_transmit;
* use the same locking model as gre(4);

PR:		164475
Differential Revision:	D1023
No objections from:	net@
Relnotes:	yes
Sponsored by:	Yandex LLC
2014-11-07 19:13:19 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Gleb Smirnoff
428cf06b31 Remove VNET_SYSCTL_ARG(). The generic sysctl(9) code handles that.
Reviewed by:	ae
Sponsored by:	Nginx, Inc.
2014-11-07 08:58:05 +00:00
Alexander V. Chernikov
146a181f28 Finish r274118: remove useless fields from struct domain.
Sponsored by:	Yandex LLC
2014-11-06 14:39:04 +00:00
Alexander V. Chernikov
1a75e3b20f Make checks for rt_mtu generic:
Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performes runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use this functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
Kevin Lo
73d76e77b6 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Kevin Lo
d1b18731d9 Minor style cleanups. 2014-04-07 01:55:53 +00:00
Kevin Lo
e06e816f67 Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks.
Tested with vlc and a test suite [1].

[1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

Reviewed by:	jhb, glebius, adrian
2014-04-07 01:53:03 +00:00
Gleb Smirnoff
5d6d7e756b o Revamp API between flowtable and netinet, netinet6.
- ip_output() and ip_output6() simply call flowtable_lookup(),
    passing mbuf and address family. That's the only code under
    #ifdef FLOWTABLE in the protocols code now.
o Revamp statistics gathering and export.
  - Remove hand made pcpu stats, and utilize counter(9).
  - Snapshot of statistics is available via 'netstat -rs'.
  - All sysctls are moved into net.flowtable namespace, since
    spreading them over net.inet isn't correct.
o Properly separate at compile time INET and INET6 parts.
o General cleanup.
  - Remove chain of multiple flowtables. We simply have one for
    IPv4 and one for IPv6.
  - Flowtables are allocated in flowtable.c, symbols are static.
  - With proper argument to SYSINIT() we no longer need flowtable_ready.
  - Hash salt doesn't need to be per-VNET.
  - Removed rudimentary debugging, which use quite useless in dtrace era.

The runtime behavior of flowtable shouldn't be changed by this commit.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-07 15:18:23 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Andrey V. Elsukov
a786f67981 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
Hiroki Sato
5df1b6b57e Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group
Address.  Although KAME implementation used FF02:0:0:0:0:2::/96 based on
older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed
in RFC 4620.

The kernel always joins the /104-prefixed address, and additionally does
/96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1.
The default value of the sysctl is 1.

ping6(8) -N flag now uses /104-prefixed one.  When this flag is specified
twice, it uses /96-prefixed one instead.

Reviewed by:		ume
Based on work by:	Thomas Scheffler
PR:			conf/174957
MFC after:		2 weeks
2013-05-04 19:16:26 +00:00
Kevin Lo
dda95c6e59 Remove unused global variables.
Reviewed by:	ae, glebius
2013-03-22 01:40:17 +00:00
Maxim Konovalov
d96ea877a7 o Convert IPv6 read-only stats sysctls to the read-write ones.
o Teach netstat(1) -z to reset these stats sysctls.

PR:		bin/153206
Reviewed by:	glebuis
Sponsored by:	NGINX, Inc.
MFC after:	1 month
2011-12-19 05:50:34 +00:00
Hiroki Sato
049087a0f3 Add $ipv6_cpe_wanif to enable functionality required for IPv6 CPE
(r225485).  When setting an interface name to it, the following
configurations will be enabled:

 1. "no_radr" is set to all IPv6 interfaces automatically.

 2. "-no_radr accept_rtadv" will be set only for $ipv6_cpe_wanif.  This is
    done just before evaluating $ifconfig_IF_ipv6 in the rc.d scripts (this
    means you can manually supersede this configuration if necessary).

 3. The node will add RA-sending routers to the default router list
    even if net.inet6.ip6.forwarding=1.

This mode is added to conform to RFC 6204 (a router which connects
the end-user network to a service provider network).  To enable
packet forwarding, you still need to set ipv6_gateway_enable=YES.

Note that accepting router entries into the default router list when
packet forwarding capability and a routing daemon are enabled can
result in messing up the routing table.  To minimize such unexpected
behaviors, "no_radr" is set on all interfaces but $ipv6_cpe_wanif.

Approved by:	re (bz)
2011-09-13 00:06:11 +00:00
Michael Tuexen
78d9a31d3a The socket API only specifies SCTP for SOCK_SEQPACKET and
SOCK_STREAM, but not SOCK_DGRAM. So don't register it for
SOCK_DGRAM.
While there, fix some indentation.
2011-07-12 19:29:29 +00:00
Hiroki Sato
e7fa8d0ada - Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1.
- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
  This controls if accepting a route in an RA message as the default route.
  The default value for each interface can be set by net.inet6.ip6.no_radr.
  The system wide default value is 0.

- A new sysctl: net.inet6.ip6.norbit_raif.  This controls if setting R-bit in
  NA on RA accepting interfaces.  The default is 0 (R-bit is set based on
  net.inet6.ip6.forwarding).

Background:

 IPv6 host/router model suggests a router sends an RA and a host accepts it for
 router discovery.  Because of that, KAME implementation does not allow
 accepting RAs when net.inet6.ip6.forwarding=1.  Accepting RAs on a router can
 make the routing table confused since it can change the default router
 unintentionally.

 However, in practice there are cases where we cannot distinguish a host from
 a router clearly.  For example, a customer edge router often works as a host
 against the ISP, and as a router against the LAN at the same time.  Another
 example is a complex network configurations like an L2TP tunnel for IPv6
 connection to Internet over an Ethernet link with another native IPv6 subnet.
 In this case, the physical interface for the native IPv6 subnet works as a
 host, and the pseudo-interface for L2TP works as the default IP forwarding
 route.

Problem:

 Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
 accepting them when net.inet6.ip6.forward=0 cause the following practical
 issues:

 - A router cannot perform SLAAC.  It becomes a problem if a box has
   multiple interfaces and you want to use SLAAC on some of them, for
   example.  A customer edge router for IPv6 Internet access service
   using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
   physical interface for administration purpose; updating firmware
   and so on (link-local addresses can be used there, but GUAs by
   SLAAC are often used for scalability).

 - When a host has multiple IPv6 interfaces and it receives multiple RAs on
   them, controlling the default route is difficult.  Router preferences
   defined in RFC 4191 works only when the routers on the links are
   under your control.

Details of Implementation Changes:

 Router Advertisement messages will be accepted even when
 net.inet6.ip6.forwarding=1.  More precisely, the conditions are as
 follow:

 (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
	=> Normal RA processing on that interface. (as IPv6 host)

 (ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
	=> Accept RA but add the router to the defroute list with
	   rtlifetime=0 unconditionally.  This effectively prevents
	   from setting the received router address as the box's
	   default route.

 (!ACCEPT_RTADV)
	=> No RA processing on that interface.

 ACCEPT_RTADV and NO_RADR are per-interface knob.  In short, all interface
 are classified as "RA-accepting" or not.  An RA-accepting interface always
 processes RA messages regardless of ip6.forwarding.  The difference caused by
 NO_RADR or ip6.forwarding is whether the RA source address is considered as
 the default router or not.

 R-bit in NA on the RA accepting interfaces is set based on
 net.inet6.ip6.forwarding.  While RFC 6204 W-1 rule (for CPE case) suggests
 a router should disable the R-bit completely even when the box has
 net.inet6.ip6.forwarding=1, I believe there is no technical reason with
 doing so.  This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
 (the default is 0).

Usage:

 # ifconfig fxp0 inet6 accept_rtadv
	=> accept RA on fxp0
 # ifconfig fxp0 inet6 accept_rtadv no_radr
	=> accept RA on fxp0 but ignore default route information in it.
 # sysctl net.inet6.ip6.norbit_no_radr=1
	=> R-bit in NAs on RA accepting interfaces will always be set to 0.
2011-06-06 02:14:23 +00:00
Bjoern A. Zeeb
8d5a3ca77b Add FEATURE() definitions for IPv4 and IPv6 so that we can use
feature_present(3) to dynamically decide whether to use one or the
other family.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	10 days
2011-05-25 00:34:25 +00:00
Bjoern A. Zeeb
1024547144 MFp4 CH=191760,191770:
Not compiling in and not initializing from inetsw from in_proto.c for
IPv6 only, we need to initialize upper layer protocols from inet6sw.
Make sure to not initialize them twice in a Dual-Stack
environment but only conditionally on no INET as we have done for
TCP for a long time.  Otherwise we would leak resources.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	3 days
2011-04-20 08:05:23 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
Bjoern A. Zeeb
4f7495d32a MFp4 CH180235:
Add proto spacers to inet6sw like we have for legacy IP. This allows us
to dynamically pf_proto_register() for INET6 from modules, needed by
upcoming CARP changes and SeND.
MC and SCTP could make use of it as well in theory in the future after
upcoming VIMAGE vnet teardown work.

Discussed with:	will, anchie
MFC after:	10 days
2010-08-09 19:53:24 +00:00
Kip Macy
77931dd513 Add flowtable support to IPv6
Tested by: qingli@

Reviewed by:	qingli@
MFC after:	3 days
2010-05-09 20:32:00 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
4dcc55a363 Garbage collect references to the no longer implemented tcp_fasttimo().
Discussed with:	rwatson
MFC after:	5 days
2010-01-17 13:07:52 +00:00
Hiroki Sato
a283298ce3 Improve flexibility of receiving Router Advertisement and
automatic link-local address configuration:

- Convert a sysctl net.inet6.ip6.accept_rtadv to one for the
  default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a
  global knob.  The default value of the sysctl is 0.

- Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a
  sysctl net.inet6.ip6.auto_linklocal to one for its default
  value.  The default value of the sysctl is 1.

- Make ND6_IFF_IFDISABLED more robust.  It can be used to disable
  IPv6 functionality of an interface now.

- Receiving RA is allowed if ip6_forwarding==0 *and*
  ND6_IFF_ACCEPT_RTADV is set on that interface.  The former
  condition will be revisited later to support a "host + router" box
  like IPv6 CPE router.  The current behavior is compatible with
  the older releases of FreeBSD.

- The ifconfig(8) now supports these ND6 flags as well as "nud",
  "prefer_source", and "disabled" in ndp(8).  The ndp(8) now
  supports "auto_linklocal".

Discussed with:	bz and jinmei
Reviewed by:	bz
MFC after:	3 days
2009-09-12 22:08:20 +00:00