Commit Graph

1895 Commits

Author SHA1 Message Date
Matt Macy
feeef8509b Fix PCBGROUPS build post CK conversion of pcbinfo 2018-06-13 23:19:54 +00:00
Andrey V. Elsukov
a5185adeb6 Rework if_gre(4) to use encap_lookup_t method to speedup lookup
of needed interface when many gre interfaces are present.

Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations. Use hash table to
speedup lookup of needed softc.
2018-06-13 11:11:33 +00:00
Matt Macy
b872626dbe mechanical CK macro conversion of inpcbinfo lists
This is a dependency for converting the inpcbinfo hash and info rlocks
to epoch.
2018-06-12 22:18:20 +00:00
Sean Bruno
1a43cff92a Load balance sockets with new SO_REUSEPORT_LB option.
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967.  Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Obtained from:	DragonflyBSD
Relnotes:	Yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-06-06 15:45:57 +00:00
Andrey V. Elsukov
4a089e6bc5 Use m_copyback() function to write delayed checksum when it isn't located
in the first mbuf of the chain.

MFC after:	1 week
2018-06-06 10:46:24 +00:00
Andrey V. Elsukov
a41372abd4 Fix LINT-NOINET build.
Use known at build time size for min_length value. Also remove the
check from in6_gre_encapcheck(), now it is done in generic code.
2018-06-06 05:17:21 +00:00
Andrey V. Elsukov
b941bc1d6e Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.

Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).

This change allows dramatically improve performance when many gif(4)
interfaces are used.

Sponsored by:	Yandex LLC
2018-06-05 21:24:59 +00:00
Andrey V. Elsukov
73ae9758a1 Constify argument of in6_getscope(). 2018-06-05 20:54:29 +00:00
Andrey V. Elsukov
6d8fdfa9d5 Rework IP encapsulation handling code.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
  data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
  families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
  that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
  registered, encapcheck handler of each interface is invoked for each
  packet. The search takes O(n) for n interfaces. All this work is done
  with exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
  addedr: min_length is the minimum packet length, that encapsulation
  handler expects to see; exact_match is maximum number of bits, that
  can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
  targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
  many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
  method was used from this structure, so I don't see the need to keep
  using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
  pointer. Now it is passed directly trough encap_input_t method.
  encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
  any code in the tree that uses them. All consumers use encap_attach_func()
  method, that relies on invoking of encapcheck() to determine the needed
  handler.
- introduced struct encap_config, it contains parameters of encap handler
  that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
  handlers that need more bits to match will be checked first, and if
  encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.

Reviewed by:	mmacy
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15617
2018-06-05 20:51:01 +00:00
Andrey V. Elsukov
12e7376216 Remove empty encap_init() function.
MFC after:	2 weeks
2018-05-29 12:32:08 +00:00
Matt Macy
0f8d79d977 CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
2018-05-24 23:21:23 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Ed Maste
3b9b6b1704 Pair CURVNET_SET and CURVNET_RESTORE in a block
Per vnet(9), CURVNET_SET and CURVNET_RESTORE cannot be used as a single
statement for a conditional and CURVNET_RESTORE must be in the same
block as CURVNET_SET (or a subblock).

Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
2018-05-21 13:08:44 +00:00
Ed Maste
15f8acc53f Revert r333968, it broke all archs but i386 and amd64 2018-05-21 11:56:07 +00:00
Matt Macy
ed6bb714b2 in(6)_mcast: Expand out vnet set / restore macro so that they work in a conditional block
Reported by:	zec at fer.hr
2018-05-21 08:34:10 +00:00
Matt Macy
b15253bba0 make sure vnet is set when freeing
Reported by:	pho
2018-05-20 20:48:26 +00:00
Matt Macy
cb6bb2303e ip(6)_freemoptions: defer imo destruction to epoch callback task
Avoid the ugly unlock / lock of the inpcbinfo where we need to
figure out what kind of lock we hold by simply deferring the
operation to another context. (Also a small dependency for
converting the pcbinfo read lock to epoch)
2018-05-20 00:22:28 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Brooks Davis
9c082ac517 Unwrap some not-so-long lines now that extra tabs been removed. 2018-05-15 17:59:46 +00:00
Brooks Davis
04f0d3db4a Remove stray tabs in in6_lltable_dump_entry(). NFC. 2018-05-15 17:57:46 +00:00
Stephen Hurd
b69888c28f Fix LORs in in6?_leave_group()
r333175 updated the join_group functions, but not the leave_group ones.

Reviewed by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15393
2018-05-11 21:42:27 +00:00
Andrew Gallatin
66fe09d8d9 Fix a panic in the IPv6 multicast code.
Use LIST_FOREACH_SAFE in in6m_disconnect() since we're
deleting and freeing item from the membership list
while traversing the list.

Reviewed by:	mmacy
Sponsored by:	Netflix
2018-05-10 16:19:41 +00:00
Hans Petter Selasky
306cf294b2 Fix for missing network interface address event when adding the default IPv6
based link-local address.

The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.

Reviewed by:	network (ae), kib
Differential Revision:	https://reviews.freebsd.org/D13407
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-08 11:39:01 +00:00
Matt Macy
b6f6f88018 r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by:	lwhsu
Approved by:	sbruno
2018-05-06 20:34:13 +00:00
Matt Macy
28c001002a Currently in_pcbfree will unconditionally wunlock the pcbinfo lock
to avoid a LOR on the multicast list lock in the freemoptions routines.
As it turns out, tcp_usr_detach can acquire the tcbinfo lock readonly.
Trying to wunlock the pcbinfo lock in that context has caused a number
of reported crashes.

This change unclutters in_pcbfree and moves the handling of wunlock vs
runlock of pcbinfo to the freemoptions routine.

Reported by:	mjg@, bde@, o.hartmann at walstatt.org
Approved by:	sbruno
2018-05-05 22:40:40 +00:00
Michael Tuexen
f9656ee690 Send an ICMPv6 PacketTooBig message in case of forwading a packet which
is too big for the outgoing interface and no firewall is involed.
This problem was introduced in
https://svnweb.freebsd.org/changeset/base/324996
Thanks to Irene Ruengeler for finding the bug and testing the fix.

Reviewed by:	kp@
MFC after:	3 days
2018-05-02 22:11:16 +00:00
Stephen Hurd
f3e1324b41 Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by:	mmacy <mmacy@mattmacy.io>
Reviewed by:	shurd, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14969
2018-05-02 19:36:29 +00:00
Sean Bruno
7875017ca9 Revert r332894 at the request of the submitter.
Submitted by:	Johannes Lundberg <johalun0_gmail.com>
Sponsored by:	Limelight Networks
2018-04-24 19:55:12 +00:00
Sean Bruno
7b7796eea5 Load balance sockets with new SO_REUSEPORT_LB option
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by:	Johannes Lundberg <johanlun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-04-23 19:51:00 +00:00
Andrey V. Elsukov
849eeaa592 icmp6_reflect() sends ICMPv6 message with new IPv6 header. So, it is
considered as originated by our host packet. And thus rcvif should be
NULL, since it is used by ipfw(4) to determine that packet was originated
from this host. Some of icmp6_reflect() consumers reuse mbuf and m_pkthdr
without resetting rcvif pointer. To avoid this always reset m_pkthdr.rcvif
pointer to NULL in icmp6_reflect(). Also remove such line and comment
describing this from icmp6_error(), since it does not longer matters.

PR:		227674
Reported by:	eugen
MFC after:	1 week
2018-04-23 12:20:07 +00:00
Brooks Davis
3a4fc8a8a1 Remove support for the Arcnet protocol.
While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR:		182297
Reviewed by:	jhibbits, vangyzen
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15057
2018-04-13 21:18:04 +00:00
Andrey V. Elsukov
56c989dff2 Add check that mbuf had not multicast layer2 address.
Such packets should be handled by ip6_mforward().

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2018-04-13 16:13:59 +00:00
Brooks Davis
0437c8e3b1 Remove support for FDDI networks.
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by:	kib, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15017
2018-04-11 17:28:24 +00:00
Michael Tuexen
efcf28ef77 Fix a logical inversion bug.
Thanks to Irene Ruengeler for finding and reporting this bug.

MFC after:	3 days
2018-04-08 12:08:20 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Brooks Davis
8708f1bdaf Document and enforce assumptions about struct (in6_)ifreq.
- The two types must be type-punnable for shared members of ifr_ifru.
  This allows compatibility accessors to be shared.

- There must be no padding gap between ifr_name and ifr_ifru.  This is
  assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
  broadly portable.  This is true for all current architectures, but very
  large (256-bit) fat-pointers could violate this invariant.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14910
2018-03-30 21:38:53 +00:00
Brooks Davis
f97f15e44c Remove a comment that suggests checking that a non-pointer is non-NULL.
Reviewed by:	melifaro, markj, hrs, ume
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14904
2018-03-30 18:26:29 +00:00
Brooks Davis
69f0fecbd6 Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
Jonathan T. Looney
20cb3e2557 This change adds a flag to the DAD entry to indicate whether it is
currently on the queue. This prevents accidentally doubly-removing a DAD
entry from the queue, while also simplifying some of the logic in
nd6_dad_stop().

Reviewed by:	ae, hrs, vangyzen
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D10943
2018-03-24 13:18:09 +00:00
Jonathan T. Looney
c187c03466 Remove some unneccessary variable sets in IPv6 code, as detected by
clang's static analyzer.

Reviewed by:	bz
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D10940
2018-03-24 12:43:34 +00:00
Sean Bruno
72bfa0bf63 Revert r331379 as the "simple" lock changes have revealed a deeper problem
and need for a rethink.

Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
2018-03-23 18:34:38 +00:00
Kristof Provost
effaab8861 netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by:	ae, kevans
Differential Revision:	https://reviews.freebsd.org/D13715
2018-03-23 16:56:44 +00:00
Sean Bruno
06b479a6a7 Refactor ip6_getpcbopt() for better locking and memory management
Created GET_PKTOPT_EXT_HDR() and GET_PKTOPT_SOCKADDR() macros to
handle safely fetching options from in6p_outputopts, including
properly dealing with in6p locking and preparing memory for
sooptcopyout().

Changed the function signature of ip6_getpcbopt() to allow the
function to acquire and release locks on in6p as needed.

Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14619
2018-03-22 23:34:48 +00:00
Sean Bruno
2a499acf59 Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput.
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14624
2018-03-22 22:29:32 +00:00
Sean Bruno
5cbeca4497 Handle locking and memory safety for IPV6_PATHMTU in ip6_ctloutput().
Submitted by:	Jason Eggleston <jason@eggnet.com>
Reviewed by:	ae
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14622
2018-03-22 21:18:34 +00:00
Sean Bruno
37d4fc1e70 Improve write locking in ip6_ctloutput() with macros.
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14620
2018-03-22 20:21:05 +00:00
Jonathan T. Looney
7fb2986ff6 If the INP lock is uncontested, avoid taking a reference and jumping
through the lock-switching hoops.

A few of the INP lookup operations that lock INPs after the lookup do
so using this mechanism (to maintain lock ordering):

1. Lock lookup structure.
2. Find INP.
3. Acquire reference on INP.
4. Drop lock on lookup structure.
5. Acquire INP lock.
6. Drop reference on INP.

This change provides a slightly shorter path for cases where the INP
lock is uncontested:

1. Lock lookup structure.
2. Find INP.
3. Try to acquire the INP lock.
4. If successful, drop lock on lookup structure.

Of course, if the INP lock is contested, the functions will need to
revert to the previous way of switching locks safely.

This saves a few atomic operations when the INP lock is uncontested.

Discussed with:	gallatin, rrs, rwatson
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D12911
2018-03-21 15:54:46 +00:00
Alexander V. Chernikov
1435dcd94f Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
 that the entry is still used. This mechanism is incorporated into the
 arpresolve()/nd6_resolve() routines. After the inpcb route cache
 introduction, the packet path for the locally-originated packets changed,
 passing cached lle pointer to the ether_output() directly. This resulted
 in the arp/ndp entry expire each time exactly after the configured max_age
 interval. During the small window between the ARP/NDP request and reply
 from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code to the packet
 path used by route cache. Unify the notification code by using single
 inlined function with the per-AF callbacks.

Reported by:	sthaug at nethelp.no
Reviewed by:	ae
MFC after:	2 weeks
2018-03-17 17:05:48 +00:00
Eric van Gyzen
0bbfb20fe5 Update the MTU in affected routes when IPv6 RA changes the MTU
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU.  Update the routes so they do not provide
stale MTUs.

This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.

Reported and tested by:	Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by:	dab, melifaro
Discussed with:	ae
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14257
2018-02-12 19:49:20 +00:00
Eric van Gyzen
43105e589a Fix ICMPv6 redirects
icmp6_redirect_input() validates that a redirect packet came from the
current gateway for the respective destination.  To do this, it compares
the source address, which has an embedded scope zone id, to the next-hop
address, which does not.  If the address is link-local, which should be
the case, the comparison fails and the redirect is ignored.

Insert the scope zone id into the next-hop address so the comparison
is accurate.

Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases.

Submitted by:	Farrell Woods <Farrell_Woods@Dell.com> (initial revision)
Reviewed by:	ae melifaro dab
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14254
2018-02-09 00:13:05 +00:00
Andrey V. Elsukov
68e0e5a673 Modify ip6_get_prevhdr() to be able use it safely.
Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.

In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.

Reported by:	Maxime Villard <max at m00nbsd dot net>
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14158
2018-02-05 09:22:07 +00:00
Andrey V. Elsukov
883cd89b05 Merge r1.120 from NetBSD:
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
  not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
  on an mbuf that was already freed.

Reported by:	Maxime Villard <max at m00nbsd dot net>
MFC after:	3 days
2018-02-02 07:39:34 +00:00
Eric van Gyzen
f8116f391a ND6: Set the correct state for new neighbor cache entries
Restore state 6.  Many of the UNH tests end up exercising this
state, where we have a new neighbor cache entry and a new link-layer
entry is being created for it.  The link-layer address is currently
unknown so the initial state of the "llentry" should remain initialized
to ND6_LLINFO_NOSTATE so that the ND code will send a solicitation.
Setting this to ND6_LLINFO_STALE implies that the link-level entry
is valid and can be used (but needs to be refreshed via the Neighbor
Unreachability state machine).

https://forums.freebsd.org/threads/64287/

Submitted by:	Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by:	mjoras, dab, ae
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14059
2018-01-29 16:12:26 +00:00
Andrey V. Elsukov
2164def67c Do not skip scope zone violation check, when mbuf has M_FASTFWD_OURS flag.
When mbuf has M_FASTFWD_OURS flag, this means that a destination address
is our local, but we still need to pass scope zone violation check,
because protocol level expects that IPv6 link-local addresses have
embedded scope zone indexes. This should fix the problem, when ipfw is
used to forward packets to local address and source address of a packet
is IPv6 LLA.

Reported by:	sbruno
MFC after:	3 weeks
2018-01-29 11:03:29 +00:00
Andrey V. Elsukov
efc284cb12 Assign IPv6 link-local address to loopback interfaces whith unit > 0.
When an interface has IFF_LOOPBACK flag in6_ifattach() tries to assing
IPv6 loopback address to this interface. It uses in6ifa_ifpwithaddr()
to check, that interface doesn't already have given address and then
uses in6_ifattach_loopback(). If in6_ifattach_loopback() fails, it just
exits and thus skips assignment of IPv6 LLA.
Fix this using in6ifa_ifwithaddr() function. If IPv6 loopback address is
already assigned in the system, do not call in6_ifattach_loopback().

PR:		138678
MFC after:	3 weeks
2018-01-29 10:33:55 +00:00
Navdeep Parhar
09b0b8c058 Do not generate illegal mbuf chains during IP fragment reassembly. Only
the first mbuf of the reassembled datagram should have a pkthdr.

This was discovered with cxgbe(4) + IPSEC + ping with payload more than
interface MTU.  cxgbe can generate !M_WRITEABLE mbufs and this results
in m_unshare being called on the reassembled datagram, and it complains:

panic: m_unshare: m0 0xfffff80020f82600, m 0xfffff8005d054100 has M_PKTHDR

PR:		224922
Reviewed by:	ae@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14009
2018-01-24 05:09:21 +00:00
Alan Somers
81e04458b8 sys/netinet6: fix typos in comments. No functional change.
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2018-01-23 19:40:05 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
443133416b net*: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:21:51 +00:00
Pedro F. Giffuni
3760a9ac78 Fix some typos.
Obtained from:	OpenBSD (CVS v1.5)
2017-12-28 20:40:56 +00:00
Pedro F. Giffuni
a8e6714356 netinet6/ip6_id.c: niels kindly dropped clause 3/4 from the license.
This bring back r327293 from OpenBSD, with the important difference that
we are now getting it from their ip6_id.c file.

Obtained from:	OpenBSD (CVS v1.3)
2017-12-28 20:35:21 +00:00
Pedro F. Giffuni
b3c64c30fa Start syncing changes from OpenBSD's ip6_id.c instead of ip_id.c.
correct non-repetitive ID code, based on comments from niels provos.
- seed2 is necessary, but use it as "seed2 + x" not "seed2 ^ x".
- skipping number is not needed, so disable it for 16bit generator (makes
  the repetition period to 30000)

Obtained from:	OpenBSD (CVS rev. 1.2)
MFC after:	1 week
2017-12-28 20:26:51 +00:00
Pedro F. Giffuni
d82751000f Revert r327293
netinet6/ip6_id.c: niels kindly dropped clause 3/4 from the license.

I was looking at the wrong file. There is an important merge that must be
done before I can bring this change.
2017-12-28 20:10:10 +00:00
Pedro F. Giffuni
e9738d25c1 netinet6/ip6_id.c: niels kindly dropped clause 3/4 from the license.
This file is supposed to be based on the OpenBSD CVS v1.6 but checking
the OpenBSD repository the license had already dropped the 2&3 clasues by
then. Catch up with the licensing.

Obtained from:	OpenBSD (CVS 1.2)
2017-12-28 19:42:53 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Alexander Kabaev
bf51c9665d Silence clang analyzer false positive.
clang does not know that two lookup calls will return the same
pointer, so it assumes correctly that using the old pointer
after dropping the reference to it is a bit risky.
2017-12-23 16:45:26 +00:00
Andrey V. Elsukov
a406128960 Follow the RFC6980 and silently ignore following IPv6 NDP messages
that had the IPv6 fragmentation header:
 o  Neighbor Solicitation
 o  Neighbor Advertisement
 o  Router Solicitation
 o  Router Advertisement
 o  Redirect

Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly
is completed. Then check the presence of this flag in correspondig ND6
handling routines.

PR:		224247
MFC after:	2 weeks
2017-12-15 12:37:32 +00:00
Michael Tuexen
9f0abda051 Retire SCTP_WITH_NO_CSUM option.
This option was used in the early days to allow performance measurements
extrapolating the use of SCTP checksum offloading. Since this feature
is now available, get rid of this option.
This also un-breaks the LINT kernel. Thanks to markj@ for making me
aware of the problem.
2017-12-07 22:19:08 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Konstantin Belousov
06193f0be0 Use hardware timestamps to report packet timestamps for SO_TIMESTAMP
and other similar socket options.

Provide new control message SCM_TIME_INFO to supply information about
timestamp.  Currently it indicates that the timestamp was
hardware-assisted and high-precision, for software timestamps the
message is not returned.  Reserved fields are added to ABI to report
additional info about it, it is expected that raw hardware clock value
might be useful for some applications.

Reviewed by:	gallatin (previous version), hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
X-Differential revision:	https://reviews.freebsd.org/D12638
2017-11-07 09:46:26 +00:00
Michael Tuexen
28a6adde1d Allow the setting of the MTU for future paths using an SCTP socket option.
This functionality was missing.

MFC after:	1 week
2017-11-03 20:46:12 +00:00
Kristof Provost
a0bf3ee425 Evaluate packet size after the firewall had its chance in the ip6 fast path
Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.
This mirrors what the slow path does.

Reviewed by:	ae
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12779
2017-10-25 19:21:48 +00:00
Gleb Smirnoff
0e229f343f Hide struct socket and struct unpcb from the userland.
Violators may define _WANT_SOCKET and _WANT_UNPCB respectively and
are not guaranteed for stability of the structures.  The violators
list is the the usual one: libprocstat(3) and netstat(1) internally
and lsof in ports.

In struct xunpcb remove the inclusion of kernel structure and add
a bunch of spare fields.  The xsocket already has socket not included,
but add there spares as well.  Embed xsockbuf into xsocket.

Sort declarations in sys/socketvar.h to separate kernel only from
userland available ones.

PR:		221820 (exp-run)
2017-10-02 23:29:56 +00:00
Michael Tuexen
2e8bb5ddf4 Fix a locking issue found by Coverity scanning the usrsctp library.
MFC after:	3 days
2017-09-09 20:51:54 +00:00
Bjoern A. Zeeb
ae69ad884d After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).

Reviewed by:		np
Differential Revision:	https://reviews.freebsd.org/D11448
2017-07-27 13:03:36 +00:00
Michael Tuexen
5ba7f91f9d Use memset/memcpy instead of bzero/bcopy.
Just use one variant instead of both. Use the memset/memcpy
ones since they cause less problems in crossplatform deployment.

MFC after:	1 week
2017-07-19 14:28:58 +00:00
Jonathan T. Looney
8b07e00e99 Fix an unnecessary/incorrect check in the PKTOPT_EXTHDRCPY macro.
This macro allocates memory and, if malloc does not return NULL, copies
data into the new memory. However, it doesn't just check whether malloc
returns NULL. It also checks whether we called malloc with M_NOWAIT. That
is not necessary.

While it may be that malloc() will only return NULL when the M_NOWAIT flag
is set, we don't need to check for this when checking malloc's return
value. Further, in this case, the check was not completely accurate,
because it checked for flags == M_NOWAIT, rather than treating it as a bit
field and checking for (flags & M_NOWAIT).

Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10942
2017-05-30 14:50:28 +00:00
Jonathan T. Looney
fb04394554 Fix two places in the ICMP6 code where we could dereference a NULL pointer
in the icmp6_input() function.

When processing an ICMP6_ECHO_REQUEST, if IP6_EXTHDR_GET fails, it will
set nicmp6 and n to NULL. Therefore, we should condition our modification
to nicmp6 on n being not NULL.

And, when processing an ICMP6_WRUREQUEST in the (mode != FQDN) case, if
m_dup_pkthdr() fails, the code will set n to NULL. However, the very next
line dereferences n. Therefore, when m_dup_pkthdr() fails, we should
discontinue further processing and follow the same path as when m_gethdr()
fails.

Reported by:	clang static analyzer
Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10941
2017-05-30 14:41:31 +00:00
Jonathan T. Looney
382a6bbcf1 Enforce the limit on ICMP messages before doing work to formulate the
response.

Delete an unneeded rate limit for UDP under IPv6. Because ICMP6
messages have their own rate limit, it is unnecessary to apply a
second rate limit to UDP messages.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10387
2017-05-30 14:32:44 +00:00
Michael Tuexen
5dba6ada91 The connect() system call should return -1 and set errno to EAFNOSUPPORT
if it is called on a TCP socket
 * with an IPv6 address and the socket is bound to an
    IPv4-mapped IPv6 address.
 * with an IPv4-mapped IPv6 address and the socket is bound to an
   IPv6 address.
Thanks to Jonathan T. Leighton for reporting this issue.

Reviewed by:		bz gnn
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D9163
2017-05-22 15:29:10 +00:00
Ed Maste
3e85b721d6 Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193
2017-05-17 00:34:34 +00:00
Enji Cooper
bd7459366e Add missing braces around MCAST_EXCLUDE check when KTR support is
compiled into the kernel

This ensures that .iss_asm (the number of ASM listeners) isn't incorrectly
decremented for MLD-layer source datagrams when inspecting im*s_st[1]
(the second state in the structure).

MFC after:	2 months
PR:		217509 [1]
Reported by:	Coverity (Isilon)
Reviewed by:	ae ("This patch looks correct to me." [1])
Submitted by:	Miles Ohlrich <miles.ohlrich@isilon.com>
Sponsored by:	Dell EMC Isilon
2017-05-13 18:41:24 +00:00
Navdeep Parhar
ce9ac139d4 ip6_output runs with the inp lock held, just like ip_output. 2017-05-10 00:14:55 +00:00
Michael Tuexen
d274bcc661 Fix an issue with MTU calculation if an ICMP messaeg is received
for an SCTP/UDP packet.

MFC after:	1 week
2017-04-26 20:21:05 +00:00
Michael Tuexen
6ebfa5ee14 Use consistently uint32_t for mtu values.
This does not change functionality, but this cleanup is need for further
improvements of ICMP handling.

MFC after:	1 week
2017-04-26 19:26:40 +00:00
Kristof Provost
d78c0804fb Rename variable for clarity
Rename the mtu variable in ip6_fragment(), because mtu is misleading. The
variable actually holds the fragment length.
No functional change.

Suggested by: ae
2017-04-22 13:04:36 +00:00
Kristof Provost
00eab743ab pf: Fix possible incorrect IPv6 fragmentation
When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.

For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).

We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.

Reported by:	Antonios Atlasis <aatlasis at secfu net>
MFC after:	3 days
2017-04-20 09:05:53 +00:00
Andrey V. Elsukov
c33a231337 Rework r316770 to make it protocol independent and general, like we
do for streaming sockets.

And do more cleanup in the sbappendaddr_locked_internal() to prevent
leak information from existing mbuf to the one, that will be possible
created later by netgraph.

Suggested by:	glebius
Tested by:	Irina Liakh <spell at itl ua>
MFC after:	1 week
2017-04-14 09:00:48 +00:00
Andrey V. Elsukov
8428914909 Clear h/w csum flags on mbuf handled by UDP.
When checksums of received IP and UDP header already checked, UDP uses
sbappendaddr_locked() to pass received data to the socket.
sbappendaddr_locked() uses given mbuf as is, and if NIC supports checksum
offloading, mbuf contains csum_data and csum_flags that were calculated
for already stripped headers. Some NICs support only limited checksums
offloading and do not use CSUM_PSEUDO_HDR flag, and csum_data contains
some value that UDP/TCP should use for pseudo header checksum calculation.

When L2TP is used for tunneling with mpd5, ng_ksocket receives mbuf with
filled csum_flags and csum_data, that were calculated for outer headers.
When L2TP header is stripped, a packet that was tunneled goes to the IP
layer and due to presence of csum_flags (without CSUM_PSEUDO_HDR) and
csum_data, the UDP/TCP checksum check fails for this packet.

Reported by:	Irina Liakh <spell at itl ua>
Tested by:	Irina Liakh <spell at itl ua>
MFC after:	1 week
2017-04-13 17:03:57 +00:00
Steven Hartland
4d806fc663 Allow explicitly assigned IPv6 loopback address to be used in jails
If a jail has an explicitly assigned IPv6 loopback address then allow it
to be used instead of remapping requests for the loopback adddress to the
first IPv6 address assigned to the jail.

This fixes issues where applications attempt to detect their bound port
where they requested a loopback address, which was available, but instead
the kernel remapped it to the jails first address.

This is the same fix applied to IPv4 fix by: r316313

Also:
* Correct the description of prison_check_ip6_locked to match the code.

MFC after:	2 weeks
Relnotes:	Yes
Sponsored by:	Multiplay
2017-03-31 09:10:05 +00:00
Mike Karels
8c1960d506 Fix reference count leak with L2 caching.
ip_forward, TCP/IPv6, and probably SCTP leaked references to L2 cache
entry because they used their own routes on the stack, not in_pcb routes.
The original model for route caching was callers that provided a route
structure to ip{,6}input() would keep the route, and this model was used
for L2 caching as well. Instead, change L2 caching to be done by default
only when using a route structure in the in_pcb; the pcb deallocation
code frees L2 as well as L3 cacches. A separate change will add route
caching to TCP/IPv6.

Another suggestion was to have the transport protocols indicate willingness
to use L2 caching, but this approach keeps the changes in the network
level

Reviewed by:    ae gnn
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D10059
2017-03-25 15:06:28 +00:00
Alan Somers
559b42968c Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0
sys/netinet6/icmp6.c
	Use the interface's FIB for source address selection in ICMPv6 error
	responses.

sys/netinet6/in6.c
	In in6_newaddrmsg, announce arrival of local addresses on the
	interface's FIB only.  In in6_lltable_rtcheck, use a per-fib ND6
	cache instead of a single cache.

sys/netinet6/in6_src.c
	In in6_selectsrc, use the caller's fib instead of the default fib.
	In in6_selectsrc_socket, remove a superfluous check.

sys/netinet6/nd6.c
	In nd6_lle_event, use the interface's fib for routing socket
	messages.  In nd6_is_new_addr_neighbor, check all FIBs when trying
	to determine whether an address is a neighbor.  Also, simplify the
	code for point to point interfaces.

sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
	Make defrouter_select fib-aware, and make all of its callers pass in
	the interface fib.

sys/netinet6/nd6_nbr.c
	When inputting a Neighbor Solicitation packet, consider the
	interface fib instead of the default fib for DAD.  Output NS and
	Neighbor Advertisement packets on the correct fib.

sys/netinet6/nd6_rtr.c
	Allow installing the same host route on different interfaces in
	different FIBs.  If rt_add_addr_allfibs=0, only install or delete
	the prefix route on the interface fib.

tests/sys/netinet/fibs_test.sh
	Clear some expected failures, but add a skip for the newly revealed
	BUG217871.

PR:		196361
Submitted by:	Erick Turnquist <jhujhiti@adjectivism.org>
Reported by:	Jason Healy <jhealy@logn.net>
Reviewed by:	asomers
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9451
2017-03-17 16:50:37 +00:00
Ermal Luçi
dce33a45c9 The patch provides the same socket option as Linux IP_ORIGDSTADDR.
Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD.

The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.

This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets.

Reviewed by:	adrian, aw
Approved by:	ae (mentor)
Sponsored by:	rsync.net
Differential Revision:	D9235
2017-03-06 04:01:58 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Andrey V. Elsukov
9907aba370 When IPv6 fragments reassembly is complete, update mbuf's csum_data
and csum_flags using information from all fragments. This fixes
dropping of reassembled packets due to wrong checksum when the IPv6
checksum offloading is enabled on a network card.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-02-28 22:58:19 +00:00
Andrey V. Elsukov
627c036f65 Remove IPsec related PCB code from SCTP.
The inpcb structure has inp_sp pointer that is initialized by
ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec
security policies associated with a specific socket.
An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket
options to configure these security policies. Then ip[6]_output()
uses inpcb pointer to specify that an outgoing packet is associated
with some socket. And IPSEC_OUTPUT() method can use a security policy
stored in the inp_sp. For inbound packet the protocol-specific input
routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms
to inbound security policy configured in the inpcb.

SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends
packets. Thus IPSEC_OUTPUT() method does not consider such packets as
associated with some socket and can not apply security policies
from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY()
method is called from protocol-specific input routine, it can specify
inpcb pointer and associated with socket inbound policy will be
checked. But there are two problems:
1. Such check is asymmetric, becasue we can not apply security policy
from inpcb for outgoing packet.
2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and
access to inp_sp is protected. But for SCTP this is not correct,
becasue SCTP uses own locks to protect inpcb.

To fix these problems remove IPsec related PCB code from SCTP.
This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options
will be not applicable to SCTP sockets. To be able correctly check
inbound security policies for SCTP, mark its protocol header with
the PR_LASTHDR flag.

Reported by:	tuexen
Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D9538
2017-02-13 11:37:52 +00:00
Ermal Luçi
c10c5b1eba Committed without approval from mentor.
Reported by:	gnn
2017-02-12 06:56:33 +00:00
Ermal Luçi
70d81c5e91 Use proper value for socket option on IPv6
Reported-by: ohartmann@walstatt.org
2017-02-10 06:20:27 +00:00
Ermal Luçi
4616026faf Revert r313527
Heh svn is not git
2017-02-10 05:58:16 +00:00
Ermal Luçi
c0fadfdbbf Correct missed variable name.
Reported-by: ohartmann@walstatt.org
2017-02-10 05:51:39 +00:00
Ermal Luçi
ed55edceef The patch provides the same socket option as Linux IP_ORIGDSTADDR.
Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD.

The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.

This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets.

Sponsored-by: rsync.net
Differential Revision: D9235
Reviewed-by: adrian
2017-02-10 05:16:14 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  build as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. The only one header file <netipsec/ipsec_support.h>
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups in the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
  check for full history of applied IPsec transforms.
o References counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
Andriy Voskoboinyk
2bbd06fc33 Garbage collect IFT_IEEE80211 (but leave the define for possible reuse)
This interface type ("a parent interface of wlanX") is not used since
r287197

Reviewed by:	adrian, glebius
Differential Revision:	https://reviews.freebsd.org/D9308
2017-01-28 17:08:40 +00:00
Luiz Otavio O Souza
338e227ac0 After the in_control() changes in r257692, an existing address is
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).

This cause an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.

This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.

There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.

Reviewed by:	glebius
Obtained from:	pfSense
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-25 19:04:08 +00:00
Hans Petter Selasky
f3e7afe2d7 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
Mark Johnston
762d16d9e4 Improve some of the sysctl descriptions added in r299827.
Submitted by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
		(original version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5336
2017-01-16 19:35:19 +00:00
Maxim Sobolev
339efd75a4 Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to depreciate
SO_BINTIME and move everything to using SO_TS_CLOCK.

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implementis totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by:    adrian, gnn
Differential Revision:  https://reviews.freebsd.org/D9171
2017-01-16 17:46:38 +00:00
Mark Johnston
8cd3b2042c Release the ND6 list lock before making a prefix off-link in nd6_timer().
Reported by:	Jim <BM-2cWfdfG5CJsquqkJyry7hZT9LypbSEWEkQ@bitmessage.ch>
X-MFC With:	r306829
2017-01-08 18:46:00 +00:00
Michael Tuexen
b7b84c0e02 Whitespace changes.
The toolchain for processing the sources has been updated. No functional
change.

MFC after:	3 days
2016-12-26 11:06:41 +00:00
Mark Johnston
62280740d6 Remove a bogus KASSERT from nd6_prefix_unlink().
The caller may unlink a prefix before purging referencing addresses. An
identical assertion in nd6_prefix_del() verifies that the addresses are
purged before the prefix is freed.

PR:		215372
X-MFC With:	r306829
2016-12-19 19:21:28 +00:00
Andrey V. Elsukov
ad9f4d6ab6 ip[6]_tryforward does inbound and outbound packet firewall processing.
This can lead to change of mbuf pointer (packet filter could do m_pullup(),
NAT, etc). Also in case of change of destination address, tryforward can
decide that packet should be handled by local system. In this case modified
mbuf can be returned to the ip[6]_input(). To handle this correctly, check
M_FASTFWD_OURS flag after return from ip[6]_tryforward. And if it is present,
update variables that depend from mbuf pointer and skip another inbound
firewall processing.

No objection from:	#network
MFC after:	3 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D8764
2016-12-19 11:02:49 +00:00
Andrey V. Elsukov
8a030e9c6e Modify IPv6 statistic accounting in ip6_input().
Add rcvif local variable to keep inbound interface pointer. Count
ifs6_in_discard errors in all "goto bad" cases. Now it will count
errors even if mbuf was freed. Modify all places where m->m_pkthdr.rcvif
is used to use local rcvif variable.

Obtained from:	Yandex LLC
MFC after:	1 month
2016-12-12 11:26:59 +00:00
Andrey V. Elsukov
5a1842a24a Add ip6_tryforward() - a run to completion forwarding implementation
for IPv6.

It gets performance benefits from reduced number of checks. It doesn't
copy mbuf to be able send ICMPv6 error message, because it keeps mbuf
unchanged until the moment, when the route decision has been made.
It doesn't do IPsec checks, and when some IPsec security policies present,
ip6_input() uses normal slow path.

Reviewed by:	bz, gnn
Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D8527
2016-12-12 10:57:32 +00:00
Michael Tuexen
5b495f17a5 Whitespace changes.
The tools using to generate the sources has been updated and produces
different whitespaces. Commit this seperately to avoid intermixing
these with real code changes.

MFC after:	3 days
2016-12-06 10:21:25 +00:00
Michael Tuexen
3e1465754f Make ICMPv6 hard error handling for TCP consistent with the ICMPv4
handling. Ensure that:
* Protocol unreachable errors are handled by indicating ECONNREFUSED
  to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Communication prohibited errors are handled by indicating ECONNREFUSED
  to the TCP user for both IPv4 and IPv6. These were ignored for IPv6.
* Hop Limited exceeded errors are handled by indicating EHOSTUNREACH
  to the TCP user for both IPv4 and IPv6.
  For IPv6 the TCP connected was dropped but errno wasn't set.

Reviewed by: gallatin, rrs
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: 7904
2016-10-21 10:32:57 +00:00
George V. Neville-Neil
aec9c8d5a5 Limit the number of mbufs that can be allocated for IPV6_2292PKTOPTIONS
(and IPV6_PKTOPTIONS).

PR:		100219
Submitted by:	Joseph Kong
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D5157
2016-10-17 23:25:31 +00:00
Gleb Smirnoff
cc94f0c2d7 - Revert r300854, r303657 which tried to fix regression from r297225.
- Fix the regression proper way using RO_RTFREE().

Submitted by:	ae
2016-10-13 20:15:47 +00:00
Mark Johnston
d748f7efcd Lock the ND prefix list and add refcounting for prefixes.
This change extends the nd6 lock to protect the ND prefix list as well
as the list of advertising routers associated with each prefix. To handle
cases where the nd6 lock must be dropped while iterating over either the
prefix or default router lists, a generation counter is used to track
modifications to the lists. Additionally, a new mutex is used to serialize
prefix on-link/off-link transitions. This mutex must be acquired before
the nd6 lock and is held while updating the routing table in
nd6_prefix_onlink() and nd6_prefix_offlink().

Reviewed by:	ae, tuexen (SCTP bits)
Tested by:	Jason Wolfe <jason@llnw.com>,
		Larry Rosenman <ler@lerctr.org>
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D8125
2016-10-07 21:10:53 +00:00
Mark Johnston
c26158449e Reduce the number of conditional statements in nd6_prefix_onlink().
MFC after:	1 week
2016-10-07 21:03:18 +00:00
Mark Johnston
7b0e84b7c8 Combine several checks in nd6_prefix_offlink() into one.
MFC after:	1 week
2016-10-07 21:02:30 +00:00
Mark Johnston
7782accf13 Fix whitespace around prototypes in nd6_rtr.c.
MFC after:	1 week
2016-10-07 00:36:18 +00:00
Mark Johnston
07c1f95976 Fix a typo.
MFC after:	1 week
2016-10-07 00:35:28 +00:00
Mark Johnston
a88d6d7e07 Shorten and simplify some of the loops in pfxlist_onlink_check().
No functional change intended.

MFC after:	1 week
2016-10-07 00:34:57 +00:00
Mark Johnston
f7d91d8cdd Use a const reference to prefixes in nd6_is_new_addr_neighbor().
MFC after:	1 week
2016-10-07 00:26:36 +00:00
Mark Johnston
0ed7d74424 nd6_dad_timer(): don't assert that the address is tentative.
It appears that this assertion can be tripped in some cases when
multiple interfaces are on the same link. Until this is resolved, revert a
part of r306305 and simply log a message if the DAD timer fires on a
non-tentative address.

Reported by:	jhb
X-MFC With:	r306305
2016-10-01 01:30:34 +00:00
Andrey V. Elsukov
5a03e7819a Fix bug introduced in r274300.
In icmp6_reflect() use original source address of erroneous packet as
destination address for source selection algorithm when original
destination address is not one of our own.

Reported by:	Mark Kamichoff <prox at prolixium com>
Tested by:	Mark Kamichoff <prox at prolixium com>
MFC after:	1 week
2016-09-29 19:57:37 +00:00
Mark Johnston
970fe0938e Convert checks in nd6_dad_start() and nd6_dad_timer() to assertions.
In particular, these functions can assume they are operating on tentative
addresses.

MFC after:	2 weeks
2016-09-24 21:40:24 +00:00
Mark Johnston
0bbf244e9f Rename ndpr_refcnt to ndpr_addrcnt.
This field counts derived addresses and is not a true refcount for prefix
objects, so the previous name was misleading.

MFC after:	1 week
2016-09-24 01:14:25 +00:00
Mark Johnston
2d12d25c6a Reduce code duplication around NDP message handlers in icmp6_input().
No functional change intended.

MFC after:	2 weeks
2016-09-20 18:08:17 +00:00
Kevin Lo
c3bef61e58 Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D7878
2016-09-15 07:41:48 +00:00
Mike Karels
0f5687f2ae Fix L2 caching for UDP over IPv6
ip6_output() was missing cache invalidation code analougous to
ip_output.c. r304545 disabled L2 caching for UDP/IPv6 as a workaround.
This change adds the missing cache invalidation code and reverts
r304545.

Reviewed by:	gnn
Approved by:	gnn (mentor)
Tested by:	peter@, Mike Andrews
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D7591
2016-08-24 00:52:30 +00:00
Bjoern A. Zeeb
77ecef378a Remove the kernel optoion for IPSEC_FILTERTUNNEL, which was deprecated
more than 7 years ago in favour of a sysctl in r192648.
2016-08-21 18:55:30 +00:00
Mike Karels
db727c1bd7 Disable L2 caching for UDP over IPv6
The ip6_output routine is missing L2 cache invalication as done
in ip_output.  Even with that code, some problems with UDP over
IPv6 have been reported.  Diabling L2 cache for that problem works
around the problem for now.

PR:		211872 211926
Reviewed by:	gnn
Approved by:	gnn (mentor)
MFC after:	immediate
2016-08-20 20:46:53 +00:00
Andrey V. Elsukov
d8caf56e9e Add ipfw_nat64 module that implements stateless and stateful NAT64.
The module works together with ipfw(4) and implemented as its external
action module.

Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.

A configuration of instance should looks like this:
 1. Create lookup tables:
 # ipfw table T46 create type addr valtype ipv6
 # ipfw table T64 create type addr valtype ipv4
 2. Fill T46 and T64 tables.
 3. Add rule to allow neighbor solicitation and advertisement:
 # ipfw add allow icmp6 from any to any icmp6types 135,136
 4. Create NAT64 instance:
 # ipfw nat64stl NAT create table4 T46 table6 T64
 5. Add rules that matches the traffic:
 # ipfw add nat64stl NAT ip from any to table(T46)
 # ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
 6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
    via NAT64 host.

Stateful NAT64 registers external action with name nat64lsn. The only
one option required to create nat64lsn instance - prefix4. It defines
the pool of IPv4 addresses used for translation.

A configuration of instance should looks like this:
 1. Add rule to allow neighbor solicitation and advertisement:
 # ipfw add allow icmp6 from any to any icmp6types 135,136
 2. Create NAT64 instance:
 # ipfw nat64lsn NAT create prefix4 A.B.C.D/28
 3. Add rules that matches the traffic:
 # ipfw add nat64lsn NAT ip from any to A.B.C.D/28
 # ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
 4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
    via NAT64 host.

Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6434
2016-08-13 16:09:49 +00:00
Stephen J. Kiernan
0ce1624d0e Move IPv4-specific jail functions to new file netinet/in_jail.c
_prison_check_ip4 renamed to prison_check_ip4_locked

Move IPv6-specific jail functions to new file netinet6/in6_jail.c
_prison_check_ip6 renamed to prison_check_ip6_locked

Add appropriate prototypes to sys/sys/jail.h

Adjust kern_jail.c to call prison_check_ip4_locked and
prison_check_ip6_locked accordingly.

Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that
need to be built when INET and INET6, respectively, are configured in the
kernel configuration file.

Reviewed by:	jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D6799
2016-08-09 02:16:21 +00:00
Andrey V. Elsukov
723758b7ce Fix NULL pointer dereference.
ro pointer can be NULL when IPSec consumes mbuf.

PR:		211486
MFC after:	3 days
2016-08-02 12:18:06 +00:00
Andrew Gallatin
d4c22202e6 Rework IPV6 TCP path MTU discovery to match IPv4
- Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput()

- Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output()
  to send TCP packets without looking at the tcp host cache for every
  single transmit.

- Make the icmp6 code mimic the IPv4 code & avoid returning
  PRC_HOSTDEAD because it is so expensive.

Without these changes in place, every TCP6 pmtu discovery or host
unreachable ICMP resulted in a call to in6_pcbnotify() which walks the
tcbinfo table with the write lock held.  Because the tcbinfo table is
shared between IPv4 and IPv6, this causes huge scalabilty issues on
servers with lots of (~100K) TCP connections, to the point where even
a small percent of IPv6 traffic had a disproportionate impact on
overall throughput.

Reviewed by:	bz, rrs, ae (all earlier versions), lstewart (in Netflix's tree)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D7272
2016-08-01 17:02:21 +00:00
Stephen J. Kiernan
4ac21b4f09 Prepare for network stack as a module
- Move cr_canseeinpcb to sys/netinet/in_prot.c in order to separate the
   INET and INET6-specific code from the rest of the prot code (It is only
   used by the network stack, so it makes sense for it to live with the
   other network stack code.)
 - Move cr_canseeinpcb prototype from sys/systm.h to netinet/in_systm.h
 - Rename cr_seeotheruids to cr_canseeotheruids and cr_seeothergids to
   cr_canseeothergids, make them non-static, and add prototypes (so they
   can be seen/called by in_prot.c functions.)
 - Remove sw_csum variable from ip6_forward in ip6_forward.c, as it is an
   unused variable.

Reviewed by:	gnn, jtl
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D2901
2016-07-27 20:34:09 +00:00
Mike Karels
ea17754c5a Fix per-connection L2 caching in fast path
r301217 re-added per-connection L2 caching from a previous change,
but it omitted caching in the fast path.  Add it.

Reviewed By: gallatin
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D7239
2016-07-22 02:11:49 +00:00
Andrey V. Elsukov
b867e84e95 Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6
as defined in RFC 6296. The module works together with ipfw(4) and
implemented as its external action module. When it is loaded, it registers
as eaction and can be used in rules. The usage pattern is similar to
ipfw_nat(4). All matched by rule traffic goes to the NPT module.

Reviewed by:	hrs
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6420
2016-07-18 19:46:31 +00:00
Andrey V. Elsukov
7aeccebc0d Add net.inet6.ip6.intr_queue_maxlen sysctl. It can be used to
change netisr queue limit for IPv6 at runtime.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2016-07-15 17:09:30 +00:00
Dimitry Andric
be39349169 Fix a page fault in ip6_setpktopt(), occurring when the pflog module is
loaded, and syncthing is started, which uses setsockopt(IPV6_PKGINFO).

This is because pflog interfaces do not normally have an IPv6 address,
causing the ND_IFINFO() macro to dereference a NULL pointer.

Reviewed by:	ae
PR:		210943
MFC after:	3 days
2016-07-13 19:41:19 +00:00
Michael Tuexen
55b8cd93ef Don't consider the socket when processing an incoming ICMP/ICMP6 packet,
which was triggered by an SCTP packet. Whether a socket exists, is just
not relevant.

Approved by: re (kib)
MFC after: 1 week
2016-06-23 09:13:15 +00:00
Andrey V. Elsukov
a844d49cc6 Fix the NULL pointer dereference for unresolved link layer entries in
the netinet6 code. Copy link layer address only when corresponding entry
has LLE_VALID flag.

PR:		210379
Approved by:	re (kib)
2016-06-22 11:29:21 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Pedro F. Giffuni
268aab1cfc Remove the SIOCSIFALIFETIME_IN6 ioctl.
The SIOCSIFALIFETIME_IN6 provided by the kame project is unused,
it can't really be used safely and has been completely removed from
NetBSD and OpenBSD.

Obtained from:	NetBSD (kern/35897)
PR:		210148 (exp-run)
Reviewed by:	ae, hrs
Relnotes:	yes
Approved by:	re (glebius)
Differential Revision:	https://reviews.freebsd.org/D5491
2016-06-13 22:31:16 +00:00
Andrey V. Elsukov
4c10540274 Cleanup unneded include "opt_ipfw.h".
It was used for conditional build IPFIREWALL_FORWARD support.
But IPFIREWALL_FORWARD option was removed a long time ago.
2016-06-09 05:48:34 +00:00
Bjoern A. Zeeb
74b44794c6 Make KASSERT message more useful by printing the variables on which
we assert.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 22:34:12 +00:00