Commit Graph

1453 Commits

Author SHA1 Message Date
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

This allows using the TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Andrey V. Elsukov
51a01baf23 Properly handle IPV6_NEXTHOP socket option in selectroute().
o remove disabled code;
o if nexthop address is link-local, use embedded scope zone id to
  determine outgoing interface;
o properly fill ro_dst before doing route lookup;
o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.

Sponsored by:	Yandex LLC
2015-08-02 12:40:56 +00:00
Andrey V. Elsukov
a6f7dea1fe Remove redundant check. 2015-08-02 11:58:24 +00:00
Andrey V. Elsukov
10a0e0bf0a Eliminate the use of m_copydata() in gif_encapcheck().
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 14:07:43 +00:00
Andrey V. Elsukov
cc0a3c8ca4 Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
Michael Tuexen
4ff815b71c Move including netinet/icmp6.h around to avoid a problem when including
netinet/icmp6.h and net/netmap.h. Both use ni_flags...
This allows building multistack with SCTP support.

MFC after: 1 week
2015-07-25 18:26:09 +00:00
Randall Stewart
f260c1b939 Fix inverted logic bug that David Wolfskill found (thanks David!)
MFC after:	3 Weeks
2015-07-22 09:29:50 +00:00
Randall Stewart
c0d1be08f6 When a tunneling protocol is being used with UDP we must release the
lock on the INP before calling the tunnel protocol, else a LOR
may occur (it does with SCTP for sure). Instead we must acquire a
ref count and release the lock, taking care to allow for the case
where the UDP socket has gone away and *not* unlocking since the
refcnt decrement on the inp will do the unlock in that case.

Reviewed by:	tuexen
MFC after:	3 weeks
2015-07-21 09:54:31 +00:00
Andrey V. Elsukov
30aee13117 Add LLE event handler to report ND6 events to userland via rtsock.
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:58:32 +00:00
Andrey V. Elsukov
585753c432 Invoke LLE event handler when entry is deleted.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:54:50 +00:00
Andrey V. Elsukov
cb207f93ca Keep IPv6 address specified by IPV6_PKTINFO socket option in kernel
internal form to be able to handle link-local IPv6 addresses.

Reported by:	kp
Tested by:	kp
2015-07-03 19:01:38 +00:00
Bjoern A. Zeeb
bfbc08b848 Move comment to the right position.
PR:		152791
Submitted by:	vangyzen (as part of the functional change)
MFC after:	3 days
2015-07-03 09:53:56 +00:00
Michael Tuexen
d089f9b915 Add FIB support for SCTP.
This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379

MFC after: 3 days
2015-06-17 15:20:14 +00:00
Andrey V. Elsukov
4e870f943f Move RTM announces into generic code to be independent from Layer2 code.
This fixes a bug introduced in 274988, where announces about new addresses
were not sent for tunneling interfaces.

Reported by:	tuexen@
MFC after:	1 week
2015-05-29 10:24:16 +00:00
Michael Tuexen
b7d130befc Fix and cleanup the debug information. This has no user-visible changes.
Thanks to Irene Ruengeler for providing a patch.

MFC after: 3 days
2015-05-28 16:00:23 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Andrey V. Elsukov
c1b4f79dfa Add the ability to accept encapsulated packets from different sources on one
gif(4) interface. Add a new option "ignore_source" for the gif(4) interface.
When it is enabled, gif's encapcheck function requires a match only for
the packet's destination address.

Differential Revision:	https://reviews.freebsd.org/D2004
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-15 12:19:45 +00:00
Hiroki Sato
59333867ff - Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
  because a link where looped back NS messages are permanently observed
  does not work with either NDP or ARP for IPv4.

- draft-ietf-6man-enhanced-dad is now RFC 7527.

Discussed with:	hiren
MFC after:	3 days
2015-05-12 03:31:57 +00:00
Andrey V. Elsukov
654bdb5abb Mark data checksum as valid for multicast packets that we send back
to ourselves via simloop.
Also remove duplicate check under #ifdef DIAGNOSTIC.

PR:		180065
MFC after:	1 week
2015-05-07 14:17:43 +00:00
Andrey V. Elsukov
db037aa4ed Remove unneeded #ifdef INET6 and IPSEC. This file is compiled only when
both options are defined.
Include opt_sctp.h and sctp_crc32.h to enable #ifdef SCTP code block
and delayed checksum calculation for SCTP.
2015-05-07 12:15:45 +00:00
Gleb Smirnoff
0fa5aacd8b Remove #ifdef IFT_FOO.
Submitted by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:31:27 +00:00
Andrey V. Elsukov
3e92c37f32 Remove now unneeded KEY_FREESP() for the case when ipsec[46]_process_packet()
returns EJUSTRETURN.

Sponsored by:	Yandex LLC
2015-04-27 01:11:09 +00:00
Andrey V. Elsukov
3d80e82d60 Fix possible use after free due to security policy deletion.
When we pass an mbuf to IPSec processing via ipsec[46]_process_packet(),
we hold one reference to the security policy and release it just after return
from this function. But IPSec processing can be deferred, and once we release
the reference to the security policy after ipsec[46]_process_packet(), the user
can delete this security policy from the SPDB. When IPSec processing is then
completed, the xform's callback function accesses already freed memory.

To fix this, move KEY_FREESP() into the callback function. Now the IPSec code
releases the reference to the SP after processing has finished.

Differential Revision:	https://reviews.freebsd.org/D2324
No objections from:	#network
Sponsored by:	Yandex LLC
2015-04-27 00:55:56 +00:00
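
The reference handling described in 3d80e82d60 can be illustrated with a small userland model. This is only a sketch under stated assumptions: every `toy_` name is hypothetical, the "free" is simulated with a flag so the state can be inspected, and none of this is the kernel's actual IPSec code. It shows the reference taken for a deferred request being dropped in the completion callback rather than right after the request is submitted.

```c
#include <stddef.h>

/* Hypothetical reference-counted object standing in for a security policy. */
struct toy_policy {
	int refcnt;
	int freed;	/* set instead of calling free(), so the sketch can be inspected */
};

/* Hypothetical deferred request that carries a policy pointer. */
struct toy_request {
	struct toy_policy *sp;
	int saw_live_policy;	/* 1 if the policy was still valid in the callback */
};

static void
toy_policy_ref(struct toy_policy *sp)
{
	sp->refcnt++;
}

static void
toy_policy_unref(struct toy_policy *sp)
{
	if (--sp->refcnt == 0)
		sp->freed = 1;
}

/* Submit a packet for deferred processing; the request keeps its own reference. */
static void
toy_process_packet(struct toy_request *req, struct toy_policy *sp)
{
	toy_policy_ref(sp);
	req->sp = sp;
	req->saw_live_policy = 0;
	/* No toy_policy_unref() here: the reference is released in the callback. */
}

/* Completion callback: use the policy, then drop the request's reference. */
static void
toy_request_done(struct toy_request *req)
{
	req->saw_live_policy = !req->sp->freed;
	toy_policy_unref(req->sp);
	req->sp = NULL;
}
```

In this model, deleting the policy from the database while a request is in flight only drops the database's reference; the object stays valid until `toy_request_done()` runs.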
Gleb Smirnoff
210b5c73e7 Fix r281649: don't call in6_clearscope() twice.
Submitted by:	ae
2015-04-17 15:26:08 +00:00
Gleb Smirnoff
28ebe80cab Provide functions to determine presence of a given address
configured on a given interface.

Discussed with:	np
Sponsored by:	Nginx, Inc.
2015-04-17 11:57:06 +00:00
Mark Johnston
dff78447a4 Fix a possible refcount leak in regen_tmpaddr().
public_ifa6 may be set to NULL after taking a reference to a previous
address list element. Instead, only take the reference after leaving the
loop but before releasing the address list lock.

Differential Revision:	https://reviews.freebsd.org/D2253
Reviewed by:		ae
MFC after:		2 weeks
2015-04-13 01:55:42 +00:00
Andrey V. Elsukov
e2956804dd Fix the IPV6_MULTICAST_IF sockopt handling. RFC 3493 says when the
interface index is specified as zero, the system should select the
interface to use for outgoing multicast packets. Even the comment
for the in6p_set_multicast_if() function mentions an index of zero.
But in fact for a zero index the function just returns EADDRNOTAVAIL.

I.e. if you first set some interface and then try to reset it
with a zero ifindex, you will get EADDRNOTAVAIL.

Reset im6o_multicast_ifp to NULL when the interface index is specified as
zero. Also return EINVAL when ifnet_byindex() returns NULL.
This is the same behaviour as when ifindex is bigger than
V_if_index. Return EADDRNOTAVAIL only when the interface is not
multicast capable.

Reported by:	Olivier Cochard-Labbé
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-04-10 19:09:51 +00:00
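
The return-value rules listed in e2956804dd can be sketched as a small userland model. This is a minimal illustration, not the kernel function: the `toy_` structures, the interface table, and `toy_set_multicast_if()` are all hypothetical stand-ins, and only the decision order described in the commit message is modelled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet. */
struct toy_ifnet {
	int multicast_capable;	/* models the multicast capability check */
};

/* Hypothetical stand-in for the multicast options holding im6o_multicast_ifp. */
struct toy_moptions {
	struct toy_ifnet *multicast_ifp;
};

static struct toy_ifnet toy_mcast_if = { 1 };
static struct toy_ifnet toy_plain_if = { 0 };

/* Index 0 is never an interface; index 2 models a slot with no interface. */
static struct toy_ifnet *toy_iftable[] = {
	NULL, &toy_mcast_if, NULL, &toy_plain_if
};
static const unsigned int toy_if_index = 3;	/* highest index, models V_if_index */

static struct toy_ifnet *
toy_ifnet_byindex(unsigned int idx)
{
	return (toy_iftable[idx]);
}

static int
toy_set_multicast_if(struct toy_moptions *mo, unsigned int ifindex)
{
	struct toy_ifnet *ifp;

	if (ifindex > toy_if_index)
		return (EINVAL);
	if (ifindex == 0) {
		/* Zero resets the choice; the system selects the interface. */
		mo->multicast_ifp = NULL;
		return (0);
	}
	ifp = toy_ifnet_byindex(ifindex);
	if (ifp == NULL)
		return (EINVAL);
	if (!ifp->multicast_capable)
		return (EADDRNOTAVAIL);
	mo->multicast_ifp = ifp;
	return (0);
}
```

Setting index 1 and then index 0 in this model succeeds both times and leaves `multicast_ifp` as NULL, which is the reset case the commit message says used to fail with EADDRNOTAVAIL.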
Andrey V. Elsukov
efb19cf6db Fix the check for maximum mbuf's size needed to send ND6 NA and NS.
It is acceptable that the size can be equal to MCLBYTES. In the later
KAME's code this check has been moved under DIAGNOSTIC ifdef, because
the size of NA and NS is much smaller than MCLBYTES. So, it is safe to
replace the check with KASSERT.

PR:		199304
Discussed with:	glebius
MFC after:	1 week
2015-04-09 12:57:58 +00:00
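
The boundary condition discussed in efb19cf6db can be shown with two predicates. This is a sketch only: `TOY_MCLBYTES` is an assumed value (the real MCLBYTES is defined by the platform), the function names are hypothetical, and the exact form of the original kernel check is inferred from the commit message rather than quoted from the source.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MCLBYTES 2048u	/* assumed cluster size, for illustration only */

/* Strict form: a size exactly equal to the cluster size is rejected. */
static bool
toy_fits_strict(size_t linkhdr, size_t maxlen)
{
	return (linkhdr + maxlen < TOY_MCLBYTES);
}

/* Relaxed form: a size equal to the cluster size is accepted. */
static bool
toy_fits_relaxed(size_t linkhdr, size_t maxlen)
{
	return (linkhdr + maxlen <= TOY_MCLBYTES);
}
```

The two predicates differ only when the total equals the cluster size exactly, which is the case the commit message calls acceptable.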
Kristof Provost
53deb05c36 Evaluate packet size after the firewall had its chance
Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.

Differential Revision:	https://reviews.freebsd.org/D1815
Reviewed by:	ae
Approved by:	gnn (mentor)
2015-04-07 20:29:03 +00:00
Xin LI
dd3856601d Mitigate Local Denial of Service with IPv6 Router Advertisements
and log attack attempts.

Submitted by:	hrs
Security:	FreeBSD-SA-15:09.nd6
Security:	CVE-2015-2923
2015-04-07 20:20:09 +00:00
Gleb Smirnoff
c151f24d08 o Make net.inet6.ip6.mif6table return special API structure, that doesn't
  contain kernel pointers, and instead has interface index.
  Bump __FreeBSD_version for that change.
o Now, netstat/mroute6.c no longer needs to kvm_read(3) struct ifnet, and
  no longer needs to include if_var.h

Note that this change is far from being a complete move of IPv6 multicast
routing to a proper API. Other structures are still dumped into their
sysctls as is, requiring userland application to #define _KERNEL when
including ip6_mroute.h and then call kvm_read(3) to gather all bits and
pieces. But fixing this is out of scope of the opaque ifnet project.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-04-06 22:12:18 +00:00
Kristof Provost
31e2e88c27 Remove duplicate code
We'll just fall into the same local delivery block under the
'if (m->m_flags & M_FASTFWD_OURS)'.

Suggested by:	ae
Differential Revision:	https://reviews.freebsd.org/D2225
Approved by:	gnn (mentor)
2015-04-06 19:08:44 +00:00
Kristof Provost
798318490e Preserve IPv6 fragment IDs across reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
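
The idea in 798318490e, reusing the incoming fragment ID when a reassembled packet is fragmented again, can be sketched in userland C. The `toy_frag` structure and `toy_refragment()` are hypothetical and only record fragment metadata; this is not the pf(4) or ip6_fragment() code. The sketch assumes payloads small enough for 16-bit offsets and rounds the fragment size down to a multiple of 8 bytes, since IPv6 fragment offsets are expressed in 8-octet units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one outgoing fragment. */
struct toy_frag {
	uint32_t id;		/* fragment identification */
	uint16_t offset;	/* payload offset in bytes */
	uint16_t len;		/* payload bytes carried */
	int more;		/* 1 if further fragments follow */
};

/*
 * Describe the fragments for payload_len bytes, at most max_frag bytes each,
 * stamping every fragment with the caller-supplied id instead of a new one.
 * Returns the number of fragments, or 0 if the arguments do not allow it.
 */
static size_t
toy_refragment(uint32_t id, size_t payload_len, size_t max_frag,
    struct toy_frag *out, size_t out_cap)
{
	size_t n = 0, off = 0;

	if (max_frag < 8 || payload_len > UINT16_MAX)
		return (0);
	max_frag &= ~(size_t)7;	/* non-final fragments: multiple of 8 bytes */

	while (off < payload_len) {
		size_t len = payload_len - off;

		if (len > max_frag)
			len = max_frag;
		if (n == out_cap)
			return (0);
		out[n].id = id;	/* preserved incoming ID */
		out[n].offset = (uint16_t)off;
		out[n].len = (uint16_t)len;
		out[n].more = (off + len < payload_len);
		n++;
		off += len;
	}
	return (n);
}
```

A caller would pass the ID saved during reassembly as `id`, so every fragment leaving the forwarder carries the same identification the sender chose.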
Gleb Smirnoff
20778ab5b4 Move ip6_sprintf() declaration from in6_var.h to in6.h. This is a simple
function that works with in6_addr and it is not related to the INET6
stack implementation.

Sponsored by:	Nginx, Inc.
2015-03-24 16:45:50 +00:00
Andrey V. Elsukov
ff9f2a36de To avoid a possible race, release the reference to ifa after return
from nd6_dad_na_input().

Submitted by:	Alexandre Martins
MFC after:	1 week
2015-03-19 00:04:25 +00:00
Andrey V. Elsukov
fd8dd3a6d7 tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify().
Check cmdarg isn't NULL before dereference, this check was in the
ip6_notify_pmtu() before r279588.

Reported by:	Florian Smeets
MFC after:	1 week
2015-03-06 05:50:39 +00:00
Hiroki Sato
23e9ffb0e1 - Implement loopback probing state in enhanced DAD algorithm.
- Add no_dad and ignoreloop per-IF knob.  no_dad disables DAD completely,
  and ignoreloop is to prevent infinite loop in loopback probing state when
  loopback is permanently expected.
2015-03-05 21:27:49 +00:00
Andrey V. Elsukov
8f1beb889e Fix deadlock in IPv6 PCB code.
When several threads are trying to send datagram to the same destination,
but fragmentation is disabled and datagram size exceeds link MTU,
ip6_output() calls pfctlinput2(PRC_MSGSIZE). This notifies all
sockets that want to know the MTU to this destination. And since all threads
hold the PCB lock while sending, taking the lock for each PCB in
in6_pcbnotify() leads to deadlock.

RFC 3542 p.11.3 suggests notifying all applications that want to receive
IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message.
But it doesn't require this when we don't receive an ICMPv6 message.

Change the ip6_notify_pmtu() function so it can be used directly from
ip6_output() to notify only one socket, and to notify all sockets
when an ICMPv6 packet too big message is received.

PR:		197059
Differential Revision:	https://reviews.freebsd.org/D1949
Reviewed by:	no objection from #network
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2015-03-04 11:20:01 +00:00
Andrey V. Elsukov
1eef8a6c08 Create nd6_ns_output_fib() function with extra argument fibnum. Use it
to initialize mbuf's fibnum. Uninitialized fibnum value can lead to
panic in the routing code. Currently we use only RT_DEFAULT_FIB value
for initialization.

Differential Revision:	https://reviews.freebsd.org/D1998
Reviewed by:	hrs (previous version)
Sponsored by:	Yandex LLC
2015-03-03 10:50:03 +00:00
Hiroki Sato
8d56075939 Nonce has to be non-NULL for DAD even if net.inet6.ip6.dad_enhanced=0. 2015-03-03 04:28:19 +00:00
Hiroki Sato
11d8451df3 Implement Enhanced DAD algorithm for IPv6 described in
draft-ietf-6man-enhanced-dad-13.

This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet.  This looped back packet
prevented DAD on some pseudo-interfaces which aggregate multiple L2 links
such as lagg(4).

The length of the nonce is set to 6 bytes.  This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 on a per-vnet basis.

Reported by:		hiren
Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D1835
2015-03-02 17:30:26 +00:00
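
The nonce comparison behind 11d8451df3 can be sketched as follows. This is a simplified userland model with hypothetical names (`toy_dad_state`, `toy_dad_ns_input()`); it is not the kernel's nd6 code and leaves out timers, retransmission, and the rest of the DAD state machine. It shows only the core decision: an NS probe carrying the nonce we sent is treated as our own looped-back packet, while any other probe counts as a possible duplicate.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NONCE_LEN 6	/* nonce length given in the commit message */

/* Hypothetical per-address DAD state. */
struct toy_dad_state {
	uint8_t sent_nonce[TOY_NONCE_LEN];	/* nonce placed in our NS probe */
	int duplicate_count;			/* probes attributed to another node */
	int loopback_count;			/* our own probes that came back */
};

/* Classify a received NS probe for a tentative address. */
static void
toy_dad_ns_input(struct toy_dad_state *st, const uint8_t *nonce,
    size_t nonce_len)
{
	if (nonce != NULL && nonce_len == TOY_NONCE_LEN &&
	    memcmp(nonce, st->sent_nonce, TOY_NONCE_LEN) == 0) {
		/* Same nonce as ours: a looped-back probe, not a duplicate. */
		st->loopback_count++;
		return;
	}
	/* Different or missing nonce: count as a possible duplicate address. */
	st->duplicate_count++;
}
```

In the model, a probe without a nonce option falls through to the duplicate branch, matching DAD behaviour without the nonce extension.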
Gleb Smirnoff
e072c794ad Now that all users of _WANT_IFADDR are fixed, remove this crutch and
hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 23:16:10 +00:00
Gleb Smirnoff
9e62a5a379 - Rename 'struct mld_ifinfo' into 'struct mld_ifsoftc', since it really
represents a context.
- Preserve name 'struct mld_ifinfo' for a new structure, that will be stable
  API between userland and kernel.
- Make sysctl_mld_ifinfo() return the new 'struct mld_ifinfo', instead of
  old one, which had a bunch of internal kernel structures in it.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 22:37:01 +00:00
Gleb Smirnoff
fd1b2a7c57 Widen _KERNEL ifdef to hide more kernel network stack structures from userland. 2015-02-19 06:24:27 +00:00
Gleb Smirnoff
a99c84d4e6 Use new struct mbufq instead of struct ifqueue to manage packet queues in
IPv6 multicast code.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 01:21:23 +00:00
Gleb Smirnoff
6c269f6912 Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4).
Submitted by:		Kristof Provost
Differential Revision:	D1766
2015-02-16 06:30:27 +00:00
Gleb Smirnoff
e5ee706031 Move ip6_deletefraghdr() to frag6.c.
Suggested by:	bz
2015-02-16 05:58:32 +00:00
Gleb Smirnoff
0b438b0fb8 Factor out ip6_deletefraghdr() function, to be shared between IPv6
stack and pf(4).

Submitted by:	Kristof Provost
Reviewed by:	ae
Differential Revision:	D1764
2015-02-16 01:12:20 +00:00
Randall Stewart
2575fbb827 This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Reviewed by:	adrian, jhb and bz
Sponsored by:	Netflix Inc.
2015-02-09 19:28:11 +00:00
Andrey V. Elsukov
46386183da Print IPv6 address in log message instead of address of pointer.
MFC after:	1 week
2015-02-05 16:29:26 +00:00