Commit Graph

1551 Commits

Author SHA1 Message Date
Alexander V. Chernikov
2caee4be35 Rename rt_foreach_fib() to rt_foreach_fib_walk().
Suggested by:	julian
2015-08-10 20:50:31 +00:00
Alexander V. Chernikov
11cdad9873 Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
  format (code is not indented properly to minimize diff). Will be fixed
  in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
  format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
  lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
 access by external callers.

* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
  in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
  Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by:	ae
2015-08-10 12:03:59 +00:00
Marius Strobl
6e4cd74673 Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT. 2015-08-08 21:41:59 +00:00
Alexander V. Chernikov
4bdf0b6a9a MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
Alexander V. Chernikov
e362cf0e9f MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by:	ae
2015-08-08 17:48:54 +00:00
Alexander V. Chernikov
331dff0737 Simplify ip[6] simploop:
Do not pass 'dst' sockaddr to ip[6]_mloopback:
  - We have explicit check for AF_INET in ip_output()
  - We assume ip header inside passed mbuf in ip_mloopback
  - We assume ip6 header inside passed mbuf in ip6_mloopback
2015-08-08 15:58:35 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Andrey V. Elsukov
51a01baf23 Properly handle IPV6_NEXTHOP socket option in selectroute().
o remove disabled code;
 o if nexthop address is link-local, use embedded scope zone id to
   determine outgoing interface;
 o properly fill ro_dst before doing route lookup;
 o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.

Sponsored by:	Yandex LLC
2015-08-02 12:40:56 +00:00
Andrey V. Elsukov
a6f7dea1fe Remove redundant check. 2015-08-02 11:58:24 +00:00
Andrey V. Elsukov
10a0e0bf0a Eliminate the use of m_copydata() in gif_encapcheck().
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 14:07:43 +00:00
Andrey V. Elsukov
cc0a3c8ca4 Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
Michael Tuexen
4ff815b71c Move including netinet/icmp6.h around to avoid a problem when including
netinet/icmp6.h and net/netmap.h. Both use ni_flags...
This allows to build multistack with SCTP support.

MFC after: 1 week
2015-07-25 18:26:09 +00:00
Randall Stewart
f260c1b939 Fix inverted logic bug that David Wolfskill found (thanks David!)
MFC after:	3 Weeks
2015-07-22 09:29:50 +00:00
Randall Stewart
c0d1be08f6 When a tunneling protocol is being used with UDP we must release the
lock on the INP before calling the tunnel protocol, else a LOR
may occur (it does with SCTP for sure). Instead we must acquire a
ref count and release the lock, taking care to allow for the case
where the UDP socket has gone away and *not* unlocking since the
refcnt decrement on the inp will do the unlock in that case.

Reviewed by:	tuexen
MFC after:	3 weeks
2015-07-21 09:54:31 +00:00
Andrey V. Elsukov
30aee13117 Add LLE event handler to report ND6 events to userland via rtsock.
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:58:32 +00:00
Andrey V. Elsukov
585753c432 Invoke LLE event handler when entry is deleted.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:54:50 +00:00
Andrey V. Elsukov
cb207f93ca Keep IPv6 address specified by IPV6_PKTINFO socket option in kernel
internal form to be able handle link-local IPv6 addresses.

Reported by:	kp
Tested by:	kp
2015-07-03 19:01:38 +00:00
Bjoern A. Zeeb
bfbc08b848 Move comment to the right position.
PR:		152791
Submitted by:	vangyzen (as part of the functional change)
MFC after:	3 days
2015-07-03 09:53:56 +00:00
Michael Tuexen
d089f9b915 Add FIB support for SCTP.
This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379

MFC after: 3 days
2015-06-17 15:20:14 +00:00
Andrey V. Elsukov
4e870f943f Move RTM announces into generic code to be independent from Layer2 code.
This fixes bug introduced in 274988, when announces about new addresses
don't sent for tunneling interfaces.

Reported by:	tuexen@
MFC after:	1 week
2015-05-29 10:24:16 +00:00
Michael Tuexen
b7d130befc Fix and cleanup the debug information. This has no user-visible changes.
Thanks to Irene Ruengeler for proving a patch.

MFC after: 3 days
2015-05-28 16:00:23 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Andrey V. Elsukov
c1b4f79dfa Add an ability accept encapsulated packets from different sources by one
gif(4) interface. Add new option "ignore_source" for gif(4) interface.
When it is enabled, gif's encapcheck function requires match only for
packet's destination address.

Differential Revision:	https://reviews.freebsd.org/D2004
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-15 12:19:45 +00:00
Hiroki Sato
59333867ff - Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
because a link where looped back NS messages are permanently observed
  does not work with either NDP or ARP for IPv4.

- draft-ietf-6man-enhanced-dad is now RFC 7527.

Discussed with:	hiren
MFC after:	3 days
2015-05-12 03:31:57 +00:00
Andrey V. Elsukov
654bdb5abb Mark data checksum as valid for multicast packets, that we send back
to myself via simloop.
Also remove duplicate check under #ifdef DIAGNOSTIC.

PR:		180065
MFC after:	1 week
2015-05-07 14:17:43 +00:00
Andrey V. Elsukov
db037aa4ed Remove unneded #ifdef INET6 and IPSEC. This file compiled only when
both options are defined.
Include opt_sctp.h and sctp_crc32.h to enable #ifdef SCTP code block
and delayed checksum calculation for SCTP.
2015-05-07 12:15:45 +00:00
Gleb Smirnoff
0fa5aacd8b Remove #ifdef IFT_FOO.
Submitted by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:31:27 +00:00
Andrey V. Elsukov
3e92c37f32 Remove now unneded KEY_FREESP() for case when ipsec[46]_process_packet()
returns EJUSTRETURN.

Sponsored by:	Yandex LLC
2015-04-27 01:11:09 +00:00
Andrey V. Elsukov
3d80e82d60 Fix possible use after free due to security policy deletion.
When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(),
we hold one reference to security policy and release it just after return
from this function. But IPSec processing can be deffered and when we release
reference to security policy after ipsec[46]_process_packet(), user can
delete this security policy from SPDB. And when IPSec processing will be
done, xform's callback function will do access to already freed memory.

To fix this move KEY_FREESP() into callback function. Now IPSec code will
release reference to SP after processing will be finished.

Differential Revision:	https://reviews.freebsd.org/D2324
No objections from:	#network
Sponsored by:	Yandex LLC
2015-04-27 00:55:56 +00:00
Gleb Smirnoff
210b5c73e7 Fix r281649: don't call in6_clearscope() twice.
Submitted by:	ae
2015-04-17 15:26:08 +00:00
Gleb Smirnoff
28ebe80cab Provide functions to determine presence of a given address
configured on a given interface.

Discussed with:	np
Sponsored by:	Nginx, Inc.
2015-04-17 11:57:06 +00:00
Mark Johnston
dff78447a4 Fix a possible refcount leak in regen_tmpaddr().
public_ifa6 may be set to NULL after taking a reference to a previous
address list element. Instead, only take the reference after leaving the
loop but before releasing the address list lock.

Differential Revision:	https://reviews.freebsd.org/D2253
Reviewed by:		ae
MFC after:		2 weeks
2015-04-13 01:55:42 +00:00
Andrey V. Elsukov
e2956804dd Fix the IPV6_MULTICAST_IF sockopt handling. RFC 3493 says when the
interface index is specified as zero, the system should select the
interface to use for outgoing multicast packets. Even the comment
for the in6p_set_multicast_if() function says about index of zero.
But in fact for zero index the function just returns EADDRNOTAVAIL.

I.e. if you first set some interface and then will try reset it
with zero ifindex, you will get EADDRNOTAVAIL.

Reset im6o_multicast_ifp to NULL when interface index specified as
zero. Also return EINVAL in case when ifnet_byindex() returns NULL.
This will be the same behaviour as when ifindex is bigger than
V_if_index. And return EADDRNOTAVAIL only when interface is not
multicast capable.

Reported by:	Olivier Cochard-Labbé
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-04-10 19:09:51 +00:00
Andrey V. Elsukov
efb19cf6db Fix the check for maximum mbuf's size needed to send ND6 NA and NS.
It is acceptable that the size can be equal to MCLBYTES. In the later
KAME's code this check has been moved under DIAGNOSTIC ifdef, because
the size of NA and NS is much smaller than MCLBYTES. So, it is safe to
replace the check with KASSERT.

PR:		199304
Discussed with:	glebius
MFC after:	1 week
2015-04-09 12:57:58 +00:00
Kristof Provost
53deb05c36 Evaluate packet size after the firewall had its chance
Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.

Differential Revision:	https://reviews.freebsd.org/D1815
Reviewed by:	ae
Approved by:	gnn (mentor)
2015-04-07 20:29:03 +00:00
Xin LI
dd3856601d Mitigate Local Denial of Service with IPv6 Router Advertisements
and log attack attempts.

Submitted by:	hrs
Security:	FreeBSD-SA-15:09.nd6
Security:	CVE-2015-2923
2015-04-07 20:20:09 +00:00
Gleb Smirnoff
c151f24d08 o Make net.inet6.ip6.mif6table return special API structure, that doesn't
contain kernel pointers, and instead has interface index.
  Bump __FreeBSD_version for that change.
o Now, netstat/mroute6.c no longer needs to kvm_read(3) struct ifnet, and
  no longer needs to include if_var.h

Note that this change is far from being a complete move of IPv6 multicast
routing to a proper API. Other structures are still dumped into their
sysctls as is, requiring userland application to #define _KERNEL when
including ip6_mroute.h and then call kvm_read(3) to gather all bits and
pieces. But fixing this is out of scope of the opaque ifnet project.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-04-06 22:12:18 +00:00
Kristof Provost
31e2e88c27 Remove duplicate code
We'll just fall into the same local delivery block under the
'if (m->m_flags & M_FASTFWD_OURS)'.

Suggested by:	ae
Differential Revision:	https://reviews.freebsd.org/D2225
Approved by:	gnn (mentor)
2015-04-06 19:08:44 +00:00
Kristof Provost
798318490e Preserve IPv6 fragment IDs accross reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
Gleb Smirnoff
20778ab5b4 Move ip6_sprintf() declaration from in6_var.h to in6.h. This is a simple
function that works with in6_addr and it is not related to the INET6
stack implementation.

Sponsored by:	Nginx, Inc.
2015-03-24 16:45:50 +00:00
Andrey V. Elsukov
ff9f2a36de To avoid a possible race, release the reference to ifa after return
from nd6_dad_na_input().

Submitted by:	Alexandre Martins
MFC after:	1 week
2015-03-19 00:04:25 +00:00
Andrey V. Elsukov
fd8dd3a6d7 tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify().
Check cmdarg isn't NULL before dereference, this check was in the
ip6_notify_pmtu() before r279588.

Reported by:	Florian Smeets
MFC after:	1 week
2015-03-06 05:50:39 +00:00
Hiroki Sato
23e9ffb0e1 - Implement loopback probing state in enhanced DAD algorithm.
- Add no_dad and ignoreloop per-IF knob.  no_dad disables DAD completely,
  and ignoreloop is to prevent infinite loop in loopback probing state when
  loopback is permanently expected.
2015-03-05 21:27:49 +00:00
Andrey V. Elsukov
8f1beb889e Fix deadlock in IPv6 PCB code.
When several threads are trying to send datagram to the same destination,
but fragmentation is disabled and datagram size exceeds link MTU,
ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all
sockets wanted to know MTU to this destination. And since all threads
hold PCB lock while sending, taking the lock for each PCB in the
in6_pcbnotify() leads to deadlock.

RFC 3542 p.11.3 suggests notify all application wanted to receive
IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message.
But it doesn't require this, when we don't receive ICMPv6 message.

Change ip6_notify_pmtu() function to be able use it directly from
ip6_output() to notify only one socket, and to notify all sockets
when ICMPv6 packet too big message received.

PR:		197059
Differential Revision:	https://reviews.freebsd.org/D1949
Reviewed by:	no objection from #network
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2015-03-04 11:20:01 +00:00
Andrey V. Elsukov
1eef8a6c08 Create nd6_ns_output_fib() function with extra argument fibnum. Use it
to initialize mbuf's fibnum. Uninitialized fibnum value can lead to
panic in the routing code. Currently we use only RT_DEFAULT_FIB value
for initialization.

Differential Revision:	https://reviews.freebsd.org/D1998
Reviewed by:	hrs (previous version)
Sponsored by:	Yandex LLC
2015-03-03 10:50:03 +00:00
Hiroki Sato
8d56075939 Nonce has to be non-NULL for DAD even if net.inet6.ip6.dad_enhanced=0. 2015-03-03 04:28:19 +00:00
Hiroki Sato
11d8451df3 Implement Enhanced DAD algorithm for IPv6 described in
draft-ietf-6man-enhanced-dad-13.

This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet.  This looped back packet
prevented DAD on some pseudo-interfaces which aggregates multiple L2 links
such as lagg(4).

The length of the nonce is set to 6 bytes.  This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis.

Reported by:		hiren
Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D1835
2015-03-02 17:30:26 +00:00
Gleb Smirnoff
e072c794ad Now that all users of _WANT_IFADDR are fixed, remove this crutch and
hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 23:16:10 +00:00
Gleb Smirnoff
9e62a5a379 - Rename 'struct mld_ifinfo' into 'struct mld_ifsoftc', since it really
represents a context.
- Preserve name 'struct mld_ifinfo' for a new structure, that will be stable
  API between userland and kernel.
- Make sysctl_mld_ifinfo() return the new 'struct mld_ifinfo', instead of
  old one, which had a bunch of internal kernel structures in it.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 22:37:01 +00:00
Gleb Smirnoff
fd1b2a7c57 Widen _KERNEL ifdef to hide more kernel network stack structures from userland. 2015-02-19 06:24:27 +00:00
Gleb Smirnoff
a99c84d4e6 Use new struct mbufq instead of struct ifqueue to manage packet queues in
IPv6 multicast code.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 01:21:23 +00:00
Gleb Smirnoff
6c269f6912 Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4).
Submitted by:		Kristof Provost
Differential Revision:	D1766
2015-02-16 06:30:27 +00:00
Gleb Smirnoff
e5ee706031 Move ip6_deletefraghdr() to frag6.c.
Suggested by:	bz
2015-02-16 05:58:32 +00:00
Gleb Smirnoff
0b438b0fb8 Factor out ip6_deletefraghdr() function, to be shared between IPv6
stack and pf(4).

Submitted by:	Kristof Provost
Reviewed by:	ae
Differential Revision:	D1764
2015-02-16 01:12:20 +00:00
Randall Stewart
2575fbb827 This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Commented upon by hiren and sbruno
Reviewed by:	adrian, jhb and bz
Sponsored by:	Netflix Inc.
2015-02-09 19:28:11 +00:00
Andrey V. Elsukov
46386183da Print IPv6 address in log message instead of address of pointer.
MFC after:	1 week
2015-02-05 16:29:26 +00:00
Adrian Chadd
b2bdc62a95 Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific
bits.

The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.

* net/rss_config.[ch] takes care of the generic bits for doing
  configuration, hash function selection, etc;
* topelitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
  that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.

This should have no functional impact on anyone currently using
the RSS support.

Differential Revision:	D1383
Reviewed by:	gnn, jfv (intel driver bits)
2015-01-18 18:06:40 +00:00
Gleb Smirnoff
ffec6ee527 Do not go one layer down to check ifqueue length. First, not all drivers
use ifqueue at all. Second, there is no point in this lockless check.
Either positive or negative result of the check could be incorrect after
a tick.

Sponsored by:	Nginx, Inc.
2015-01-12 14:52:43 +00:00
Michael Tuexen
4be807c4d6 Minimize the usage of SCTP_BUF_IS_EXTENDED.
This should help Robert...
2015-01-10 20:49:57 +00:00
Alexander V. Chernikov
d63e657c04 * Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4:
handle it in arc_output() instead of nd6_storelladdr().
* Remove IFT_ARCNET check from arpresolve() since arc_output() does not
  use arpresolve() to handle broadcast/multicast. This check was there
  since r84931. It looks like it was not used since r89099 (initial
  import of Arcnet support where multicast is handled separately).
* Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output()
  calles nd6_storelladdr() for unicast addresses only.
* Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now
  handles multicast by itself.

As a result, we have the following pattern: all non-ethernet-style
media have their own multicast map handling inside their appropriate
routines. On the other hand, arpresolve() (and nd6_storelladdr()) which
meant to be 'generic' ones de-facto handles ethernet-only multicast maps.

MFC after:	3 weeks
2015-01-09 12:56:51 +00:00
Alexander V. Chernikov
abc1be9062 Add forgotten definition for nd6_output_ifp(). 2015-01-08 18:29:54 +00:00
Alexander V. Chernikov
d7968c29ec * Use newly-created nd6_grab_holdchain() function to retrieve lle
hold mbuf chain instead of calling full-blown nd6_output_lle()
  for each packet. This simplifies both callers and nd6_output_lle()
  implementation.
* Make nd6_output_lle() static and remove now-unused lle and chain
  arguments.
* Rename nd6_output_flush() -> nd6_flush_holdchain() to be consistent.
* Move all pre-send transmit hooks to newly-created nd6_output_ifp().
  Now nd6_output(), nd6_output_lle() and nd6_flush_holdchain() are using
  it to send mbufs to if_output.
* Remove SeND hook from nd6_na_input() because it was implemented
  incorrectly since the beginning (r211501):
  - it tagged initial input mbuf (m) instead of m_hold
  - tagging _all_ mbufs in holdchain seems to be wrong anyway.
2015-01-08 18:02:05 +00:00
Alexander V. Chernikov
3a7498636a * Allocate hash tables separately
* Make llt_hash() callback more flexible
* Default hash size and hashing method is now per-af
* Move lltable allocation to separate function
2015-01-05 17:23:02 +00:00
Alexander V. Chernikov
df485dbe3e Do not call LLE_WUNLOCK() for deleted lle. 2015-01-05 16:10:54 +00:00
Robert Watson
ed6a66ca6c To ease changes to underlying mbuf structure and the mbuf allocator, reduce
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:

- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().

This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.

Differential Revision:	https://reviews.freebsd.org/D1436
Reviewed by:	glebius, trasz, gnn, bz
Sponsored by:	EMC / Isilon Storage Division
2015-01-05 09:58:32 +00:00
Alexander V. Chernikov
b44a7d5d87 * Use unified code for deleting entry by sockaddr instead of per-af one.
* Remove now unused llt_delete_addr callback.
2015-01-03 19:09:06 +00:00
Alexander V. Chernikov
20dd899505 * Hide lltable implementation details in if_llatbl_var.h
* Make most of lltable_* methods 'normal' functions instead of inline
* Add lltable_get_<af|ifp>() functions to access given lltable fields
* Temporarily resurrect nd6_lookup() function
2015-01-03 16:04:28 +00:00
Alexander V. Chernikov
787cea14a5 Since @ln is the result of LLTABLE6(ifp) lookup its originating interface
must always be @ifp. So change ln->lle_tbl->llt_ifp to ifp.
2015-01-03 14:18:48 +00:00
Alexander V. Chernikov
d2e0f37c22 Finish r275628 #2: remove remaining 'base' references. 2015-01-03 14:09:35 +00:00
Adrian Chadd
492ccbe14d Migrate the RSS IPv6 hash code to use pointers to the v6 addresses
rather than passing them in by value.

The eventual aim is to do incremental hash construction rather than
all of the memcpy()'ing into a contiguous buffer for the hash
function, which does show up as taking quite a bit of CPU during
profiling.

Tested:

* a variety of laptops/desktop setups I have, with v6 connectivity

Differential Revision:	D1404
Reviewed by:	bz, rpaulo
2014-12-31 22:52:43 +00:00
Andrey V. Elsukov
f188f14d43 Extern declarations in C files loses compile-time checking that
the functions' calls match their definitions. Move them to header files.

Reviewed by:	jilles (previous version)
2014-12-25 21:32:37 +00:00
Andrey V. Elsukov
132c449079 Remove in_gif.h and in6_gif.h files. They only contain function
declarations used by gif(4). Instead declare these functions in C files.
Also make some variables static.
2014-12-23 16:17:37 +00:00
Michael Tuexen
caeae63f97 Plug a memory leak in an error code path.
Reported by:	Coverity
CID:		1018936
MFC after: 	3 days
2014-12-17 20:19:57 +00:00
Andrey V. Elsukov
44eb8bbe7b Do not count security policy violation twice.
ipsec*_in_reject() do this by their own.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 19:20:13 +00:00
Andrey V. Elsukov
49ada98eac Use ipsec6_in_reject() to simplify ip6_ipsec_fwd() and ip6_ipsec_input().
ipsec6_in_reject() does the same things, also it counts policy violation
errors.

Do IPSEC check in the ip6_forward() after addresses checks.
Also use ip6_ipsec_fwd() to make code similar to IPv4 implementation.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 19:09:57 +00:00
Andrey V. Elsukov
0275b2e369 Remove flag/flags argument from the following functions:
ipsec_getpolicybyaddr()
 ipsec4_checkpolicy()
 ip_ipsec_output()
 ip6_ipsec_output()

The only flag used here was IP_FORWARDING.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 18:35:34 +00:00
Andrey V. Elsukov
8922ddbe40 Move ip_ipsec_fwd() from ip_input() into ip_forward().
Remove check for presence PACKET_TAG_IPSEC_IN_DONE mbuf tag from
ip_ipsec_fwd(). PACKET_TAG_IPSEC_IN_DONE tag means that packet is
already handled by IPSEC code. This means that before IPSEC processing
it was destined to our address and security policy was checked in
the ip_ipsec_input(). After IPSEC processing packet has new IP
addresses and destination address isn't our own. So, anyway we can't
check security policy from the mbuf tag, because it corresponds
to different addresses.

We should check security policy that corresponds to packet
attributes in both cases - when it has a mbuf tag and when it has not.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 16:53:29 +00:00
Andrey V. Elsukov
e58320f127 Remove PACKET_TAG_IPSEC_IN_DONE mbuf tag lookup and usage of its
security policy. The changed block of code in ip*_ipsec_input() is
called when packet has ESP/AH header. Presence of
PACKET_TAG_IPSEC_IN_DONE mbuf tag in the same time means that
packet was already handled by IPSEC and reinjected in the netisr,
and it has another ESP/AH headers (encrypted twice?).
Since it was already processed by IPSEC code, the AH/ESP headers
was already stripped (and probably outer IP header was stripped too)
and security policy from the tdb_ident was applied to those headers.
It is incorrect to apply this security policy to current headers.

Also make ip_ipsec_input() prototype similar to ip6_ipsec_input().

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 14:58:55 +00:00
Andrey V. Elsukov
dd9cd45b44 Remove check for presence of PACKET_TAG_IPSEC_PENDING_TDB and
PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED mbuf tags. They aren't used in FreeBSD.

Instead check presence of PACKET_TAG_IPSEC_OUT_DONE mbuf tag. If it
is found, bypass security policy lookup as described in the comment.

PACKET_TAG_IPSEC_OUT_DONE tag added to mbuf when IPSEC code finishes
ESP/AH processing. Since it was already finished, this means the security
policy placed in the tdb_ident was already checked. And there is no reason
to check it again here.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-12-11 14:43:44 +00:00
Mark Johnston
a37271c3b8 Revert r275695: nd6_dad_find() was already correct.
Reported by:	ae, kib
Pointy hat to:	markj
2014-12-11 09:16:45 +00:00
Mark Johnston
97712e3efc Fix a bug in r266857: nd6_dad_find() must return NULL if it doesn't find
a matching element in the DAD queue.

Reported by:	Holger Hans Peter Freyther <holger@freyther.de>
MFC after:	3 days
2014-12-11 00:41:54 +00:00
Alexander V. Chernikov
ee7e9a4e17 * Do not assume lle has sockaddr key after struct lle:
use llt_fill_sa_entry() llt method to store lle address in sa.
* Eliminate L3_ADDR macro and either reference IPv4/IPv6 address
   directly from lle or use newly-created llt_fill_sa_entry().
* Do not store sockaddr inside arp/ndp lle anymore.
2014-12-09 00:48:08 +00:00
Alexander V. Chernikov
d82ed5051c Simplify lle lookup/create api by using addresses instead of sockaddrs. 2014-12-08 23:23:53 +00:00
Mark Johnston
d6ad6a865a Add refcounting to IPv6 DAD objects and simplify the DAD code to fix a
number of races which could cause double frees or use-after-frees when
performing DAD on an address. In particular, an IPv6 address can now only be
marked as a duplicate from the DAD callout.

Differential Revision:	https://reviews.freebsd.org/D1258
Reviewed by:		ae, hrs
Reported by:		rstone
MFC after:		1 month
2014-12-08 04:44:40 +00:00
Alexander V. Chernikov
73b52ad896 Use llt_prepare_static_entry method to prepare valid per-af static entry. 2014-12-07 23:59:44 +00:00
Alexander V. Chernikov
0368226e65 * Retire abstract llentry_free() in favor of lltable_drop_entry_queue()
and explicit calls to RTENTRY_FREE_LOCKED()
* Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6.
* Rename <lltable_|llt>_delete function to _delete_addr() to note that
   this function is used to external callers. Make this function maintain
   its own locking.
* Use lookup/unlink/clear call chain from internal callers instead of
    delete_addr.
* Fix LLE_DELETED flag handling
2014-12-07 23:08:07 +00:00
Alexander V. Chernikov
721cd2e032 Do not enforce particular lle storage scheme:
* move lltable allocation to per-domain callbacks.
* make llentry_link/unlink functions overridable llt methods.
* make hash table traversal another overridable llt method.
2014-12-07 17:32:06 +00:00
Alexander V. Chernikov
a743ccd468 * Add llt_clear_entry() callback which is able to do all lle
cleanup including unlinking/freeing
* Relax locking in lltable_prefix_free_af/lltable_free
* Do not pass @llt to lle free callback: it is always NULL now.
* Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding
   unlock/lock sequinces
* Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc()
   to retrieve preferred source address from lle hold queue and pass it
   instead of lle.
* Finally, make nd6_create() create and return unlocked lle
* Separate defrtr handling code from nd6_free():
   use nd6_check_del_defrtr() to check if we need to keep entry instead of
    performing GC,
   use nd6_check_recalc_defrtr() to perform actual recalc on lle removal.
* Move isRouter handling from nd6_cache_lladdr() to separate
   nd6_check_router()
* Add initial code to maintain lle runtime flags in sync.
2014-12-07 15:42:46 +00:00
Michael Tuexen
457b4b8836 This is the SCTP specific companion of
https://svnweb.freebsd.org/changeset/base/275358
which was provided by Hans Petter Selasky.
2014-12-04 21:17:50 +00:00
Andrey V. Elsukov
2dfcd0ae9d Remove unneded check. No need to do m_pullup to the size that we prepended.
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-12-02 05:41:03 +00:00
Andrey V. Elsukov
2d957916ef Remove route chaching support from ipsec code. It isn't used for some time.
* remove sa_route_union declaration and route_cache member from struct secashead;
* remove key_sa_routechange() call from ICMP and ICMPv6 code;
* simplify ip_ipsec_mtu();
* remove #include <net/route.h>;

Sponsored by:	Yandex LLC
2014-12-02 04:20:50 +00:00
Alexander V. Chernikov
9b65db85e2 Do more fine-grained locking in lltable code: lltable_create_lle()
does actual new lle creation without extensive locking and existing
  lle search.
Move lle updating code from gigantic in_arpinput() to arp_update_llle()
  and some other functions.

IPv6 changes to follow.
2014-12-01 21:43:48 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Alexander V. Chernikov
ce313fdd71 * Unify lle table dump/prefix removal code.
* Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes
  used by lltable code.
2014-11-30 14:35:01 +00:00
Alexander V. Chernikov
5d14e4cd76 Provide rte_<get|set> methods to access rtentry for external consumers. 2014-11-29 19:27:43 +00:00
Alexander V. Chernikov
74860d4f7c Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr -
return lle flags IFF needed.
Do not pass rte to arpresolve - pass is_gateway flag instead.
2014-11-27 23:06:25 +00:00
Andrey V. Elsukov
af6209a133 Skip L2 addresses lookups for p2p interfaces.
Discussed with:	melifaro
Sponsored by:	Yandex LLC
2014-11-24 21:51:43 +00:00
Alexander V. Chernikov
73d770287d Do more fine-grained lltable locking: use table runtime lock as rare
as we can.
2014-11-23 15:38:06 +00:00
Alexander V. Chernikov
9479029b1f * Add lltable llt_hash callback
* Move lltable items insertions/deletions to generic llt code.
2014-11-23 12:15:28 +00:00
Alexander V. Chernikov
7c066c18db Use less-invasive approach for IF_AFDATA lock: convert into 2 locks:
use rwlock accessible via external functions
    (IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
  use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.
2014-11-22 19:53:36 +00:00