freebsd-dev

Author	SHA1	Message	Date
Alexander V. Chernikov	ddd208f7ad	Unify setting lladdr for AF_INET[6].	2015-11-07 11:12:00 +00:00
Adrian Chadd	aaa46574b0	[netinet6]: Create a new IPv6 netisr which expects the frames to have been verified. This is required for fragments and encapsulated data (eg tunneling) to be redistributed to the RSS bucket based on the eventual IPv6 header and protocol (TCP, UDP, etc) header. * Add an mbuf tag with the state of IPv6 options parsing before the frame is queued into the direct dispatch handler; * Continue processing and complete the frame reception in the correct RSS bucket / netisr context. Testing results are in the phabricator review. Differential Revision: https://reviews.freebsd.org/D3563 Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn>	2015-11-06 23:07:43 +00:00
Alexander V. Chernikov	ba99cc0b86	Use m_cat() to reassemble IPv6 packets. Submitted by: jonloony_gmail.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3863	2015-10-27 22:11:09 +00:00
Alexander V. Chernikov	ab415c8307	Invoke lle_event for new entry iff it has lladdr set.	2015-10-04 19:10:27 +00:00
Alexander V. Chernikov	7503e0c783	Simplify if (lladdr) condition in nd6_cache_lladdr(): For case (7) (new entry) nothing has to be done except lle_event. Invoke this event directly from "create new lle" code block. For case (4) (existing entry, same mac) useless mac update was performed, along with LLENTRY_RESOLVED lle_event. There was no sense in doing that, since nothing really had changed. Simply avoid this condition instead. Given that, condition was simplified to (3),(5) states which can be merged with previous block.	2015-10-04 12:42:07 +00:00
Alexander V. Chernikov	9b420b3da4	Eliminate nd6_llinfo_settimer(). All consumers were converted to use nd6_llinfo_settimer_locked() in r216022. Make nd6_llinfo_settimer_locked() static: last external consumer was converted in r288124.	2015-10-04 08:33:16 +00:00
Alexander V. Chernikov	c0b8aeae2d	Add __noinline attribute to several functions to ease dtrace instrumentation	2015-10-04 08:21:15 +00:00
Alexander V. Chernikov	06a60e4bb0	Fix condition for nd6_llinfo_getholdsrc() introduced in r287484. Effectively it always returned NULL so SAS was always performed and sometimes the result might have been different. Fix state machine change accidentally introduced in r287985: state (4) inside nd6_cache_lladdr() (existing entry got nd message with the same lladdress) started to cause lle state transition to STALE instead of no-action.	2015-10-04 07:02:17 +00:00
Hiroki Sato	6401c828ce	- Schedule DAD for IN6_IFF_TENTATIVE addresses in nd6_timer(). This catches cases that DAD probes cannot be sent because of IFF_UP && !IFF_DRV_RUNNING. - nd6_dad_starttimer() now calls nd6_dad_ns_output(), instead of calling it before nd6_dad_starttimer(). - Do not release an entry in dadq when a duplicate entry is being added.	2015-10-03 12:09:12 +00:00
Andrey V. Elsukov	f367798498	Take extra reference to security policy before calling crypto_dispatch(). Currently we perform crypto requests for IPSEC synchronous for most of crypto providers (software, aesni) and only VIA padlock calls crypto callback asynchronous. In synchronous mode it is possible, that security policy will be removed during the processing crypto request. And crypto callback will release the last reference to SP. Then upon return into ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called to already freed request. To prevent this we will take extra reference to SP. PR: 201876 Sponsored by: Yandex LLC	2015-09-30 08:16:33 +00:00
Alexander V. Chernikov	1558cb2448	Eliminate nd6_nud_hint() and its TCP bindings. Initially function was introduced in r53541 (KAME initial commit) to "provide hints from upper layer protocols that indicate a connection is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability Confirmation). However, it was converted to do nothing (e.g. just return) in r122922 (tcp_hostcache implementation) back in 2003. Some defines were moved to tcp_var.h in r169541. Then, it was broken (for non-corner cases) by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So, right now this code is broken and has no "real" base users. Differential Revision: https://reviews.freebsd.org/D3699	2015-09-27 05:29:34 +00:00
Alexander V. Chernikov	4a336ef40c	rtsock requests for deleting interface address lles started to return EPERM instead of old "ignore-and-return 0" in r287789. This broke arp -da / ndp -cn behavior (they exit on rtsock command failure). Fix this by translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and making arp/ndp ignore these entries in batched delete. MFC after: 2 weeks	2015-09-27 04:54:29 +00:00
Alexander V. Chernikov	f506d933b5	Use standard lle LLE_EXCLUSIVE request flags instead of its redefined version.	2015-09-22 20:45:04 +00:00
Bjoern A. Zeeb	7af7c754e4	Compare mbuf pointer to NULL rather than to 0. No functional change. MFC after: 2 weeks	2015-09-21 12:53:26 +00:00
Bjoern A. Zeeb	b1ce89f2bc	In the UDP over IPv6 implementation several cases are using the wrong protocol, e.g., based on wrong "next header" assumptions (which does not have to point to the upper layer protocol), or using hard-coded UDP instead of UDP or UDP-Lite possibly switching protocols. Fix those cases for UDP-Lite to work correctly. PR: 202788 Submitted by: Tiwei Bie (btw mail.ustc.edu.cn) [parts] Reviewed by: gnn, Tiwei Bie (btw mail.ustc.edu.cn), kevlo (earlier version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3686	2015-09-21 12:32:36 +00:00
Alexander V. Chernikov	aa5f023eaf	Unify nd6 state switching by using newly-created nd6_llinfo_setstate() function. The change is mostly mechanical with the following exception: Last piece of nd6_resolve_slow() was refactored: ND6_LLINFO_PERMANENT condition was removed as always-true, explicit ND6_LLINFO_NOSTATE -> ND6_LLINFO_INCOMPLETE state transition was removed as duplicate. Reviewed by: ae Sponsored by: Yandex LLC	2015-09-21 11:19:53 +00:00
Alexander V. Chernikov	1496229a91	Add "stale" timer back to nd6_cache_lladdr(). Setting timer was accidentally removed in r276844 due to misleading comment on its meaningless. Add it back to restore proper behaviour.	2015-09-21 10:24:34 +00:00
Alexander V. Chernikov	501adf0140	Cleanup nd6_cache_lladdr(). No functional changes. * Since new entries are now allocated explicitly, fill in all the necessary fields for lle _before_ attaching it to the table. * Remove ND6_LLINFO_INCOMPLETE check which was unused even in first KAME merge (r53541). * After that, the only new state that function can set, was ND6_LLINFO_STALE. Given everything above, simplify logic besides do_update and is_newentry. * Fix nd_resolve() comment.	2015-09-19 11:50:02 +00:00
Alexander V. Chernikov	41a31e783e	* Simplify logic besides llchange variable. * Refresh nd6_is_router() comment.	2015-09-18 07:18:10 +00:00
Alexander V. Chernikov	1fe201c322	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether|atm|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_output() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performs lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output|forward> -> nd6_output() -> nd6_output_lle() which returns 0. In new scenario packet is propagated to <ether|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469	2015-09-16 14:26:28 +00:00
Alexander V. Chernikov	f0316e1acb	Constantify lookup key in several nd6_* functions.	2015-09-16 11:06:07 +00:00
Alexander V. Chernikov	0e2dcee6b2	Simplify nd6_cache_lladdr: * Move isRouter calculation code to separate nd6_is_router() function. * Make nd6_cache_lladdr() return void: its return value hasn't been used since r53541 KAME import in 1999. Sponsored by: Yandex LLC	2015-09-15 17:16:31 +00:00
Alexander V. Chernikov	d3cdb71655	* Require explicit lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
Eric van Gyzen	17a036563d	Fix the handling of IPv6 On-Link Redirects. On receipt of a redirect message, install an interface route for the redirected destination. On removal of the corresponding Neighbor Cache entry, remove the interface route. This requires changes in rtredirect_fib() to cope with an AF_LINK address for the gateway and with the absence of RTF_GATEWAY. This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo Phase 2 test suite. Unrelated to the above, fix a recursion on the radix node head lock triggered by the Tahi Redirected to Alternate Router test cases. When I first wrote this patch in October 2012, all Section 2 (Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE, and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported that it passes Tahi. (Thanks!) These other test cases also passed in 2012: * the RTF_MODIFIED case, with IPv4 and IPv6 (using a RTF_HOST|RTF_GATEWAY route for the destination) * the redirected-to-self case, with IPv4 and IPv6 * a valid IPv4 redirect All testing in 2012 was done with WITNESS and INVARIANTS. Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015, Mark Kelley <mark_kelley@dell.com> in 2012, TC Telkamp <terence_telkamp@dell.com> in 2012 PR: 152791 Reviewed by: melifaro (current rev), bz (earlier rev) Approved by: kib (mentor) MFC after: 1 month Relnotes: yes Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D3602	2015-09-14 19:17:25 +00:00
Alexander V. Chernikov	3e7a2321e3	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
Hiroki Sato	120ff2d73d	Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6 forgotten in the previous commit. MFC after: 3 days	2015-09-10 08:37:03 +00:00
Hiroki Sato	e3884653f6	- Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6. These are quite old APIs and there is no consumer now. MFC after: 3 days	2015-09-10 06:31:24 +00:00
Hiroki Sato	d0bec2c522	- Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6. These are quite old APIs and there is no consumer now. - Simplify first and duplicate LLA check. MFC after: 3 days	2015-09-10 06:29:18 +00:00
Hiroki Sato	1fce58fc62	Do not add IN6_IFF_TENTATIVE when ND6_IFF_NO_DAD. MFC after: 3 days	2015-09-10 06:10:30 +00:00
Hiroki Sato	3ba7e4ce9c	Remove IN6_IFF_NOPFX. This flag was no longer used. MFC after: 3 days	2015-09-10 06:08:42 +00:00
Adrian Chadd	68bb8d6249	Add support for receiving flowtype, flowid and RSS bucket information as part of recvmsg(). Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3562	2015-09-06 20:57:57 +00:00
Alexander V. Chernikov	26deb8826c	Do not pass lle to nd6_ns_output(). Use newly-added nd6_llinfo_get_holdsrc() to extract desired IPv6 source from holdchain and pass it to the nd6_ns_output().	2015-09-05 14:14:03 +00:00
Alexander V. Chernikov	deeedaa549	Do not skip entries without LLE_VALID flag. This one fixes showing incomplete entries in ndp -an. MFC after: 2 weeks	2015-09-05 06:24:00 +00:00
Alexander V. Chernikov	91bfd68e38	Make in6ifa_ifpwithaddr() take const param. Remove unneeded DECONST from in6_lltable_rtcheck().	2015-09-05 05:54:09 +00:00
Alexander V. Chernikov	3b0fd911fa	Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in alloc handler, based on flags.	2015-08-31 05:03:36 +00:00
Adrian Chadd	0be189151f	Implement RSS hashing/re-hashing for IPv6 ingress packets. This mirrors the basic IPv4 implementation - IPv6 packets under RSS now are checked for a correct RSS hash and if one isn't provided, it's done in software. This only handles the initial receive - it doesn't yet handle reinjecting / rehashing packets after being decapsulated from various tunneling setups. That'll come in some follow-up work. For non-RSS users, this is almost a giant no-op. It does change a couple of ipv6 methods to use const mbuf * instead of mbuf * but it doesn't have any functional changes. So, the following now occurs: * If the NIC doesn't do any RSS hashing, it's all done in software. Single-queue, non-RSS NICs will now have the RX path distributed into multiple receive netisr queues. * If the NIC provides the wrong hash (eg only IPv6 hash when we needed an IPv6 TCP hash, or IPv6 UDP hash when we expected IPv6 hash) then the hash is recalculated. * .. if the hash is recalculated, it'll end up being injected into the correct netisr queue for v6 processing. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3504	2015-08-29 07:14:29 +00:00
Bjoern A. Zeeb	196074f3b2	remove a left-over after r220463 empty #ifdef INET check. MFC after: 1 week	2015-08-28 09:38:18 +00:00
Adrian Chadd	e5562eb934	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
Bjoern A. Zeeb	a86e5c96af	get_inpcbinfo() and get_pcblist() are UDP local functions and do not do what one would expect by name. Prefix them with "udp_" to at least obviously limit the scope. This is a non-functional change. Reviewed by: gnn, rwatson MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3505	2015-08-27 15:27:41 +00:00
Adrian Chadd	2bf1d4880d	Call the new RSS hash calculation function to correctly calculate a hash based on the configured requirements for the protocol. Tested: * UDP IPv6 TX/RX testing, w/ RSS enabled, 82599 ixgbe(4) hardware	2015-08-25 06:12:59 +00:00
Adrian Chadd	20dbdf88a5	Implement the IPv6 RSS software hash function. This isn't yet linked into the receive/transmit paths anywhere just yet. This is part of a GSoC 2015 project. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Reviewed by: hiren, gnn Differential Revision: https://reviews.freebsd.org/D3423	2015-08-24 05:36:08 +00:00
Hiroki Sato	fb583bd228	- Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback interface but in6if_do_dad() already had a check for IFF_LOOPBACK. - Remove in6if_do_dad() check in in6_broadcast_ifa(). An address which needs DAD always has IN6_IFF_TENTATIVE there. - in6if_do_dad() now returns EAGAIN when the interface is not ready since DAD callout handler ignores such an interface. - In DAD callout handler, mark an address as IN6_IFF_TENTATIVE when the interface has ND6_IFF_IFDISABLED. And Do IFF_UP and IFF_DRV_RUNNING check consistently when DAD is required. - draft-ietf-6man-enhanced-dad is now published as RFC 7527. - Fix some typos.	2015-08-24 05:21:49 +00:00
Alexander V. Chernikov	5a2555160f	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m | awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
Alexander V. Chernikov	0447c1367a	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
Alexander V. Chernikov	314294de5c	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) all the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
Alexander V. Chernikov	41cb42a633	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
Alexander V. Chernikov	2caee4be35	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
Alexander V. Chernikov	11cdad9873	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt argument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
Marius Strobl	6e4cd74673	Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.	2015-08-08 21:41:59 +00:00
Alexander V. Chernikov	4bdf0b6a9a	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
Alexander V. Chernikov	e362cf0e9f	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
Alexander V. Chernikov	331dff0737	Simplify ip[6] simploop: Do not pass 'dst' sockaddr to ip[6]_mloopback: - We have explicit check for AF_INET in ip_output() - We assume ip header inside passed mbuf in ip_mloopback - We assume ip6 header inside passed mbuf in ip6_mloopback	2015-08-08 15:58:35 +00:00
Julien Charbon	ff9b006d61	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.	2015-08-03 12:13:54 +00:00
Andrey V. Elsukov	51a01baf23	Properly handle IPV6_NEXTHOP socket option in selectroute(). o remove disabled code; o if nexthop address is link-local, use embedded scope zone id to determine outgoing interface; o properly fill ro_dst before doing route lookup; o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit. Sponsored by: Yandex LLC	2015-08-02 12:40:56 +00:00
Andrey V. Elsukov	a6f7dea1fe	Remove redundant check.	2015-08-02 11:58:24 +00:00
Andrey V. Elsukov	10a0e0bf0a	Eliminate the use of m_copydata() in gif_encapcheck(). ip_encap already has inspected mbuf's data, at least an IP header. And it is safe to use mtod() and do direct access to needed fields. Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that mbuf has a packet header. Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also remove "martian filters" checks. According to RFC 4213 it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 14:07:43 +00:00
Andrey V. Elsukov	cc0a3c8ca4	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
Michael Tuexen	4ff815b71c	Move including netinet/icmp6.h around to avoid a problem when including netinet/icmp6.h and net/netmap.h. Both use ni_flags... This allows to build multistack with SCTP support. MFC after: 1 week	2015-07-25 18:26:09 +00:00
Randall Stewart	f260c1b939	Fix inverted logic bug that David Wolfskill found (thanks David!) MFC after: 3 Weeks	2015-07-22 09:29:50 +00:00
Randall Stewart	c0d1be08f6	When a tunneling protocol is being used with UDP we must release the lock on the INP before calling the tunnel protocol, else a LOR may occur (it does with SCTP for sure). Instead we must acquire a ref count and release the lock, taking care to allow for the case where the UDP socket has gone away and not unlocking since the refcnt decrement on the inp will do the unlock in that case. Reviewed by: tuexen MFC after: 3 weeks	2015-07-21 09:54:31 +00:00
Andrey V. Elsukov	30aee13117	Add LLE event handler to report ND6 events to userland via rtsock. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-07-20 06:58:32 +00:00
Andrey V. Elsukov	585753c432	Invoke LLE event handler when entry is deleted. MFC after: 2 weeks Sponsored by: Yandex LLC	2015-07-20 06:54:50 +00:00
Andrey V. Elsukov	cb207f93ca	Keep IPv6 address specified by IPV6_PKTINFO socket option in kernel internal form to be able handle link-local IPv6 addresses. Reported by: kp Tested by: kp	2015-07-03 19:01:38 +00:00
Bjoern A. Zeeb	bfbc08b848	Move comment to the right position. PR: 152791 Submitted by: vangyzen (as part of the functional change) MFC after: 3 days	2015-07-03 09:53:56 +00:00
Michael Tuexen	d089f9b915	Add FIB support for SCTP. This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379 MFC after: 3 days	2015-06-17 15:20:14 +00:00
Andrey V. Elsukov	4e870f943f	Move RTM announces into generic code to be independent from Layer2 code. This fixes bug introduced in 274988, when announces about new addresses don't sent for tunneling interfaces. Reported by: tuexen@ MFC after: 1 week	2015-05-29 10:24:16 +00:00
Michael Tuexen	b7d130befc	Fix and cleanup the debug information. This has no user-visible changes. Thanks to Irene Ruengeler for providing a patch. MFC after: 3 days	2015-05-28 16:00:23 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Andrey V. Elsukov	c1b4f79dfa	Add an ability accept encapsulated packets from different sources by one gif(4) interface. Add new option "ignore_source" for gif(4) interface. When it is enabled, gif's encapcheck function requires match only for packet's destination address. Differential Revision: https://reviews.freebsd.org/D2004 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-15 12:19:45 +00:00
Hiroki Sato	59333867ff	- Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice because a link where looped back NS messages are permanently observed does not work with either NDP or ARP for IPv4. - draft-ietf-6man-enhanced-dad is now RFC 7527. Discussed with: hiren MFC after: 3 days	2015-05-12 03:31:57 +00:00
Andrey V. Elsukov	654bdb5abb	Mark data checksum as valid for multicast packets, that we send back to myself via simloop. Also remove duplicate check under #ifdef DIAGNOSTIC. PR: 180065 MFC after: 1 week	2015-05-07 14:17:43 +00:00
Andrey V. Elsukov	db037aa4ed	Remove unneeded #ifdef INET6 and IPSEC. This file compiled only when both options are defined. Include opt_sctp.h and sctp_crc32.h to enable #ifdef SCTP code block and delayed checksum calculation for SCTP.	2015-05-07 12:15:45 +00:00
Gleb Smirnoff	0fa5aacd8b	Remove #ifdef IFT_FOO. Submitted by: Guy Yur <guyyur gmail.com>	2015-05-02 20:31:27 +00:00
Andrey V. Elsukov	3e92c37f32	Remove now unneeded KEY_FREESP() for case when ipsec[46]_process_packet() returns EJUSTRETURN. Sponsored by: Yandex LLC	2015-04-27 01:11:09 +00:00
Andrey V. Elsukov	3d80e82d60	Fix possible use after free due to security policy deletion. When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(), we hold one reference to security policy and release it just after return from this function. But IPSec processing can be deferred and when we release reference to security policy after ipsec[46]_process_packet(), user can delete this security policy from SPDB. And when IPSec processing will be done, xform's callback function will do access to already freed memory. To fix this move KEY_FREESP() into callback function. Now IPSec code will release reference to SP after processing will be finished. Differential Revision: https://reviews.freebsd.org/D2324 No objections from: #network Sponsored by: Yandex LLC	2015-04-27 00:55:56 +00:00
Gleb Smirnoff	210b5c73e7	Fix r281649: don't call in6_clearscope() twice. Submitted by: ae	2015-04-17 15:26:08 +00:00
Gleb Smirnoff	28ebe80cab	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
Mark Johnston	dff78447a4	Fix a possible refcount leak in regen_tmpaddr(). public_ifa6 may be set to NULL after taking a reference to a previous address list element. Instead, only take the reference after leaving the loop but before releasing the address list lock. Differential Revision: https://reviews.freebsd.org/D2253 Reviewed by: ae MFC after: 2 weeks	2015-04-13 01:55:42 +00:00
Andrey V. Elsukov	e2956804dd	Fix the IPV6_MULTICAST_IF sockopt handling. RFC 3493 says when the interface index is specified as zero, the system should select the interface to use for outgoing multicast packets. Even the comment for the in6p_set_multicast_if() function says about index of zero. But in fact for zero index the function just returns EADDRNOTAVAIL. I.e. if you first set some interface and then will try reset it with zero ifindex, you will get EADDRNOTAVAIL. Reset im6o_multicast_ifp to NULL when interface index specified as zero. Also return EINVAL in case when ifnet_byindex() returns NULL. This will be the same behaviour as when ifindex is bigger than V_if_index. And return EADDRNOTAVAIL only when interface is not multicast capable. Reported by: Olivier Cochard-Labbé MFC after: 2 weeks Sponsored by: Yandex LLC	2015-04-10 19:09:51 +00:00
Andrey V. Elsukov	efb19cf6db	Fix the check for maximum mbuf's size needed to send ND6 NA and NS. It is acceptable that the size can be equal to MCLBYTES. In the later KAME's code this check has been moved under DIAGNOSTIC ifdef, because the size of NA and NS is much smaller than MCLBYTES. So, it is safe to replace the check with KASSERT. PR: 199304 Discussed with: glebius MFC after: 1 week	2015-04-09 12:57:58 +00:00
Kristof Provost	53deb05c36	Evaluate packet size after the firewall had its chance Defer the packet size check until after the firewall has had a look at it. This means that the firewall now has the opportunity to (re-)fragment an oversized packet. Differential Revision: https://reviews.freebsd.org/D1815 Reviewed by: ae Approved by: gnn (mentor)	2015-04-07 20:29:03 +00:00
Xin LI	dd3856601d	Mitigate Local Denial of Service with IPv6 Router Advertisements and log attack attempts. Submitted by: hrs Security: FreeBSD-SA-15:09.nd6 Security: CVE-2015-2923	2015-04-07 20:20:09 +00:00
Gleb Smirnoff	c151f24d08	o Make net.inet6.ip6.mif6table return special API structure, that doesn't contain kernel pointers, and instead has interface index. Bump __FreeBSD_version for that change. o Now, netstat/mroute6.c no longer needs to kvm_read(3) struct ifnet, and no longer needs to include if_var.h Note that this change is far from being a complete move of IPv6 multicast routing to a proper API. Other structures are still dumped into their sysctls as is, requiring userland application to #define _KERNEL when including ip6_mroute.h and then call kvm_read(3) to gather all bits and pieces. But fixing this is out of scope of the opaque ifnet project. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-04-06 22:12:18 +00:00
Kristof Provost	31e2e88c27	Remove duplicate code We'll just fall into the same local delivery block under the 'if (m->m_flags & M_FASTFWD_OURS)'. Suggested by: ae Differential Revision: https://reviews.freebsd.org/D2225 Approved by: gnn (mentor)	2015-04-06 19:08:44 +00:00
Kristof Provost	798318490e	Preserve IPv6 fragment IDs across reassembly and refragmentation. When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs to do the reassembly, so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2188 Approved by: gnn (mentor)	2015-04-01 12:15:01 +00:00
Gleb Smirnoff	20778ab5b4	Move ip6_sprintf() declaration from in6_var.h to in6.h. This is a simple function that works with in6_addr and it is not related to the INET6 stack implementation. Sponsored by: Nginx, Inc.	2015-03-24 16:45:50 +00:00
Andrey V. Elsukov	ff9f2a36de	To avoid a possible race, release the reference to ifa after return from nd6_dad_na_input(). Submitted by: Alexandre Martins MFC after: 1 week	2015-03-19 00:04:25 +00:00
Andrey V. Elsukov	fd8dd3a6d7	tcp6_ctlinput() doesn't pass an MTU value to in6_pcbnotify(). Check that cmdarg isn't NULL before dereferencing it; this check was in ip6_notify_pmtu() before r279588. Reported by: Florian Smeets MFC after: 1 week	2015-03-06 05:50:39 +00:00
Hiroki Sato	23e9ffb0e1	- Implement loopback probing state in enhanced DAD algorithm. - Add no_dad and ignoreloop per-IF knobs. no_dad disables DAD completely, and ignoreloop prevents an infinite loop in the loopback probing state when loopback is permanently expected.	2015-03-05 21:27:49 +00:00
Andrey V. Elsukov	8f1beb889e	Fix deadlock in IPv6 PCB code. When several threads are trying to send datagrams to the same destination, but fragmentation is disabled and the datagram size exceeds the link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It notifies all sockets that want to know the MTU to this destination. And since all threads hold the PCB lock while sending, taking the lock for each PCB in in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notifying all applications that want to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this when we don't receive an ICMPv6 message. Change the ip6_notify_pmtu() function so that it can be used directly from ip6_output() to notify only one socket, and to notify all sockets when an ICMPv6 packet too big message is received. PR: 197059 Differential Revision: https://reviews.freebsd.org/D1949 Reviewed by: no objection from #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2015-03-04 11:20:01 +00:00
Andrey V. Elsukov	1eef8a6c08	Create the nd6_ns_output_fib() function with an extra fibnum argument. Use it to initialize the mbuf's fibnum. An uninitialized fibnum value can lead to a panic in the routing code. Currently we use only the RT_DEFAULT_FIB value for initialization. Differential Revision: https://reviews.freebsd.org/D1998 Reviewed by: hrs (previous version) Sponsored by: Yandex LLC	2015-03-03 10:50:03 +00:00
Hiroki Sato	8d56075939	Nonce has to be non-NULL for DAD even if net.inet6.ip6.dad_enhanced=0.	2015-03-03 04:28:19 +00:00
Hiroki Sato	11d8451df3	Implement Enhanced DAD algorithm for IPv6 described in draft-ietf-6man-enhanced-dad-13. This basically adds a random nonce option (RFC 3971) to NS messages for DAD probes to detect a looped back packet. This looped back packet prevented DAD on some pseudo-interfaces which aggregate multiple L2 links such as lagg(4). The length of the nonce is set to 6 bytes. This algorithm can be disabled by setting the net.inet6.ip6.dad_enhanced sysctl to 0 on a per-vnet basis. Reported by: hiren Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D1835	2015-03-02 17:30:26 +00:00
Gleb Smirnoff	e072c794ad	Now that all users of _WANT_IFADDR are fixed, remove this crutch and hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 23:16:10 +00:00
Gleb Smirnoff	9e62a5a379	- Rename 'struct mld_ifinfo' into 'struct mld_ifsoftc', since it really represents a context. - Preserve name 'struct mld_ifinfo' for a new structure, that will be stable API between userland and kernel. - Make sysctl_mld_ifinfo() return the new 'struct mld_ifinfo', instead of the old one, which had a bunch of internal kernel structures in it. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 22:37:01 +00:00
Gleb Smirnoff	fd1b2a7c57	Widen _KERNEL ifdef to hide more kernel network stack structures from userland.	2015-02-19 06:24:27 +00:00
Gleb Smirnoff	a99c84d4e6	Use new struct mbufq instead of struct ifqueue to manage packet queues in IPv6 multicast code. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 01:21:23 +00:00
Gleb Smirnoff	6c269f6912	Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4). Submitted by: Kristof Provost Differential Revision: D1766	2015-02-16 06:30:27 +00:00
Gleb Smirnoff	e5ee706031	Move ip6_deletefraghdr() to frag6.c. Suggested by: bz	2015-02-16 05:58:32 +00:00
Gleb Smirnoff	0b438b0fb8	Factor out ip6_deletefraghdr() function, to be shared between IPv6 stack and pf(4). Submitted by: Kristof Provost Reviewed by: ae Differential Revision: D1764	2015-02-16 01:12:20 +00:00
Randall Stewart	2575fbb827	This fixes a bug in the way that the LLE timers for nd6 and arp were being used. They basically would pass in the mutex to callout_init(). Because they used this method with the callout system, it was possible to "stop" the callout. When flushing the table and you stopped the running callout, the callout_stop code would return 1 indicating that it was going to stop the callout (that was about to run on the callout_wheel blocked by the function calling the stop). Now when 1 was returned, it would lower the reference count one extra time for the stopped timer, then a few lines later delete the memory. Of course the callout_wheel was stuck in the lock code and would then crash since it was accessing freed memory. By using callout_init(c, 1) we always get a 0 back and the reference counting bug does not rear its head. We do have to make a few adjustments to the callouts themselves though to make sure they do the proper thing if rescheduled as well as get the lock. Commented upon by hiren and sbruno See Phabricator D1777 for more details. Reviewed by: adrian, jhb and bz Sponsored by: Netflix Inc.	2015-02-09 19:28:11 +00:00
Andrey V. Elsukov	46386183da	Print IPv6 address in log message instead of address of pointer. MFC after: 1 week	2015-02-05 16:29:26 +00:00
Adrian Chadd	b2bdc62a95	Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * toeplitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)	2015-01-18 18:06:40 +00:00
Gleb Smirnoff	ffec6ee527	Do not go one layer down to check ifqueue length. First, not all drivers use ifqueue at all. Second, there is no point in this lockless check. Either positive or negative result of the check could be incorrect after a tick. Sponsored by: Nginx, Inc.	2015-01-12 14:52:43 +00:00
Michael Tuexen	4be807c4d6	Minimize the usage of SCTP_BUF_IS_EXTENDED. This should help Robert...	2015-01-10 20:49:57 +00:00
Alexander V. Chernikov	d63e657c04	* Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4: handle it in arc_output() instead of nd6_storelladdr(). * Remove IFT_ARCNET check from arpresolve() since arc_output() does not use arpresolve() to handle broadcast/multicast. This check has been there since r84931. It looks like it has not been used since r89099 (initial import of Arcnet support where multicast is handled separately). * Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output() calls nd6_storelladdr() for unicast addresses only. * Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now handles multicast by itself. As a result, we have the following pattern: all non-ethernet-style media have their own multicast map handling inside their appropriate routines. On the other hand, arpresolve() (and nd6_storelladdr()), which are meant to be 'generic' ones, de facto handle only ethernet-style multicast maps. MFC after: 3 weeks	2015-01-09 12:56:51 +00:00
Alexander V. Chernikov	abc1be9062	Add forgotten definition for nd6_output_ifp().	2015-01-08 18:29:54 +00:00
Alexander V. Chernikov	d7968c29ec	* Use newly-created nd6_grab_holdchain() function to retrieve lle hold mbuf chain instead of calling full-blown nd6_output_lle() for each packet. This simplifies both callers and nd6_output_lle() implementation. * Make nd6_output_lle() static and remove now-unused lle and chain arguments. * Rename nd6_output_flush() -> nd6_flush_holdchain() to be consistent. * Move all pre-send transmit hooks to newly-created nd6_output_ifp(). Now nd6_output(), nd6_output_lle() and nd6_flush_holdchain() are using it to send mbufs to if_output. * Remove SeND hook from nd6_na_input() because it was implemented incorrectly since the beginning (r211501): - it tagged initial input mbuf (m) instead of m_hold - tagging _all_ mbufs in holdchain seems to be wrong anyway.	2015-01-08 18:02:05 +00:00
Alexander V. Chernikov	3a7498636a	* Allocate hash tables separately * Make llt_hash() callback more flexible * Default hash size and hashing method is now per-af * Move lltable allocation to separate function	2015-01-05 17:23:02 +00:00
Alexander V. Chernikov	df485dbe3e	Do not call LLE_WUNLOCK() for deleted lle.	2015-01-05 16:10:54 +00:00
Robert Watson	ed6a66ca6c	To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN()) in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, as well as assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division	2015-01-05 09:58:32 +00:00
Alexander V. Chernikov	b44a7d5d87	* Use unified code for deleting entry by sockaddr instead of per-af one. * Remove now unused llt_delete_addr callback.	2015-01-03 19:09:06 +00:00
Alexander V. Chernikov	20dd899505	* Hide lltable implementation details in if_llatbl_var.h * Make most of lltable_* methods 'normal' functions instead of inline * Add lltable_get_<af|ifp>() functions to access given lltable fields * Temporarily resurrect nd6_lookup() function	2015-01-03 16:04:28 +00:00
Alexander V. Chernikov	787cea14a5	Since @ln is the result of LLTABLE6(ifp) lookup its originating interface must always be @ifp. So change ln->lle_tbl->llt_ifp to ifp.	2015-01-03 14:18:48 +00:00
Alexander V. Chernikov	d2e0f37c22	Finish r275628 #2 : remove remaining 'base' references.	2015-01-03 14:09:35 +00:00
Adrian Chadd	492ccbe14d	Migrate the RSS IPv6 hash code to use pointers to the v6 addresses rather than passing them in by value. The eventual aim is to do incremental hash construction rather than all of the memcpy()'ing into a contiguous buffer for the hash function, which does show up as taking quite a bit of CPU during profiling. Tested: * a variety of laptops/desktop setups I have, with v6 connectivity Differential Revision: D1404 Reviewed by: bz, rpaulo	2014-12-31 22:52:43 +00:00
Andrey V. Elsukov	f188f14d43	Extern declarations in C files lose compile-time checking that the functions' calls match their definitions. Move them to header files. Reviewed by: jilles (previous version)	2014-12-25 21:32:37 +00:00
Andrey V. Elsukov	132c449079	Remove in_gif.h and in6_gif.h files. They only contain function declarations used by gif(4). Instead declare these functions in C files. Also make some variables static.	2014-12-23 16:17:37 +00:00
Michael Tuexen	caeae63f97	Plug a memory leak in an error code path. Reported by: Coverity CID: 1018936 MFC after: 3 days	2014-12-17 20:19:57 +00:00
Andrey V. Elsukov	44eb8bbe7b	Do not count security policy violation twice. ipsec*_in_reject() do this on their own. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 19:20:13 +00:00
Andrey V. Elsukov	49ada98eac	Use ipsec6_in_reject() to simplify ip6_ipsec_fwd() and ip6_ipsec_input(). ipsec6_in_reject() does the same things, also it counts policy violation errors. Do the IPSEC check in ip6_forward() after address checks. Also use ip6_ipsec_fwd() to make code similar to IPv4 implementation. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 19:09:57 +00:00
Andrey V. Elsukov	0275b2e369	Remove flag/flags argument from the following functions: ipsec_getpolicybyaddr() ipsec4_checkpolicy() ip_ipsec_output() ip6_ipsec_output() The only flag used here was IP_FORWARDING. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:35:34 +00:00
Andrey V. Elsukov	8922ddbe40	Move ip_ipsec_fwd() from ip_input() into ip_forward(). Remove the check for the presence of the PACKET_TAG_IPSEC_IN_DONE mbuf tag from ip_ipsec_fwd(). PACKET_TAG_IPSEC_IN_DONE tag means that packet is already handled by IPSEC code. This means that before IPSEC processing it was destined to our address and security policy was checked in the ip_ipsec_input(). After IPSEC processing packet has new IP addresses and destination address isn't our own. So, anyway we can't check security policy from the mbuf tag, because it corresponds to different addresses. We should check security policy that corresponds to packet attributes in both cases - when it has an mbuf tag and when it has not. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 16:53:29 +00:00
Andrey V. Elsukov	e58320f127	Remove PACKET_TAG_IPSEC_IN_DONE mbuf tag lookup and usage of its security policy. The changed block of code in ip*_ipsec_input() is called when packet has ESP/AH header. Presence of PACKET_TAG_IPSEC_IN_DONE mbuf tag at the same time means that packet was already handled by IPSEC and reinjected in the netisr, and it has another ESP/AH header (encrypted twice?). Since it was already processed by IPSEC code, the AH/ESP headers were already stripped (and probably outer IP header was stripped too) and security policy from the tdb_ident was applied to those headers. It is incorrect to apply this security policy to current headers. Also make ip_ipsec_input() prototype similar to ip6_ipsec_input(). Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 14:58:55 +00:00
Andrey V. Elsukov	dd9cd45b44	Remove check for presence of PACKET_TAG_IPSEC_PENDING_TDB and PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED mbuf tags. They aren't used in FreeBSD. Instead check presence of PACKET_TAG_IPSEC_OUT_DONE mbuf tag. If it is found, bypass security policy lookup as described in the comment. PACKET_TAG_IPSEC_OUT_DONE tag added to mbuf when IPSEC code finishes ESP/AH processing. Since it was already finished, this means the security policy placed in the tdb_ident was already checked. And there is no reason to check it again here. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 14:43:44 +00:00
Mark Johnston	a37271c3b8	Revert r275695: nd6_dad_find() was already correct. Reported by: ae, kib Pointy hat to: markj	2014-12-11 09:16:45 +00:00
Mark Johnston	97712e3efc	Fix a bug in r266857: nd6_dad_find() must return NULL if it doesn't find a matching element in the DAD queue. Reported by: Holger Hans Peter Freyther <holger@freyther.de> MFC after: 3 days	2014-12-11 00:41:54 +00:00
Alexander V. Chernikov	ee7e9a4e17	* Do not assume lle has sockaddr key after struct lle: use llt_fill_sa_entry() llt method to store lle address in sa. * Eliminate L3_ADDR macro and either reference IPv4/IPv6 address directly from lle or use newly-created llt_fill_sa_entry(). * Do not store sockaddr inside arp/ndp lle anymore.	2014-12-09 00:48:08 +00:00
Alexander V. Chernikov	d82ed5051c	Simplify lle lookup/create api by using addresses instead of sockaddrs.	2014-12-08 23:23:53 +00:00
Mark Johnston	d6ad6a865a	Add refcounting to IPv6 DAD objects and simplify the DAD code to fix a number of races which could cause double frees or use-after-frees when performing DAD on an address. In particular, an IPv6 address can now only be marked as a duplicate from the DAD callout. Differential Revision: https://reviews.freebsd.org/D1258 Reviewed by: ae, hrs Reported by: rstone MFC after: 1 month	2014-12-08 04:44:40 +00:00
Alexander V. Chernikov	73b52ad896	Use llt_prepare_static_entry method to prepare valid per-af static entry.	2014-12-07 23:59:44 +00:00
Alexander V. Chernikov	0368226e65	* Retire abstract llentry_free() in favor of lltable_drop_entry_queue() and explicit calls to RTENTRY_FREE_LOCKED() * Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6. * Rename <lltable_|llt>_delete function to _delete_addr() to note that this function is used by external callers. Make this function maintain its own locking. * Use lookup/unlink/clear call chain from internal callers instead of delete_addr. * Fix LLE_DELETED flag handling	2014-12-07 23:08:07 +00:00
Alexander V. Chernikov	721cd2e032	Do not enforce particular lle storage scheme: * move lltable allocation to per-domain callbacks. * make llentry_link/unlink functions overridable llt methods. * make hash table traversal another overridable llt method.	2014-12-07 17:32:06 +00:00
Alexander V. Chernikov	a743ccd468	* Add llt_clear_entry() callback which is able to do all lle cleanup including unlinking/freeing * Relax locking in lltable_prefix_free_af/lltable_free * Do not pass @llt to lle free callback: it is always NULL now. * Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding unlock/lock sequences * Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc() to retrieve preferred source address from lle hold queue and pass it instead of lle. * Finally, make nd6_create() create and return unlocked lle * Separate defrtr handling code from nd6_free(): use nd6_check_del_defrtr() to check if we need to keep entry instead of performing GC, use nd6_check_recalc_defrtr() to perform actual recalc on lle removal. * Move isRouter handling from nd6_cache_lladdr() to separate nd6_check_router() * Add initial code to maintain lle runtime flags in sync.	2014-12-07 15:42:46 +00:00
Michael Tuexen	457b4b8836	This is the SCTP specific companion of https://svnweb.freebsd.org/changeset/base/275358 which was provided by Hans Petter Selasky.	2014-12-04 21:17:50 +00:00
Andrey V. Elsukov	2dfcd0ae9d	Remove unneeded check. No need to do m_pullup to the size that we prepended. MFC after: 1 week Sponsored by: Yandex LLC	2014-12-02 05:41:03 +00:00
Andrey V. Elsukov	2d957916ef	Remove route caching support from ipsec code. It hasn't been used for some time. * remove sa_route_union declaration and route_cache member from struct secashead; * remove key_sa_routechange() call from ICMP and ICMPv6 code; * simplify ip_ipsec_mtu(); * remove #include <net/route.h>; Sponsored by: Yandex LLC	2014-12-02 04:20:50 +00:00
Alexander V. Chernikov	9b65db85e2	Do more fine-grained locking in lltable code: lltable_create_lle() does actual new lle creation without extensive locking and existing lle search. Move lle updating code from gigantic in_arpinput() to arp_update_llle() and some other functions. IPv6 changes to follow.	2014-12-01 21:43:48 +00:00
Hans Petter Selasky	c25290420e	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies	2014-12-01 11:45:24 +00:00
Alexander V. Chernikov	ce313fdd71	* Unify lle table dump/prefix removal code. * Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes used by lltable code.	2014-11-30 14:35:01 +00:00
Alexander V. Chernikov	5d14e4cd76	Provide rte_<get|set> methods to access rtentry for external consumers.	2014-11-29 19:27:43 +00:00
Alexander V. Chernikov	74860d4f7c	Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr - return lle flags IFF needed. Do not pass rte to arpresolve - pass is_gateway flag instead.	2014-11-27 23:06:25 +00:00
Andrey V. Elsukov	af6209a133	Skip L2 addresses lookups for p2p interfaces. Discussed with: melifaro Sponsored by: Yandex LLC	2014-11-24 21:51:43 +00:00
Alexander V. Chernikov	73d770287d	Do more fine-grained lltable locking: use the table runtime lock as rarely as we can.	2014-11-23 15:38:06 +00:00
Alexander V. Chernikov	9479029b1f	* Add lltable llt_hash callback * Move lltable items insertions/deletions to generic llt code.	2014-11-23 12:15:28 +00:00
Alexander V. Chernikov	7c066c18db	Use less-invasive approach for IF_AFDATA lock: convert into 2 locks: use rwlock accessible via external functions (IF_AFDATA_CFG_* -> if_afdata_cfg_()) for all control plane tasks use rmlock (IF_AFDATA_RUN_) for fast-path lookups.	2014-11-22 19:53:36 +00:00
Alexander V. Chernikov	27688dfe1d	Temporarily revert r274774.	2014-11-22 17:57:54 +00:00
Alexander V. Chernikov	4194b42144	Another r274774 fix.	2014-11-21 23:37:14 +00:00
Alexander V. Chernikov	86b94cffe4	Finish r274774: add more headers/fix build for non-debug case.	2014-11-21 23:36:21 +00:00
Alexander V. Chernikov	9883e41b4b	Switch IF_AFDATA lock to rmlock	2014-11-21 02:28:56 +00:00
Alexander V. Chernikov	4d56c133fb	Sync to HEAD@r274766	2014-11-21 01:22:33 +00:00
Alexander V. Chernikov	f9723c7705	Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp we need to return. Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags fields for all relevant functions.	2014-11-20 22:41:59 +00:00
Alexander V. Chernikov	7f948f12f6	Finish r274175: do control plane MTU tracking. Update route MTU in case of ifnet MTU change. Add new RTF_FIXEDMTU to track explicitly specified MTU. Old behavior: ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU. User has to manually update all routes. ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU. However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu gets updated. New behavior: ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs get updated with new MTU unless RTF_FIXEDMTU flag set on them. ifconfig em0 mtu 9000->1500 -> all routes in all fibs get updated with new MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu. route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag. route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag. PR: 194238 MFC after: 1 month CR: D1125	2014-11-17 01:05:29 +00:00
Alexander V. Chernikov	df629abf3e	Rework LLE code locking: * struct llentry is now basically split into 2 pieces: all fields within 64 bytes (amd64) are now protected by both ifdata lock AND lle lock, i.e. you require both locks to be held exclusively for modification. All data necessary for fast path operations is kept here. Some fields were added: - r_l3addr - makes lookup key live within first 64 bytes. - r_flags - flags, containing pre-compiled decision whether given lle contains usable data or not. Currently the only flag is RLLE_VALID. - r_len - prepend data len, currently unused - r_kick - used to provide feedback to control plane (see below). All other fields are protected by lle lock. * Add simple state machine for ARP to handle "about to expire" case: Current model (for the fast path) is the following: - rlock afdata - find / rlock rte - runlock afdata - see if "expire time" is approaching (time_uptime + la->la_preempt > la->la_expire) - if true, call arprequest() and decrease la_preempt - store MAC and runlock rte New model (data plane): - rlock afdata - find rte - check if it can be used using r_* fields only - if true, store MAC - if r_kick field != 0 set it to 0. - runlock afdata New model (control plane): - schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries) seconds instead of V_arpt_keep. - on first timer invocation change state from ARP_LLINFO_REACHABLE to ARP_LLINFO_VERIFY, set r_kick to 1 and schedule next call in V_arpt_rexmit (default 1 sec). - on subsequent timer invocations in ARP_LLINFO_VERIFY state, check the r_kick value: reschedule if not changed, and send arprequest() if set to zero (i.e. entry was used). * Convert IPv4 path to use new single-lock approach. IPv6 bits to follow. * Slow down in_arpinput(): now valid reply will (in most cases) require acquiring afdata WLOCK twice. This is a requirement for storing changed lle data. This change will be slightly optimized in future. * Provide explicit hash link/unlink functions for both ipv4/ipv6 code. This will probably be moved to generic lle code once we have per-AF hashing callback inside lltable. * Perform lle unlink on deletion immediately instead of delaying it to the timer routine. * Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the presence of lle reference used for safe callout calls.	2014-11-16 20:12:49 +00:00
Alexander V. Chernikov	b4b1367ae4	* Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete Assume lla_create returns lle with LLE_EXCLUSIVE lock. * Rework lla_rt_output to perform all lle changes under afdata WLOCK. * Change arp_ifscrub() to acquire afdata WLOCK, the same as arp_ifinit().	2014-11-15 18:54:07 +00:00
Andrey V. Elsukov	794a349c6f	We don't return sp pointer, thus NULL assignment isn't needed. And reference to sp will be freed at the end. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-12 22:58:52 +00:00
Alexander V. Chernikov	670e8b3b8c	Kill custom in_matroute() radix matching function removing one rte mutex lock. Initially in_matroute() and in_clsroute() in their current state were introduced by r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them in route table, setting RTPRF_OURS flag and some expire time. After that, either GC came or RTPRF_OURS got removed on first-packet. It was a good solution in those days (and probably another decade after that) to keep TCP metrics. However, after moving metrics to TCP hostcache in r122922, most of in_rmx functionality became unused. It might have been used for flushing icmp-originated routes before rte mutexes/refcounting, but I'm not sure about that. So it looks like it is nearly impossible to make GC do its work nowadays: in_rtkill() ignores non-RTPRF_OURS routes. A route can only become RTPRF_OURS after dropping last reference via rtfree() which calls in_clsroute(), which, in turn, ignores UP and non-RTF_DYNAMIC routes. Dynamic routes can still be installed via received redirect, but they have default lifetime (no specific rt_expire) and no one has another trie walker to call RTFREE() on them. So, the changelist: * remove custom rnh_match / rnh_close matching function. * remove all GC functions * partially revert r256695 (proto3 is no longer used inside kernel, it is not possible to use rt_expire from user point of view, proto3 support is not complete) * Finish r241884 (similar to this commit) and remove remaining IPv6 parts MFC after: 1 month	2014-11-11 02:52:40 +00:00
Andrey V. Elsukov	002c24396d	Add sa6_checkzone_ifp() function. It checks correctness of struct sockaddr_in6, usually obtained from the user level through ioctl. It initializes sin6_scope_id using given interface. Sponsored by: Yandex LLC	2014-11-10 16:12:51 +00:00
Alexander V. Chernikov	e0c0711e01	* Make nd6_dad_duplicated() constant. * Simplify refcounting by using nd6_dad_add() / nd6_dad_del(). Reviewed by: ae MFC after: 2 weeks Sponsored by: Yandex LLC	2014-11-10 16:01:39 +00:00
Andrey V. Elsukov	06fec20791	Remove link-local multicast routes remnants from in6_purgeaddr. Also merge in6_purgeaddr_mc with in6_purgeaddr. Sponsored by: Yandex LLC	2014-11-10 16:01:31 +00:00
Gleb Smirnoff	e6abaf91f4	Consistently use if_link. Reviewed by: ae, melifaro	2014-11-10 15:56:30 +00:00
Andrey V. Elsukov	45d1880a36	For now handle only multicast addresses, we still use routes to LLA unicasts yet. Sponsored by: Yandex LLC	2014-11-10 10:59:08 +00:00
Alexander V. Chernikov	f7bab8d0dd	Switch route radix to dual-lock model: use rmlock for data path access, and config rwlock for control plane processing. Route table changes require both locks held.	2014-11-10 00:07:06 +00:00
Andrey V. Elsukov	ea455de91d	Use embedded scope zone id to determine outgoing interface for link-local and node-local addresses.	2014-11-09 22:54:40 +00:00
Alexander V. Chernikov	36f34ac70b	Fix nd6_output_flush() prototype. Remove 'net/route_internal.h' header from stf.	2014-11-09 22:16:50 +00:00
Alexander V. Chernikov	603eaf792b	Remove faith(4) and faithd(8) from base. It looks like industry has chosen different (and more traditional) stateless/stateful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. No objections from: net@	2014-11-09 21:33:01 +00:00
Alexander V. Chernikov	d0f9fca40d	Remove forgotten arguments.	2014-11-09 16:57:31 +00:00
Alexander V. Chernikov	033074c440	Replace 'struct route *' if_output() argument with 'struct nhop_info *'. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.	2014-11-09 16:33:04 +00:00
Alexander V. Chernikov	9c9bde01d1	Remove unused 'struct route *' argument from nd6_output_flush().	2014-11-09 16:20:27 +00:00
Alexander V. Chernikov	55e5eda676	Separate radix and routing: use different structures for route and for other customers. Introduce new 'struct rib_head' for routing purposes and make all routing api use it.	2014-11-09 00:36:39 +00:00
Andrey V. Elsukov	3e88eb903b	Remove ip6_getdstifaddr() and all functions to work with auxiliary data. It isn't safe to keep unreferenced ifaddrs. Use in6ifa_ifwithaddr() to determine ifaddr corresponding to destination address. Since currently we keep addresses with embedded scope zone, in6ifa_ifwithaddr is called with zero zoneid and marked with XXX. Also remove route and lle lookups from ip6_input. Use in6ifa_ifwithaddr() instead. Sponsored by: Yandex LLC	2014-11-08 19:38:34 +00:00
Alexander V. Chernikov	a9413f6ca0	Sync to HEAD@r274297.	2014-11-08 18:13:35 +00:00
Alexander V. Chernikov	1398ffe5bc	Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2014-11-08 16:38:15 +00:00
Alexander V. Chernikov	3939f50c88	Finish r274290#2: remove unused IPv6 code.	2014-11-08 16:31:11 +00:00
Alexander V. Chernikov	22b08fd8b7	Split radix implementation and system route table structure: use new 'struct radix_head' for radix.	2014-11-07 22:52:02 +00:00
Andrey V. Elsukov	f325335caf	Overhaul if_gre(4). Split it into two modules: if_gre(4) for GRE encapsulation and if_me(4) for minimal encapsulation within IP. gre(4) changes: * convert to if_transmit; * rework locking: protect access to softc with rmlock, protect from concurrent ioctls with sx lock; * correct interface accounting for outgoing datagramms (count only payload size); * implement generic support for using IPv6 as delivery header; * make implementation conform to the RFC 2784 and partially to RFC 2890; * add support for GRE checksums - calculate for outgoing datagramms and check for inconming datagramms; * add support for sending sequence number in GRE header; * remove support of cached routes. This fixes problem, when gre(4) doesn't work at system startup. But this also removes support for having tunnels with the same addresses for inner and outer header. * deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD. Use our standard ioctls for tunnels. me(4): * implementation conform to RFC 2004; * use if_transmit; * use the same locking model as gre(4); PR: 164475 Differential Revision: D1023 No objections from: net@ Relnotes: yes Sponsored by: Yandex LLC	2014-11-07 19:13:19 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Gleb Smirnoff	428cf06b31	Remove VNET_SYSCTL_ARG(). The generic sysctl(9) code handles that. Reviewed by: ae Sponsored by: Nginx, Inc.	2014-11-07 08:58:05 +00:00
Alexander V. Chernikov	064b1bdb2d	Convert lle rtchecks to use new routing API. For inet/ case, this involves reverting r225947 which seem to be pretty strange commit and should be reverted in HEAD ad well.	2014-11-06 23:35:22 +00:00
Alexander V. Chernikov	146a181f28	Finish r274118: remove useless fields from struct domain. Sponsored by: Yandex LLC	2014-11-06 14:39:04 +00:00
Alexander V. Chernikov	1a75e3b20f	Make checks for rt_mtu generic: Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currrently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performes runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-doman MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use this functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC	2014-11-06 13:13:09 +00:00
Alexander V. Chernikov	9f25cbe45e	Remove old hack abusing domattach from NFS code. According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-depended code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month	2014-11-05 00:58:01 +00:00
Alexander V. Chernikov	69b74805d5	Convert gif and stf to use new routing api.	2014-11-04 18:48:13 +00:00
Alexander V. Chernikov	5c9ef37854	Sync to HEAD@r274095.	2014-11-04 18:22:33 +00:00
Alexander V. Chernikov	8c3cfe0be0	Hide 'struct rtentry' and all its macro inside new header: net/route_internal.h The goal is to make its opaque for all code except route/rtsock and proto domain _rmx.	2014-11-04 17:28:13 +00:00
Alexander V. Chernikov	a9ac00b76b	Convert in6p_lookup_mcast_ifp() to use new routing api. * Add special fib6_lookup_nh_ifp() to return rt_ifp instead of rt_ifa->ifa_ifp for that.	2014-11-04 17:05:24 +00:00
Alexander V. Chernikov	257480b8ab	Convert netinet6/ to use new routing API. * Remove &ifpp from ip6_output() in favor of ri->ri_nh_info * Provide different wrappers to in6_selectsrc: Currently it is used by 2 differenct type of customers: - socket-based one, which all are unsure about provided address scope and - in-kernel ones (ND code mostly), which don't have any sockets, options, crededentials, etc. So, we provide two different wrappers to in6_selectsrc() returning select source. * Make different versions of selectroute(): Currenly selectroute() is used in two scenarios: - SAS, via in6_selecsrc() -> in6_selectif() -> selectroute() - output, via in6_output -> wrapper -> selectroute() Provide different versions for each customer: - fib6_lookup_nh_basic()-based in6_selectif() which is capable of returning interface only, without MTU/NHOP/L2 calculations - full-blown fib6_selectroute() with cached route/multipath/ MTU/L2 * Stop using routing table for link-local address lookups * Add in6_ifawithifp_lla() to make for-us check faster for link-local * Add in6_splitscope / in6_setllascope for faster embed/deembed scopes	2014-11-04 15:39:56 +00:00
Hiroki Sato	da1304cb42	Fix a bug which prevented ND6_IFF_IFDISABLED flag from clearing when the newly-added IPv6 address was /128. PR: 188032	2014-11-02 21:58:31 +00:00
Andrey V. Elsukov	94a43496c2	Remove redundant code. if_detach already did these steps. Also, now we didn't keep routes to link-local addresses. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-10-30 12:44:46 +00:00
Andrey V. Elsukov	3c268b3afc	Move ifq drain into in6m_purge(). Suggested by: bms MFC after: 1 week Sponsored by: Yandex LLC	2014-10-30 11:34:07 +00:00
Andrey V. Elsukov	8ff1eae10d	Fix mbuf leak in IPv6 multicast code. When multicast capable interface goes away, it leaves multicast groups, this leads to generate MLD reports, but MLD code does deffered send and MLD reports are queued in the in6_multi's in6m_scq ifq. The problem is that in6_multi structures are freed when interface leaves multicast groups and thread that does deffered send will not take these queued packets. PR: 194577 MFC after: 1 week Sponsored by: Yandex LLC	2014-10-30 10:59:57 +00:00
Andrey V. Elsukov	c56173a626	Do not automatically install routes to link-local and interface-local multicast addresses. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-10-27 16:15:15 +00:00
Andrey V. Elsukov	8e4bdfa2db	Remove unused function. Sponsored by: Yandex LLC	2014-10-27 10:34:09 +00:00
Alexander V. Chernikov	30514718e7	Convert several places inside netinet6/ to new api.	2014-10-25 22:53:08 +00:00
Andrey V. Elsukov	a663aa4ce8	Remove redundant check and m_pullup() call.	2014-10-24 13:34:22 +00:00
Andrey V. Elsukov	0b9f5f8a5f	Overhaul if_gif(4): o convert to if_transmit; o use rmlock to protect access to gif_softc; o use sx lock to protect from concurrent ioctls; o remove a lot of unneeded and duplicated code; o remove cached route support (it won't work with concurrent io); o style fixes. Reviewed by: melifaro Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2014-10-14 13:31:47 +00:00
Robert Watson	f0cace5d94	When deciding whether to call m_pullup() even though there is adequate data in an mbuf, use M_WRITABLE() instead of a direct test of M_EXT; the latter both unnecessarily exposes mbuf-allocator internals in the protocol stack and is also insufficient to catch all cases of non-writability. (NB: m_pullup() does not actually guarantee that a writable mbuf is returned, so further refinement of all of these code paths continues to be required.) Reviewed by: bz MFC after: 3 days Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D900	2014-10-12 15:49:52 +00:00
Bryan Venteicher	81d3ec1763	Add context pointer and source address to the UDP tunnel callback These are needed for the forthcoming vxlan implementation. The context pointer means we do not have to use a spare pointer field in the inpcb, and the source address is required to populate vxlan's forwarding table. While I highly doubt there is an out of tree consumer of the UDP tunneling callback, this change may be a difficult to eventually MFC. Phabricator: https://reviews.freebsd.org/D383 Reviewed by: gnn	2014-10-10 06:08:59 +00:00
Bryan Venteicher	a0a9e1b57c	Add missing UDP multicast receive dtrace probes Phabricator: https://reviews.freebsd.org/D924 Reviewed by: rpaulo markj MFC after: 1 month	2014-10-09 22:36:21 +00:00
Bryan Venteicher	514929b193	Move the calls to u_tun_func() into udp6_append() A similar cleanup for UDPv4 was performed in r220620. Phabricator: https://reviews.freebsd.org/D383 Reviewed by: gnn MFC after: 1 month	2014-10-09 05:42:07 +00:00
Michael Tuexen	5558cc334d	Fix a bug introduced in https://svnweb.freebsd.org/base?view=revision&revision=272347 MFC after: 3 days	2014-10-07 16:01:17 +00:00
Michael Tuexen	4e1730b532	UPD and UDPLite require a checksum. So check for it. MFC after: 3 days	2014-10-03 08:46:49 +00:00
Michael Tuexen	5055cfcb4d	Check for UDP/IPv6 packets that the length in the UDP header is at least the minimum. Make the check similar to the one for UDPLite/IPv6. MFC after: 3 days	2014-10-02 10:49:01 +00:00
Michael Tuexen	76b96fbc9e	Fix the checksum computation for UDPLite/IPv6. This requires the usage of a function computing the checksum only over a part of the function. Therefore introduce in6_cksum_partial() and implement in6_cksum() based on that. While there, ensure that the UDPLite packet contains at least enough bytes to contain the header. Reviewed by: kevlo MFC after: 3 days	2014-10-02 10:32:24 +00:00
Hiroki Sato	9c57a5b630	Add an additional routing table lookup when m->m_pkthdr.fibnum is changed at a PFIL hook in ip{,6}_output(). IPFW setfib rule did not perform a routing table lookup when the destination address was not changed. CR: D805	2014-10-02 00:25:57 +00:00
Alexander V. Chernikov	31f0d081d8	Remove lock init from radix.c. Radix has never managed its locking itself. The only consumer using radix with embeded rwlock is system routing table. Move per-AF lock inits there.	2014-10-01 14:39:06 +00:00
Michael Tuexen	83e95fb30b	The default for UDPLITE_RECV_CSCOV is zero. RFC 3828 recommend that this means full checksum coverage for received packets. If an application is willing to accept packets with partial coverage, it is expected to use the socekt option and provice the minimum coverage it accepts. Reviewed by: kevlo MFC after: 3 days	2014-10-01 05:43:29 +00:00
Michael Tuexen	0f4a03663b	If the checksum coverage field in the UDPLITE header is the length of the complete UDPLITE packet, the packet has full checksum coverage. SO fix the condition. Reviewed by: kevlo MFC after: 3 days	2014-09-30 18:17:28 +00:00
Andrey V. Elsukov	d1729484d4	Remove redundant call to ipsec_getpolicybyaddr(). ipsec_hdrsiz() will call it internally. Sponsored by: Yandex LLC	2014-09-30 13:15:19 +00:00
Kevin Lo	0bc40ebf00	When plen != ulen, it should only be checked when this is UDP. Spotted by: bryanv	2014-09-30 07:28:31 +00:00
Alan Somers	4f8585e021	Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and ifa_ifwithdstaddr. For the sake of backwards compatibility, the new arguments were added to new functions named ifa_ifwithnet_fib and ifa_ifwithdstaddr_fib, while the old functions became wrappers around the new ones that passed RT_ALL_FIBS for the fib argument. However, the backwards compatibility is not desired for FreeBSD 11, because there are numerous other incompatible changes to the ifnet(9) API. We therefore decided to remove it from head but leave it in place for stable/9 and stable/10. In addition, this commit adds the fib argument to ifa_ifwithbroadaddr for consistency's sake. sys/sys/param.h Increment __FreeBSD_version sys/net/if.c sys/net/if_var.h sys/net/route.c Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute. sys/net/route.c sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_options.c sys/netinet/ip_output.c sys/netinet6/nd6.c Fixup calls of modified functions. share/man/man9/ifnet.9 Document changed API. CR: https://reviews.freebsd.org/D458 MFC after: Never Sponsored by: Spectra Logic	2014-09-11 20:21:03 +00:00
Andrey V. Elsukov	343e440f63	Add const qualifier to in6_addrhash() function. Add in6ifa_ifwithaddr() function. It is similar to ifa_ifwithaddr, but does fast lookup in the hash of inet6 addresses. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 13:18:41 +00:00
Andrey V. Elsukov	80803aa289	* use M_ZERO flag with malloc instead of explicit zeroing. * remove MULTI_SCOPE ifdef. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 12:54:17 +00:00
Andrey V. Elsukov	41874e85d6	Introduce new scope related functions. * new macro to remove magic number - IPV6_ADDR_SCOPES_COUNT; * sa6_checkzone() - this function checks sockaddr_in6 structure for correctness of sin6_scope_id. It also can fill correct value sometimes. * in6_getscopezone() - this function returns scope zone id for specified interface and scope. * in6_getlinkifnet() - this function returns struct ifnet for corresponding zone id of link-local scope. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 12:33:37 +00:00
Andrey V. Elsukov	573791d01c	* constify argument of in6_addrscope(); * use IN6_IS_ADDR_XXX() macro instead of hardcoded values; * for multicast addresses just return scope value, the only exception is addresses with 0x0F scope value (RFC 4291 p2.7.0); Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 10:27:59 +00:00
Andrey V. Elsukov	9196891fc9	Add additional checks for IPV6_PKTINFO handling (RFC 3542): * Return ENETDOWN when interface specified by ipi6_ifindex is not enabled for IPv6 use. * Return EADDRNOTAVAIL when ipi6_ifindex specifies an interface, but the address ipi6_addr is not available for use on that interface. * Return EINVAL when ipi6_addr is multicast address. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-10 14:32:07 +00:00
Andrey V. Elsukov	a7e201bbac	Make in6_pcblookup_hash_locked and in6_pcbladdr static. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-10 13:17:35 +00:00
Andrey V. Elsukov	1b44e5ffe3	Introduce INP6_PCBHASHKEY macro. Replace usage of hardcoded part of IPv6 address as hash key in all places. Obtained from: Yandex LLC	2014-09-10 12:35:42 +00:00
Andrey V. Elsukov	5dbfa43f65	Add the ability to set `prefer_source' flag to an IPv6 address. It affects the IPv6 source address selection algorithm (RFC 6724) and allows override the last rule ("longest matching prefix") for choosing among equivalent addresses. The address with `prefer_source' will be preferred source address. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2014-09-09 10:52:50 +00:00
Adrian Chadd	a4d98bf442	Add basic RSS awareness for the UDPv6 send path. This doesn't include the same kind of userland overriding that the IPv4 path has; nor does it yet know about 2-tuple versus 4-tuple hashing. That'll come later. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 04:20:53 +00:00
Adrian Chadd	b174de323a	Add IP_NODEFAULTFLOWID awareness to ip6_output(). Differential Revision: https://reviews.freebsd.org/D527	2014-09-09 00:21:21 +00:00
Michael Tuexen	24aaac8d59	Use union sctp_sockstore instead of struct sockaddr_storage. This eliminiates some warnings when building in userland. Thanks to Patrick Laimbock for reporting this issue. Remove also some unnecessary casts. There should be no functional change. MFC after: 1 week	2014-09-07 09:06:26 +00:00
Andrey V. Elsukov	ccc53de916	Add the reverse part to rule #9 . Also change its description in the netstat(8) output. MFC after: 1 week	2014-09-01 09:30:34 +00:00
Mark Johnston	5fc2632281	Add some missing checks for unsupported interfaces (e.g. pflog(4)) when handling ioctls. While here, remove duplicated checks for a NULL ifp in in6_control(): this check is already done near the beginning of the function. PR: 189117 Reviewed by: hrs MFC after: 2 weeks	2014-08-22 19:21:08 +00:00
Kevin Lo	73d76e77b6	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
Kevin Lo	8f5a8818f5	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
Andrey V. Elsukov	d6e6b9943b	Add new rule to source address selection algorithm. It prefers address with better virtual status. Use ifa_preferred() to choose better address. PR: 187341 Tested by: des MFC after: 1 week	2014-07-30 15:08:12 +00:00
Gleb Smirnoff	9753faf553	Garbage collect couple of unused fields from struct ifaddr: - ifa_claim_addr() unused since removal of NetAtalk - ifa_metric seems to be never utilized, always a copy of if_metric	2014-07-29 15:01:29 +00:00
Hiroki Sato	9be09a6e43	Fix EtherIP. TOS field must be initialized when the inner protocol is PF_LINK, and multicast/broadcast flag should always be dropped because the outer protocol uses unicast even when the inner address is not for unicast. It had been broken since r236951 when gif_output() started to use IFQ_HANDOFF().	2014-07-24 10:42:47 +00:00
Adrian Chadd	0ae3f42231	When it's time to do 4-tuple UDP IPv6 hashing, make sure this is a known type.	2014-07-20 07:39:54 +00:00
Adrian Chadd	c7c0d94874	Add IPv6 flowid, bindmulti and RSS awareness.	2014-07-12 05:46:33 +00:00
Adrian Chadd	a8a2d8003a	Add INP_RSS_BUCKET_SET awareness for IPv6 pcbgroup entries. This ensures that a listen socket with INP_RSS_BUCKET_SET set will use the pre-determined PCBGROUP rather than what the hashing path chooses.	2014-07-12 05:45:53 +00:00
Adrian Chadd	6e4405cee1	Add the IPv6 versions of the multi-bind, hash/hash type and RSS options.	2014-07-12 05:44:16 +00:00
Andrey V. Elsukov	ff899182ec	Fix condition. Sponsored by: Yandex LLC	2014-07-11 06:34:15 +00:00
Bryan Venteicher	6700a7d44b	Use the appropriate IPv6 hashtype defines when looking up the PCBGROUP Reviewed by: adrian@	2014-07-07 00:02:49 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Hajimu UMEMOTO	f4839cbc0a	Make nd6_gctimer tunable. MFC after: 1 week	2014-06-23 16:27:29 +00:00
Kevin Lo	ea93c6a613	Catch up with r186809, correct comments.	2014-06-23 05:17:39 +00:00
Andrey V. Elsukov	45b4fb0449	Remove unused variable. Sponsored by: Yandex LLC	2014-06-08 09:08:51 +00:00
Alan Somers	2f308a343f	Fix unintended KBI change from r264905. Add _fib versions of ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the _fib() versions with RT_ALL_FIBS, preserving legacy behavior. sys/net/if_var.h sys/net/if.c Add legacy-compatible functions as described above. Ensure legacy behavior when RT_ALL_FIBS is passed as fibnum. sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/net/route.c sys/net/rtsock.c sys/netinet6/nd6.c Call with _fib() functions if we must use a specific fib, or the legacy functions otherwise. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c Improve the udp_dontroute test. The bug that this test exercises is that ifa_ifwithnet() will return the wrong address, if multiple interfaces have addresses on the same subnet but with different fibs. The previous version of the test only considered one possible failure mode: that ifa_ifwithnet_fib() might fail to find any suitable address at all. The new version also checks whether ifa_ifwithnet_fib() finds the correct address by checking where the ARP request goes. Reported by: bz, hrs Reviewed by: hrs MFC after: 1 week X-MFC-with: 264905 Sponsored by: Spectra Logic	2014-05-29 21:03:49 +00:00
Hiroki Sato	82a9fa4a1d	Add rwlock to struct dadq. A panic could occur when a large number of addresses performed DAD at the same time.	2014-05-29 20:53:53 +00:00
VANHULLEBUS Yvan	aaf2cfc0d6	Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 Submitted by: emeric.poupon@netasq.com Reviewed by: fabient, ae Obtained from: NETASQ	2014-05-28 12:45:27 +00:00
Hiroki Sato	705bef548a	Cancel DAD for an ifa when the ifp has ND6_IFF_IFDISABLED as early as possible and do not clear IN6_IFF_TENTATIVE. If IFDISABLED was accidentally set after a DAD started, TENTATIVE could be cleared because no NA was received due to IFDISABLED, and as a result it could prevent DAD when manually clearing IFDISABLED after that.	2014-05-16 15:53:31 +00:00
Alexander V. Chernikov	b980262e63	Pass radix head ptr along with rte to rtexpunge(). Rename rtexpunge to rt_expunge().	2014-05-03 16:28:54 +00:00
Alexander V. Chernikov	cf58751a44	Use "hash" value in rtalloc_mpath_fib() instead of RTF_ANNOUNCE flag. Hashing method is the same as in in6_src.c. (Probably we need better one). MFC after: 2 weeks	2014-04-26 16:46:33 +00:00
Alexander V. Chernikov	36d55f0f9d	Unify sa_equal() macro usage. MFC after: 2 weeks	2014-04-26 14:52:03 +00:00
Alan Somers	0cfee0c223	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases it there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
Andrey V. Elsukov	52c57247d3	Remove unused variable. PR: 173521 MFC after: 1 week Sponsored by: Yandex LLC	2014-04-17 06:40:11 +00:00
Andrey V. Elsukov	4fd913364f	Properly release the in6_multi lock. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-12 02:05:31 +00:00
Kevin Lo	d1b18731d9	Minor style cleanups.	2014-04-07 01:55:53 +00:00
Kevin Lo	e06e816f67	Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian	2014-04-07 01:53:03 +00:00
Andrey V. Elsukov	cd71804c84	Remove unused label. MFC after: 1 week	2014-03-31 14:40:35 +00:00
Andrey V. Elsukov	27aa751c90	Don't generate an ICMPv6 error message if packet was consumed by filter. MFC after: 1 week Sponsored by: Yandex LLC	2014-03-31 14:27:22 +00:00
Robert Watson	7527624efa	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
Gleb Smirnoff	aa69c61235	Since both netinet/ and netinet6/ call into netipsec/ and netpfil/, the protocol specific mbuf flags are shared between them. - Move all M_FOO definitions into a single place: netinet/in6.h, to avoid future clashes. - Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted in a failure of operation of IPSEC and packet filters. Thanks to Nicolas and Georgios for all the hard work on bisecting, testing and finally finding the root of the problem. PR: kern/186755 PR: kern/185876 In collaboration with: Georgios Amanakis <gamanakis gmail.com> In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com> Sponsored by: Nginx, Inc.	2014-03-12 14:29:08 +00:00
Gleb Smirnoff	e3a7aa6f56	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache-thrashing ++ from the packet forwarding path. - Create zini/fini methods for the rtentry UMA zone, and initialize the mutex and counter in them. - Fix reporting of rmx_pksent to the routing socket. - Fix netstat(1) to report "Use" in both kvm(3) and sysctl(3) mode. The change is mostly targeted for the stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Craig Rodrigues	47a79fadc6	Remove KASSERT from in6p_lookup_mcast_ifp(). When the devel/jenkins port, version 1.551, was started, the kernel would panic if INVARIANTS was enabled in the kernel config. Suggested by: bms	2014-02-23 01:27:22 +00:00
Gleb Smirnoff	0ff96b4f55	o Remove at compile time the HASH_ALL code, which was never tested and is unfinished. However, I've tested my version, and it works okay. As before, it is unfinished: timeouts aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, flows consume half as much memory. o The API to ip_output()/ip6_output() got even thinner: a one-liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do the same for flowtable_lookup_ipv6(). Stop copying data to on-stack sockaddr structures; simply use key[] on the stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
Alexander V. Chernikov	f6990c4e3e	Further simplify nd6_output_lle. Currently we have 3 usage patterns: 1) nd6_output (most traffic flow, no lle supplied, lle RLOCK sufficient) 2) corner cases for output (no lle, STALE lle, and so on); lle WLOCK needed. 3) nd* internal machinery (WLOCK'ed lle provided, perform packet queuing). We separate case 1 and implement it inside its only consumer, nd6_output. This leads to some code duplication (especially SEND stuff, which should be hooked to output in a different way), but simplifies locking and control flow logic for nd6_output_lle. Reviewed by: ae MFC after: 3 weeks Sponsored by: Yandex LLC	2014-02-13 19:09:04 +00:00
Andrey V. Elsukov	e4c77ca0c0	Drop packets to multicast addresses whose scope field contains the reserved value 0. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-13 14:10:44 +00:00
Christian Brueffer	d37872314f	Only count table lookups when we're actually processing packets. PR: 183462 Submitted by: Sven-Thorsten Dietrich <thebigcorporation at gmail.com> Reviewed by: bms MFC after: 1 month	2014-02-10 14:47:51 +00:00
Christian Brueffer	1b55364ed9	For IPv6, return the same error code as IPv4 when mrouter is not initialized. PR: 178472 Submitted by: Sven-Thorsten Dietrich <sven at vyatta.com> Reviewed by: bms	2014-02-10 14:36:51 +00:00
Alexander V. Chernikov	9dffa6a3f3	Simplify nd6_output_lle: * Check ND6_IFF_IFDISABLED before acquiring any locks * Assume m is always non-NULL * Remove 'bad' case not used anymore * Simplify if_output conditional MFC after: 2 weeks Sponsored by: Yandex LLC	2014-02-10 12:52:33 +00:00
Gleb Smirnoff	5d6d7e756b	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip6_output() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand-made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into the net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which is quite useless in the dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
Andrey V. Elsukov	74a976fffd	Unlock entry before retry. Submitted by: melifaro MFC after: 1 week	2014-02-07 10:58:46 +00:00
Andrey V. Elsukov	51eecdc35a	Take exclusive lock only when lle isn't NULL. We don't need write access to lle in most cases. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-02 07:28:04 +00:00
Alexander V. Chernikov	f6b84910bb	Further rework netinet6 address handling code: * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attaching it. * Validate irfa->ia_addr field before use (we permitted ANY sockaddr in the old code). * Rename some functions: in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here) in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code) in6_ifaddloop -> nd6_add_ifa_lle in6_ifremloop -> nd6_rem_ifa_lle * Split the LLE handling and route announcement code for the last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour. * Call device SIOCSIFADDR handler IFF we're adding the first address. In IPv4 we have to call it on every address change, since the ARP record is installed by arp_ifinit(), which is called by the given handler. The IPv6 stack, on the other hand, is responsible for calling nd6_add_ifa_lle(), so there is no reason to call SIOCSIFADDR often.	2014-01-19 16:07:27 +00:00
Alexander V. Chernikov	0c5d4bde90	Use in6_localip() instead of hand-rolled cycle. MFC after: 2 weeks	2014-01-18 20:54:55 +00:00
Alexander V. Chernikov	9080e7d023	Add in6_prepare_ifra() function to ease preparing in-kernel IPv6 address requests. MFC after: 2 weeks	2014-01-18 20:32:59 +00:00
Alexander V. Chernikov	b6a16fc853	Do some style(9) cleanups not done in r260851 to improve readability. MFC after: 2 weeks	2014-01-18 15:57:43 +00:00
Alexander V. Chernikov	60d7c722a5	Split in6_update_ifa() into smaller pieces leaving functionality intact. Discussed with: ae MFC after: 2 weeks	2014-01-18 15:52:52 +00:00
Andrey V. Elsukov	e74966f60b	Mechanically replace direct accesses to if_xname with the if_name() macro.	2014-01-10 12:33:28 +00:00
John-Mark Gurney	f2effe745c	revert part of r260485, which changes how part of the header gets included. netstat uses -DKERNEL=1 to get these parts and breaks the build w/o it... melifaro@ says that ae@ is probably asleep, and the PR doesn't have this part of the patch... Probably a local change got in by accident. PR: 185148 Pointy hat to: ae@	2014-01-09 22:41:18 +00:00
Andrey V. Elsukov	78415d1082	Remove extra nesting from X_ip6_mforward() function. Also remove disabled definitions from ip6_mroute.h. PR: 185148 Sponsored by: Yandex LLC	2014-01-09 15:38:28 +00:00
Andrey V. Elsukov	0a6b0ffa54	Add MRT6_DLOG() macro for debugging. Reduce number of MRT6DEBUG ifdefs and fix some broken format strings. MFC after: 1 week Sponsored by: Yandex LLC	2014-01-09 14:58:06 +00:00
Alexander V. Chernikov	1dc8f6a82c	Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code to do IPv6 addr & mask in different places. MFC after: 2 weeks	2014-01-08 22:13:32 +00:00
Andrey V. Elsukov	b88aef1dcf	Use pointer to struct sockaddr_in6 in lla_lookup() call. This prevents triggering the KASSERT in in6_lltable_lookup.	2014-01-03 02:40:56 +00:00
Andrey V. Elsukov	e2d14d9317	Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag. MFC after: 1 week	2014-01-03 02:32:05 +00:00
Andrey V. Elsukov	ea0c377602	lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag. Reviewed by: glebius, adrian MFC after: 1 week Sponsored by: Yandex LLC	2014-01-02 08:40:37 +00:00
Adrian Chadd	c445d2520d	Use an RLOCK here instead of an RWLOCK - matching all the other calls to lla_lookup(). This drastically reduces the very high lock contention when doing parallel TCP throughput tests (> 1024 sockets) with IPv6. Tested: * parallel IPv6 TCP bulk data exchange, 8192 sockets MFC after: 1 week Sponsored by: Netflix, Inc.	2014-01-01 00:56:26 +00:00
Bjoern A. Zeeb	010c2b8192	Correct warnings about comparing unsigned variables < 0, which were constantly reported while building kernels. All instances removed are indeed unsigned, so the expressions could never be true. MFC after: 1 week	2013-12-25 20:08:44 +00:00
Dimitry Andric	6c5a340e56	In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used when KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about an unused function if KTR is not enabled. MFC after: 3 days	2013-12-24 20:30:13 +00:00
Andrey V. Elsukov	569aad57d2	Free mbuf in case of error. MFC after: 1 week	2013-12-17 10:53:17 +00:00
Attilio Rao	54366c0bd7	- For kernels compiled only with KDTRACE_HOOKS and without any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined versions of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operations, for kernels compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and removing the dependency on opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately exposes a new bug: the DTRACE-derived debug support in sfxge is broken and was never really tested. Since it did not correctly include opt_kdtrace.h before, it was never enabled, so it stayed broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Andrey V. Elsukov	ee674966f4	Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps simplify the code and remove several now unused variables. PR: 156283 MFC after: 2 weeks	2013-11-11 12:49:00 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Michael Tuexen	b54ddf225f	Changes from upstream to improve compilation when INET or INET6 or none of them is defined. MFC after: 3 days	2013-11-02 20:12:19 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	r48589 promised to remove the implicit inclusion of if_var.h soon. Prepare for this event by adding if_var.h to files that do need it. Also, add all the includes that are currently pulled in only through implicit pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Andrey V. Elsukov	baa09f1891	Initialize inc_fibnum for properly handling ICMP6_PACKET_TOO_BIG errors in multifib environment. PR: 183265 MFC after: 1 week	2013-10-25 01:02:25 +00:00
Gleb Smirnoff	7caf4ab7ac	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr; instead, update statistics in place right away and do IN_IFADDR_RUNLOCK(). This removes an atomic(9) operation for every packet. [1] - To properly support the NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentally fix a bug in the COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from the ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet [2] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 11:37:57 +00:00
Gleb Smirnoff	4675896098	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
Gleb Smirnoff	6ed910fabe	Hide 'struct ifaddr' definition from userland. Two tools still use it, namely ipftest(1) and ifmcstat(1). These obtain the structure definition via the _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:19:24 +00:00
Gleb Smirnoff	3fa98cf9ac	Remove unsigned < 0 check.	2013-10-15 10:12:19 +00:00
Gleb Smirnoff	ca695e0807	Remove useless check of ia6 against NULL, right after dereferencing it.	2013-10-15 10:11:23 +00:00
Gleb Smirnoff	0218539652	Now counter_u64_t is known to userland, thus remove hack from r253086. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:09:33 +00:00


1797 Commits