freebsd-dev

Author	SHA1	Message	Date
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Andrey V. Elsukov	c1b4f79dfa	Add the ability to accept encapsulated packets from different sources on one gif(4) interface. Add a new option "ignore_source" for the gif(4) interface. When it is enabled, gif's encapcheck function requires a match only on the packet's destination address. Differential Revision: https://reviews.freebsd.org/D2004 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-15 12:19:45 +00:00
Hiroki Sato	59333867ff	- Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice because a link where looped back NS messages are permanently observed does not work with either NDP or ARP for IPv4. - draft-ietf-6man-enhanced-dad is now RFC 7527. Discussed with: hiren MFC after: 3 days	2015-05-12 03:31:57 +00:00
Andrey V. Elsukov	654bdb5abb	Mark data checksum as valid for multicast packets that we send back to ourselves via simloop. Also remove duplicate check under #ifdef DIAGNOSTIC. PR: 180065 MFC after: 1 week	2015-05-07 14:17:43 +00:00
Andrey V. Elsukov	db037aa4ed	Remove unneeded #ifdef INET6 and IPSEC. This file is compiled only when both options are defined. Include opt_sctp.h and sctp_crc32.h to enable #ifdef SCTP code block and delayed checksum calculation for SCTP.	2015-05-07 12:15:45 +00:00
Gleb Smirnoff	0fa5aacd8b	Remove #ifdef IFT_FOO. Submitted by: Guy Yur <guyyur gmail.com>	2015-05-02 20:31:27 +00:00
Andrey V. Elsukov	3e92c37f32	Remove now unneeded KEY_FREESP() for the case when ipsec[46]_process_packet() returns EJUSTRETURN. Sponsored by: Yandex LLC	2015-04-27 01:11:09 +00:00
Andrey V. Elsukov	3d80e82d60	Fix possible use after free due to security policy deletion. When we pass an mbuf to IPSec processing via ipsec[46]_process_packet(), we hold one reference to the security policy and release it just after return from this function. But IPSec processing can be deferred, and when we release the reference to the security policy after ipsec[46]_process_packet(), the user can delete this security policy from the SPDB. When IPSec processing is then done, the xform's callback function will access already freed memory. To fix this, move KEY_FREESP() into the callback function. Now the IPSec code will release the reference to the SP after processing is finished. Differential Revision: https://reviews.freebsd.org/D2324 No objections from: #network Sponsored by: Yandex LLC	2015-04-27 00:55:56 +00:00
Gleb Smirnoff	210b5c73e7	Fix r281649: don't call in6_clearscope() twice. Submitted by: ae	2015-04-17 15:26:08 +00:00
Gleb Smirnoff	28ebe80cab	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
Mark Johnston	dff78447a4	Fix a possible refcount leak in regen_tmpaddr(). public_ifa6 may be set to NULL after taking a reference to a previous address list element. Instead, only take the reference after leaving the loop but before releasing the address list lock. Differential Revision: https://reviews.freebsd.org/D2253 Reviewed by: ae MFC after: 2 weeks	2015-04-13 01:55:42 +00:00
Andrey V. Elsukov	e2956804dd	Fix the IPV6_MULTICAST_IF sockopt handling. RFC 3493 says when the interface index is specified as zero, the system should select the interface to use for outgoing multicast packets. Even the comment for the in6p_set_multicast_if() function says about index of zero. But in fact for zero index the function just returns EADDRNOTAVAIL. I.e. if you first set some interface and then will try reset it with zero ifindex, you will get EADDRNOTAVAIL. Reset im6o_multicast_ifp to NULL when interface index specified as zero. Also return EINVAL in case when ifnet_byindex() returns NULL. This will be the same behaviour as when ifindex is bigger than V_if_index. And return EADDRNOTAVAIL only when interface is not multicast capable. Reported by: Olivier Cochard-Labbé MFC after: 2 weeks Sponsored by: Yandex LLC	2015-04-10 19:09:51 +00:00
Andrey V. Elsukov	efb19cf6db	Fix the check for maximum mbuf's size needed to send ND6 NA and NS. It is acceptable that the size can be equal to MCLBYTES. In the later KAME's code this check has been moved under DIAGNOSTIC ifdef, because the size of NA and NS is much smaller than MCLBYTES. So, it is safe to replace the check with KASSERT. PR: 199304 Discussed with: glebius MFC after: 1 week	2015-04-09 12:57:58 +00:00
Kristof Provost	53deb05c36	Evaluate packet size after the firewall had its chance Defer the packet size check until after the firewall has had a look at it. This means that the firewall now has the opportunity to (re-)fragment an oversized packet. Differential Revision: https://reviews.freebsd.org/D1815 Reviewed by: ae Approved by: gnn (mentor)	2015-04-07 20:29:03 +00:00
Xin LI	dd3856601d	Mitigate Local Denial of Service with IPv6 Router Advertisements and log attack attempts. Submitted by: hrs Security: FreeBSD-SA-15:09.nd6 Security: CVE-2015-2923	2015-04-07 20:20:09 +00:00
Gleb Smirnoff	c151f24d08	o Make net.inet6.ip6.mif6table return special API structure, that doesn't contain kernel pointers, and instead has interface index. Bump __FreeBSD_version for that change. o Now, netstat/mroute6.c no longer needs to kvm_read(3) struct ifnet, and no longer needs to include if_var.h Note that this change is far from being a complete move of IPv6 multicast routing to a proper API. Other structures are still dumped into their sysctls as is, requiring userland application to #define _KERNEL when including ip6_mroute.h and then call kvm_read(3) to gather all bits and pieces. But fixing this is out of scope of the opaque ifnet project. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-04-06 22:12:18 +00:00
Kristof Provost	31e2e88c27	Remove duplicate code We'll just fall into the same local delivery block under the 'if (m->m_flags & M_FASTFWD_OURS)'. Suggested by: ae Differential Revision: https://reviews.freebsd.org/D2225 Approved by: gnn (mentor)	2015-04-06 19:08:44 +00:00
Kristof Provost	798318490e	Preserve IPv6 fragment IDs across reassembly and refragmentation When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs so we can do the reassembly so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2188 Approved by: gnn (mentor)	2015-04-01 12:15:01 +00:00
Gleb Smirnoff	20778ab5b4	Move ip6_sprintf() declaration from in6_var.h to in6.h. This is a simple function that works with in6_addr and it is not related to the INET6 stack implementation. Sponsored by: Nginx, Inc.	2015-03-24 16:45:50 +00:00
Andrey V. Elsukov	ff9f2a36de	To avoid a possible race, release the reference to ifa after return from nd6_dad_na_input(). Submitted by: Alexandre Martins MFC after: 1 week	2015-03-19 00:04:25 +00:00
Andrey V. Elsukov	fd8dd3a6d7	tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify(). Check cmdarg isn't NULL before dereference, this check was in the ip6_notify_pmtu() before r279588. Reported by: Florian Smeets MFC after: 1 week	2015-03-06 05:50:39 +00:00
Hiroki Sato	23e9ffb0e1	- Implement loopback probing state in enhanced DAD algorithm. - Add no_dad and ignoreloop per-IF knob. no_dad disables DAD completely, and ignoreloop is to prevent infinite loop in loopback probing state when loopback is permanently expected.	2015-03-05 21:27:49 +00:00
Andrey V. Elsukov	8f1beb889e	Fix deadlock in IPv6 PCB code. When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message. Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received. PR: 197059 Differential Revision: https://reviews.freebsd.org/D1949 Reviewed by: no objection from #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2015-03-04 11:20:01 +00:00
Andrey V. Elsukov	1eef8a6c08	Create nd6_ns_output_fib() function with extra argument fibnum. Use it to initialize mbuf's fibnum. Uninitialized fibnum value can lead to panic in the routing code. Currently we use only RT_DEFAULT_FIB value for initialization. Differential Revision: https://reviews.freebsd.org/D1998 Reviewed by: hrs (previous version) Sponsored by: Yandex LLC	2015-03-03 10:50:03 +00:00
Hiroki Sato	8d56075939	Nonce has to be non-NULL for DAD even if net.inet6.ip6.dad_enhanced=0.	2015-03-03 04:28:19 +00:00
Hiroki Sato	11d8451df3	Implement Enhanced DAD algorithm for IPv6 described in draft-ietf-6man-enhanced-dad-13. This basically adds a random nonce option (RFC 3971) to NS messages for DAD probe to detect a looped back packet. This looped back packet prevented DAD on some pseudo-interfaces which aggregates multiple L2 links such as lagg(4). The length of the nonce is set to 6 bytes. This algorithm can be disabled by setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis. Reported by: hiren Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D1835	2015-03-02 17:30:26 +00:00
Gleb Smirnoff	e072c794ad	Now that all users of _WANT_IFADDR are fixed, remove this crutch and hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 23:16:10 +00:00
Gleb Smirnoff	9e62a5a379	- Rename 'struct mld_ifinfo' into 'struct mld_ifsoftc', since it really represents a context. - Preserve name 'struct mld_ifinfo' for a new structure, that will be stable API between userland and kernel. - Make sysctl_mld_ifinfo() return the new 'struct mld_ifinfo', instead of old one, which had a bunch of internal kernel structures in it. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 22:37:01 +00:00
Gleb Smirnoff	fd1b2a7c57	Widen _KERNEL ifdef to hide more kernel network stack structures from userland.	2015-02-19 06:24:27 +00:00
Gleb Smirnoff	a99c84d4e6	Use new struct mbufq instead of struct ifqueue to manage packet queues in IPv6 multicast code. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 01:21:23 +00:00
Gleb Smirnoff	6c269f6912	Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4). Submitted by: Kristof Provost Differential Revision: D1766	2015-02-16 06:30:27 +00:00
Gleb Smirnoff	e5ee706031	Move ip6_deletefraghdr() to frag6.c. Suggested by: bz	2015-02-16 05:58:32 +00:00
Gleb Smirnoff	0b438b0fb8	Factor out ip6_deletefraghdr() function, to be shared between IPv6 stack and pf(4). Submitted by: Kristof Provost Reviewed by: ae Differential Revision: D1764	2015-02-16 01:12:20 +00:00
Randall Stewart	2575fbb827	This fixes a bug in the way that the LLE timers for nd6 and arp were being used. They basically would pass in the mutex to the callout_init. Because they used this method to the callout system, it was possible to "stop" the callout. When flushing the table and you stopped the running callout, the callout_stop code would return 1 indicating that it was going to stop the callout (that was about to run on the callout_wheel blocked by the function calling the stop). Now when 1 was returned, it would lower the reference count one extra time for the stopped timer, then a few lines later delete the memory. Of course the callout_wheel was stuck in the lock code and would then crash since it was accessing freed memory. By using callout_init(c, 1) we always get a 0 back and the reference counting bug does not rear its head. We do have to make a few adjustments to the callouts themselves though to make sure it does the proper thing if rescheduled as well as gets the lock. Commented upon by hiren and sbruno See Phabricator D1777 for more details. Reviewed by: adrian, jhb and bz Sponsored by: Netflix Inc.	2015-02-09 19:28:11 +00:00
Andrey V. Elsukov	46386183da	Print IPv6 address in log message instead of address of pointer. MFC after: 1 week	2015-02-05 16:29:26 +00:00
Adrian Chadd	b2bdc62a95	Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * toeplitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)	2015-01-18 18:06:40 +00:00
Gleb Smirnoff	ffec6ee527	Do not go one layer down to check ifqueue length. First, not all drivers use ifqueue at all. Second, there is no point in this lockless check. Either positive or negative result of the check could be incorrect after a tick. Sponsored by: Nginx, Inc.	2015-01-12 14:52:43 +00:00
Michael Tuexen	4be807c4d6	Minimize the usage of SCTP_BUF_IS_EXTENDED. This should help Robert...	2015-01-10 20:49:57 +00:00
Alexander V. Chernikov	d63e657c04	* Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4: handle it in arc_output() instead of nd6_storelladdr(). * Remove IFT_ARCNET check from arpresolve() since arc_output() does not use arpresolve() to handle broadcast/multicast. This check was there since r84931. It looks like it was not used since r89099 (initial import of Arcnet support where multicast is handled separately). * Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output() calls nd6_storelladdr() for unicast addresses only. * Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now handles multicast by itself. As a result, we have the following pattern: all non-ethernet-style media have their own multicast map handling inside their appropriate routines. On the other hand, arpresolve() (and nd6_storelladdr()), which are meant to be 'generic' ones, de facto handle ethernet-only multicast maps. MFC after: 3 weeks	2015-01-09 12:56:51 +00:00
Alexander V. Chernikov	abc1be9062	Add forgotten definition for nd6_output_ifp().	2015-01-08 18:29:54 +00:00
Alexander V. Chernikov	d7968c29ec	* Use newly-created nd6_grab_holdchain() function to retrieve lle hold mbuf chain instead of calling full-blown nd6_output_lle() for each packet. This simplifies both callers and nd6_output_lle() implementation. * Make nd6_output_lle() static and remove now-unused lle and chain arguments. * Rename nd6_output_flush() -> nd6_flush_holdchain() to be consistent. * Move all pre-send transmit hooks to newly-created nd6_output_ifp(). Now nd6_output(), nd6_output_lle() and nd6_flush_holdchain() are using it to send mbufs to if_output. * Remove SeND hook from nd6_na_input() because it was implemented incorrectly since the beginning (r211501): - it tagged initial input mbuf (m) instead of m_hold - tagging _all_ mbufs in holdchain seems to be wrong anyway.	2015-01-08 18:02:05 +00:00
Alexander V. Chernikov	3a7498636a	* Allocate hash tables separately * Make llt_hash() callback more flexible * Default hash size and hashing method is now per-af * Move lltable allocation to separate function	2015-01-05 17:23:02 +00:00
Alexander V. Chernikov	df485dbe3e	Do not call LLE_WUNLOCK() for deleted lle.	2015-01-05 16:10:54 +00:00
Robert Watson	ed6a66ca6c	To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, but also assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division	2015-01-05 09:58:32 +00:00
Alexander V. Chernikov	b44a7d5d87	* Use unified code for deleting entry by sockaddr instead of per-af one. * Remove now unused llt_delete_addr callback.	2015-01-03 19:09:06 +00:00
Alexander V. Chernikov	20dd899505	* Hide lltable implementation details in if_llatbl_var.h * Make most of lltable_* methods 'normal' functions instead of inline * Add lltable_get_<af\|ifp>() functions to access given lltable fields * Temporarily resurrect nd6_lookup() function	2015-01-03 16:04:28 +00:00
Alexander V. Chernikov	787cea14a5	Since @ln is the result of LLTABLE6(ifp) lookup its originating interface must always be @ifp. So change ln->lle_tbl->llt_ifp to ifp.	2015-01-03 14:18:48 +00:00
Alexander V. Chernikov	d2e0f37c22	Finish r275628 #2 : remove remaining 'base' references.	2015-01-03 14:09:35 +00:00
Adrian Chadd	492ccbe14d	Migrate the RSS IPv6 hash code to use pointers to the v6 addresses rather than passing them in by value. The eventual aim is to do incremental hash construction rather than all of the memcpy()'ing into a contiguous buffer for the hash function, which does show up as taking quite a bit of CPU during profiling. Tested: * a variety of laptops/desktop setups I have, with v6 connectivity Differential Revision: D1404 Reviewed by: bz, rpaulo	2014-12-31 22:52:43 +00:00
Andrey V. Elsukov	f188f14d43	Extern declarations in C files lose compile-time checking that the functions' calls match their definitions. Move them to header files. Reviewed by: jilles (previous version)	2014-12-25 21:32:37 +00:00
Andrey V. Elsukov	132c449079	Remove in_gif.h and in6_gif.h files. They only contain function declarations used by gif(4). Instead declare these functions in C files. Also make some variables static.	2014-12-23 16:17:37 +00:00
Michael Tuexen	caeae63f97	Plug a memory leak in an error code path. Reported by: Coverity CID: 1018936 MFC after: 3 days	2014-12-17 20:19:57 +00:00
Andrey V. Elsukov	44eb8bbe7b	Do not count security policy violation twice. ipsec*_in_reject() do this by their own. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 19:20:13 +00:00
Andrey V. Elsukov	49ada98eac	Use ipsec6_in_reject() to simplify ip6_ipsec_fwd() and ip6_ipsec_input(). ipsec6_in_reject() does the same things, also it counts policy violation errors. Do IPSEC check in the ip6_forward() after addresses checks. Also use ip6_ipsec_fwd() to make code similar to IPv4 implementation. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 19:09:57 +00:00
Andrey V. Elsukov	0275b2e369	Remove flag/flags argument from the following functions: ipsec_getpolicybyaddr() ipsec4_checkpolicy() ip_ipsec_output() ip6_ipsec_output() The only flag used here was IP_FORWARDING. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:35:34 +00:00
Andrey V. Elsukov	8922ddbe40	Move ip_ipsec_fwd() from ip_input() into ip_forward(). Remove check for presence PACKET_TAG_IPSEC_IN_DONE mbuf tag from ip_ipsec_fwd(). PACKET_TAG_IPSEC_IN_DONE tag means that packet is already handled by IPSEC code. This means that before IPSEC processing it was destined to our address and security policy was checked in the ip_ipsec_input(). After IPSEC processing packet has new IP addresses and destination address isn't our own. So, anyway we can't check security policy from the mbuf tag, because it corresponds to different addresses. We should check security policy that corresponds to packet attributes in both cases - when it has a mbuf tag and when it has not. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 16:53:29 +00:00
Andrey V. Elsukov	e58320f127	Remove PACKET_TAG_IPSEC_IN_DONE mbuf tag lookup and usage of its security policy. The changed block of code in ip*_ipsec_input() is called when packet has ESP/AH header. Presence of PACKET_TAG_IPSEC_IN_DONE mbuf tag in the same time means that packet was already handled by IPSEC and reinjected in the netisr, and it has another ESP/AH headers (encrypted twice?). Since it was already processed by IPSEC code, the AH/ESP headers was already stripped (and probably outer IP header was stripped too) and security policy from the tdb_ident was applied to those headers. It is incorrect to apply this security policy to current headers. Also make ip_ipsec_input() prototype similar to ip6_ipsec_input(). Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 14:58:55 +00:00
Andrey V. Elsukov	dd9cd45b44	Remove check for presence of PACKET_TAG_IPSEC_PENDING_TDB and PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED mbuf tags. They aren't used in FreeBSD. Instead check presence of PACKET_TAG_IPSEC_OUT_DONE mbuf tag. If it is found, bypass security policy lookup as described in the comment. PACKET_TAG_IPSEC_OUT_DONE tag added to mbuf when IPSEC code finishes ESP/AH processing. Since it was already finished, this means the security policy placed in the tdb_ident was already checked. And there is no reason to check it again here. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 14:43:44 +00:00
Mark Johnston	a37271c3b8	Revert r275695: nd6_dad_find() was already correct. Reported by: ae, kib Pointy hat to: markj	2014-12-11 09:16:45 +00:00
Mark Johnston	97712e3efc	Fix a bug in r266857: nd6_dad_find() must return NULL if it doesn't find a matching element in the DAD queue. Reported by: Holger Hans Peter Freyther <holger@freyther.de> MFC after: 3 days	2014-12-11 00:41:54 +00:00
Alexander V. Chernikov	ee7e9a4e17	* Do not assume lle has sockaddr key after struct lle: use llt_fill_sa_entry() llt method to store lle address in sa. * Eliminate L3_ADDR macro and either reference IPv4/IPv6 address directly from lle or use newly-created llt_fill_sa_entry(). * Do not store sockaddr inside arp/ndp lle anymore.	2014-12-09 00:48:08 +00:00
Alexander V. Chernikov	d82ed5051c	Simplify lle lookup/create api by using addresses instead of sockaddrs.	2014-12-08 23:23:53 +00:00
Mark Johnston	d6ad6a865a	Add refcounting to IPv6 DAD objects and simplify the DAD code to fix a number of races which could cause double frees or use-after-frees when performing DAD on an address. In particular, an IPv6 address can now only be marked as a duplicate from the DAD callout. Differential Revision: https://reviews.freebsd.org/D1258 Reviewed by: ae, hrs Reported by: rstone MFC after: 1 month	2014-12-08 04:44:40 +00:00
Alexander V. Chernikov	73b52ad896	Use llt_prepare_static_entry method to prepare valid per-af static entry.	2014-12-07 23:59:44 +00:00
Alexander V. Chernikov	0368226e65	* Retire abstract llentry_free() in favor of lltable_drop_entry_queue() and explicit calls to RTENTRY_FREE_LOCKED() * Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6. * Rename <lltable_\|llt>_delete function to _delete_addr() to note that this function is used by external callers. Make this function maintain its own locking. * Use lookup/unlink/clear call chain from internal callers instead of delete_addr. * Fix LLE_DELETED flag handling	2014-12-07 23:08:07 +00:00
Alexander V. Chernikov	721cd2e032	Do not enforce particular lle storage scheme: * move lltable allocation to per-domain callbacks. * make llentry_link/unlink functions overridable llt methods. * make hash table traversal another overridable llt method.	2014-12-07 17:32:06 +00:00
Alexander V. Chernikov	a743ccd468	* Add llt_clear_entry() callback which is able to do all lle cleanup including unlinking/freeing * Relax locking in lltable_prefix_free_af/lltable_free * Do not pass @llt to lle free callback: it is always NULL now. * Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding unlock/lock sequences * Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc() to retrieve preferred source address from lle hold queue and pass it instead of lle. * Finally, make nd6_create() create and return unlocked lle * Separate defrtr handling code from nd6_free(): use nd6_check_del_defrtr() to check if we need to keep entry instead of performing GC, use nd6_check_recalc_defrtr() to perform actual recalc on lle removal. * Move isRouter handling from nd6_cache_lladdr() to separate nd6_check_router() * Add initial code to maintain lle runtime flags in sync.	2014-12-07 15:42:46 +00:00
Michael Tuexen	457b4b8836	This is the SCTP specific companion of https://svnweb.freebsd.org/changeset/base/275358 which was provided by Hans Petter Selasky.	2014-12-04 21:17:50 +00:00
Andrey V. Elsukov	2dfcd0ae9d	Remove unneeded check. No need to do m_pullup to the size that we prepended. MFC after: 1 week Sponsored by: Yandex LLC	2014-12-02 05:41:03 +00:00
Andrey V. Elsukov	2d957916ef	Remove route caching support from ipsec code. It hasn't been used for some time. * remove sa_route_union declaration and route_cache member from struct secashead; * remove key_sa_routechange() call from ICMP and ICMPv6 code; * simplify ip_ipsec_mtu(); * remove #include <net/route.h>; Sponsored by: Yandex LLC	2014-12-02 04:20:50 +00:00
Alexander V. Chernikov	9b65db85e2	Do more fine-grained locking in lltable code: lltable_create_lle() does actual new lle creation without extensive locking and existing lle search. Move lle updating code from gigantic in_arpinput() to arp_update_llle() and some other functions. IPv6 changes to follow.	2014-12-01 21:43:48 +00:00
Hans Petter Selasky	c25290420e	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies	2014-12-01 11:45:24 +00:00
Alexander V. Chernikov	ce313fdd71	* Unify lle table dump/prefix removal code. * Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes used by lltable code.	2014-11-30 14:35:01 +00:00
Alexander V. Chernikov	5d14e4cd76	Provide rte_<get\|set> methods to access rtentry for external consumers.	2014-11-29 19:27:43 +00:00
Alexander V. Chernikov	74860d4f7c	Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr - return lle flags IFF needed. Do not pass rte to arpresolve - pass is_gateway flag instead.	2014-11-27 23:06:25 +00:00
Andrey V. Elsukov	af6209a133	Skip L2 addresses lookups for p2p interfaces. Discussed with: melifaro Sponsored by: Yandex LLC	2014-11-24 21:51:43 +00:00
Alexander V. Chernikov	73d770287d	Do more fine-grained lltable locking: use table runtime lock as rare as we can.	2014-11-23 15:38:06 +00:00
Alexander V. Chernikov	9479029b1f	* Add lltable llt_hash callback * Move lltable items insertions/deletions to generic llt code.	2014-11-23 12:15:28 +00:00
Alexander V. Chernikov	7c066c18db	Use less-invasive approach for IF_AFDATA lock: convert into 2 locks: use rwlock accessible via external functions (IF_AFDATA_CFG_* -> if_afdata_cfg_()) for all control plane tasks use rmlock (IF_AFDATA_RUN_) for fast-path lookups.	2014-11-22 19:53:36 +00:00
Alexander V. Chernikov	27688dfe1d	Temporarily revert r274774.	2014-11-22 17:57:54 +00:00
Alexander V. Chernikov	4194b42144	Another r274774 fix.	2014-11-21 23:37:14 +00:00
Alexander V. Chernikov	86b94cffe4	Finish r274774: add more headers/fix build for non-debug case.	2014-11-21 23:36:21 +00:00
Alexander V. Chernikov	9883e41b4b	Switch IF_AFDATA lock to rmlock	2014-11-21 02:28:56 +00:00
Alexander V. Chernikov	4d56c133fb	Sync to HEAD@r274766	2014-11-21 01:22:33 +00:00
Alexander V. Chernikov	f9723c7705	Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp we need to return. Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags fields for all relevant functions.	2014-11-20 22:41:59 +00:00
Alexander V. Chernikov	7f948f12f6	Finish r274175: do control plane MTU tracking. Update route MTU in case of ifnet MTU change. Add new RTF_FIXEDMTU to track explicitly specified MTU. Old behavior: ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU. User has to manually update all routes. ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU. However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu gets updated. New behavior: ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them. ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu. route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag. route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag. PR: 194238 MFC after: 1 month CR: D1125	2014-11-17 01:05:29 +00:00
Alexander V. Chernikov	df629abf3e	Rework LLE code locking: * struct llentry is now basically split into 2 pieces: all fields within 64 bytes (amd64) are now protected by both ifdata lock AND lle lock, i.e. you require both locks to be held exclusively for modification. All data necessary for fast path operations is kept here. Some fields were added: - r_l3addr - makes lookup key live within first 64 bytes. - r_flags - flags, containing pre-compiled decision whether given lle contains usable data or not. Currently the only flag is RLLE_VALID. - r_len - prepend data len, currently unused - r_kick - used to provide feedback to control plane (see below). All other fields are protected by lle lock. * Add simple state machine for ARP to handle "about to expire" case: Current model (for the fast path) is the following: - rlock afdata - find / rlock rte - runlock afdata - see if "expire time" is approaching (time_uptime + la->la_preempt > la->la_expire) - if true, call arprequest() and decrease la_preempt - store MAC and runlock rte New model (data plane): - rlock afdata - find rte - check if it can be used using r_* fields only - if true, store MAC - if r_kick field != 0 set it to 0. - runlock afdata New model (control plane): - schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries) seconds instead of V_arpt_keep. - on first timer invocation change state from ARP_LLINFO_REACHABLE to ARP_LLINFO_VERIFY, set r_kick to 1 and schedule next call in V_arpt_rexmit (default to 1 sec). - on subsequent timer invocations in ARP_LLINFO_VERIFY state, check r_kick value: reschedule if not changed, and send arprequest() if set to zero (i.e. entry was used). * Convert IPv4 path to use new single-lock approach. IPv6 bits to follow. * Slow down in_arpinput(): now valid reply will (in most cases) require acquiring afdata WLOCK twice. This is a requirement for storing changed lle data. This change will be slightly optimized in future. 
* Provide explicit hash link/unlink functions for both ipv4/ipv6 code. This will probably be moved to generic lle code once we have per-AF hashing callback inside lltable. * Perform lle unlink on deletion immediately instead of delaying it to the timer routine. * Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the presence of lle reference used for safe callout calls.	2014-11-16 20:12:49 +00:00
Alexander V. Chernikov	b4b1367ae4	* Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete Assume lla_create to return LLE_EXCLUSIVE lock for lle. * Rework lla_rt_output to perform all lle changes under afdata WLOCK. * Change arp_ifscrub() to acquire afdata WLOCK, the same as arp_ifinit().	2014-11-15 18:54:07 +00:00
Andrey V. Elsukov	794a349c6f	We don't return sp pointer, thus NULL assignment isn't needed. And reference to sp will be freed at the end. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-12 22:58:52 +00:00
Alexander V. Chernikov	670e8b3b8c	Kill custom in_matroute() radix matching function removing one rte mutex lock. Initially in_matroute() and in_clsroute() in their current state were introduced by r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them in route table, setting RTPRF_OURS flag and some expire time. After that, either GC came or RTPRF_OURS got removed on first-packet. It was a good solution in those days (and probably another decade after that) to keep TCP metrics. However, after moving metrics to TCP hostcache in r122922, most of in_rmx functionality became unused. It might have been used for flushing icmp-originated routes before rte mutexes/refcounting, but I'm not sure about that. So it looks like this is nearly impossible to make GC do its work nowadays: in_rtkill() ignores non-RTPRF_OURS routes. route can only become RTPRF_OURS after dropping last reference via rtfree() which calls in_clsroute(), which, in turn, ignores UP and non-RTF_DYNAMIC routes. Dynamic routes can still be installed via received redirect, but they have default lifetime (no specific rt_expire) and no one has another trie walker to call RTFREE() on them. So, the changelist: * remove custom rnh_match / rnh_close matching function. * remove all GC functions * partially revert r256695 (proto3 is no longer used inside kernel, it is not possible to use rt_expire from user point of view, proto3 support is not complete) * Finish r241884 (similar to this commit) and remove remaining IPv6 parts MFC after: 1 month	2014-11-11 02:52:40 +00:00
Andrey V. Elsukov	002c24396d	Add sa6_checkzone_ifp() function. It checks correctness of struct sockaddr_in6, usually obtained from the user level through ioctl. It initializes sin6_scope_id using given interface. Sponsored by: Yandex LLC	2014-11-10 16:12:51 +00:00
Alexander V. Chernikov	e0c0711e01	* Make nd6_dad_duplicated() constant. * Simplify refcounting by using nd6_dad_add() / nd6_dad_del(). Reviewed by: ae MFC after: 2 weeks Sponsored by: Yandex LLC	2014-11-10 16:01:39 +00:00
Andrey V. Elsukov	06fec20791	Remove link-local multicast routes remnants from in6_purgeaddr. Also merge in6_purgeaddr_mc with in6_purgeaddr. Sponsored by: Yandex LLC	2014-11-10 16:01:31 +00:00
Gleb Smirnoff	e6abaf91f4	Consistently use if_link. Reviewed by: ae, melifaro	2014-11-10 15:56:30 +00:00
Andrey V. Elsukov	45d1880a36	For now handle only multicast addresses, we still use routes to LLA unicasts yet. Sponsored by: Yandex LLC	2014-11-10 10:59:08 +00:00
Alexander V. Chernikov	f7bab8d0dd	Switch route radix to dual-lock model: use rmlock for data path access, and config rwlock for control plane processing. Route table changes require both locks held.	2014-11-10 00:07:06 +00:00
Andrey V. Elsukov	ea455de91d	Use embedded scope zone id to determine outgoing interface for link-local and node-local addresses.	2014-11-09 22:54:40 +00:00
Alexander V. Chernikov	36f34ac70b	Fix nd6_output_flush() prototype. Remove 'net/route_internal.h' header from stf.	2014-11-09 22:16:50 +00:00
Alexander V. Chernikov	603eaf792b	Remove faith(4) and faithd(8) from base. It looks like the industry has chosen different (and more traditional) stateless/stateful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. No objections from: net@	2014-11-09 21:33:01 +00:00
Alexander V. Chernikov	d0f9fca40d	Remove forgotten arguments.	2014-11-09 16:57:31 +00:00
Alexander V. Chernikov	033074c440	Replace 'struct route ' if_output() argument with 'struct nhop_info '. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.	2014-11-09 16:33:04 +00:00
Alexander V. Chernikov	9c9bde01d1	Remove unused 'struct route *' argument from nd6_output_flush().	2014-11-09 16:20:27 +00:00
Alexander V. Chernikov	55e5eda676	Separate radix and routing: use different structures for route and for other customers. Introduce new 'struct rib_head' for routing purposes and make all routing api use it.	2014-11-09 00:36:39 +00:00
Andrey V. Elsukov	3e88eb903b	Remove ip6_getdstifaddr() and all functions to work with auxiliary data. It isn't safe to keep unreferenced ifaddrs. Use in6ifa_ifwithaddr() to determine ifaddr corresponding to destination address. Since currently we keep addresses with embedded scope zone, in6ifa_ifwithaddr is called with zero zoneid and marked with XXX. Also remove route and lle lookups from ip6_input. Use in6ifa_ifwithaddr() instead. Sponsored by: Yandex LLC	2014-11-08 19:38:34 +00:00
Alexander V. Chernikov	a9413f6ca0	Sync to HEAD@r274297.	2014-11-08 18:13:35 +00:00
Alexander V. Chernikov	1398ffe5bc	Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2014-11-08 16:38:15 +00:00
Alexander V. Chernikov	3939f50c88	Finish r274290#2: remove unused IPv6 code.	2014-11-08 16:31:11 +00:00
Alexander V. Chernikov	22b08fd8b7	Split radix implementation and system route table structure: use new 'struct radix_head' for radix.	2014-11-07 22:52:02 +00:00
Andrey V. Elsukov	f325335caf	Overhaul if_gre(4). Split it into two modules: if_gre(4) for GRE encapsulation and if_me(4) for minimal encapsulation within IP. gre(4) changes: * convert to if_transmit; * rework locking: protect access to softc with rmlock, protect from concurrent ioctls with sx lock; * correct interface accounting for outgoing datagrams (count only payload size); * implement generic support for using IPv6 as delivery header; * make implementation conform to the RFC 2784 and partially to RFC 2890; * add support for GRE checksums - calculate for outgoing datagrams and check for incoming datagrams; * add support for sending sequence number in GRE header; * remove support of cached routes. This fixes problem, when gre(4) doesn't work at system startup. But this also removes support for having tunnels with the same addresses for inner and outer header. * deprecate support for various GREXXX ioctls, that aren't used in FreeBSD. Use our standard ioctls for tunnels. me(4): * implementation conform to RFC 2004; * use if_transmit; * use the same locking model as gre(4); PR: 164475 Differential Revision: D1023 No objections from: net@ Relnotes: yes Sponsored by: Yandex LLC	2014-11-07 19:13:19 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Gleb Smirnoff	428cf06b31	Remove VNET_SYSCTL_ARG(). The generic sysctl(9) code handles that. Reviewed by: ae Sponsored by: Nginx, Inc.	2014-11-07 08:58:05 +00:00
Alexander V. Chernikov	064b1bdb2d	Convert lle rtchecks to use new routing API. For inet/ case, this involves reverting r225947 which seems to be a pretty strange commit and should be reverted in HEAD as well.	2014-11-06 23:35:22 +00:00
Alexander V. Chernikov	146a181f28	Finish r274118: remove useless fields from struct domain. Sponsored by: Yandex LLC	2014-11-06 14:39:04 +00:00
Alexander V. Chernikov	1a75e3b20f	Make checks for rt_mtu generic: Some virtual if drivers have (ab)used the ifa_rtrequest hook to enforce route MTU to be not bigger than interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performs runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-domain MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum transmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use these functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC	2014-11-06 13:13:09 +00:00
Alexander V. Chernikov	9f25cbe45e	Remove old hack abusing domattach from NFS code. According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-dependent code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month	2014-11-05 00:58:01 +00:00
Alexander V. Chernikov	69b74805d5	Convert gif and stf to use new routing api.	2014-11-04 18:48:13 +00:00
Alexander V. Chernikov	5c9ef37854	Sync to HEAD@r274095.	2014-11-04 18:22:33 +00:00
Alexander V. Chernikov	8c3cfe0be0	Hide 'struct rtentry' and all its macros inside new header: net/route_internal.h. The goal is to make it opaque for all code except route/rtsock and proto domain _rmx.	2014-11-04 17:28:13 +00:00
Alexander V. Chernikov	a9ac00b76b	Convert in6p_lookup_mcast_ifp() to use new routing api. * Add special fib6_lookup_nh_ifp() to return rt_ifp instead of rt_ifa->ifa_ifp for that.	2014-11-04 17:05:24 +00:00
Alexander V. Chernikov	257480b8ab	Convert netinet6/ to use new routing API. * Remove &ifpp from ip6_output() in favor of ri->ri_nh_info * Provide different wrappers to in6_selectsrc: Currently it is used by 2 different types of customers: - socket-based one, which all are unsure about provided address scope and - in-kernel ones (ND code mostly), which don't have any sockets, options, credentials, etc. So, we provide two different wrappers to in6_selectsrc() returning select source. * Make different versions of selectroute(): Currently selectroute() is used in two scenarios: - SAS, via in6_selectsrc() -> in6_selectif() -> selectroute() - output, via in6_output -> wrapper -> selectroute() Provide different versions for each customer: - fib6_lookup_nh_basic()-based in6_selectif() which is capable of returning interface only, without MTU/NHOP/L2 calculations - full-blown fib6_selectroute() with cached route/multipath/ MTU/L2 * Stop using routing table for link-local address lookups * Add in6_ifawithifp_lla() to make for-us check faster for link-local * Add in6_splitscope / in6_setllascope for faster embed/deembed scopes	2014-11-04 15:39:56 +00:00
Hiroki Sato	da1304cb42	Fix a bug which prevented ND6_IFF_IFDISABLED flag from clearing when the newly-added IPv6 address was /128. PR: 188032	2014-11-02 21:58:31 +00:00
Andrey V. Elsukov	94a43496c2	Remove redundant code. if_detach already did these steps. Also, we no longer keep routes to link-local addresses. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-10-30 12:44:46 +00:00
Andrey V. Elsukov	3c268b3afc	Move ifq drain into in6m_purge(). Suggested by: bms MFC after: 1 week Sponsored by: Yandex LLC	2014-10-30 11:34:07 +00:00
Andrey V. Elsukov	8ff1eae10d	Fix mbuf leak in IPv6 multicast code. When multicast capable interface goes away, it leaves multicast groups, this leads to generate MLD reports, but MLD code does deferred send and MLD reports are queued in the in6_multi's in6m_scq ifq. The problem is that in6_multi structures are freed when interface leaves multicast groups and thread that does deferred send will not take these queued packets. PR: 194577 MFC after: 1 week Sponsored by: Yandex LLC	2014-10-30 10:59:57 +00:00
Andrey V. Elsukov	c56173a626	Do not automatically install routes to link-local and interface-local multicast addresses. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-10-27 16:15:15 +00:00
Andrey V. Elsukov	8e4bdfa2db	Remove unused function. Sponsored by: Yandex LLC	2014-10-27 10:34:09 +00:00
Alexander V. Chernikov	30514718e7	Convert several places inside netinet6/ to new api.	2014-10-25 22:53:08 +00:00
Andrey V. Elsukov	a663aa4ce8	Remove redundant check and m_pullup() call.	2014-10-24 13:34:22 +00:00
Andrey V. Elsukov	0b9f5f8a5f	Overhaul if_gif(4): o convert to if_transmit; o use rmlock to protect access to gif_softc; o use sx lock to protect from concurrent ioctls; o remove a lot of unneeded and duplicated code; o remove cached route support (it won't work with concurrent io); o style fixes. Reviewed by: melifaro Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2014-10-14 13:31:47 +00:00
Robert Watson	f0cace5d94	When deciding whether to call m_pullup() even though there is adequate data in an mbuf, use M_WRITABLE() instead of a direct test of M_EXT; the latter both unnecessarily exposes mbuf-allocator internals in the protocol stack and is also insufficient to catch all cases of non-writability. (NB: m_pullup() does not actually guarantee that a writable mbuf is returned, so further refinement of all of these code paths continues to be required.) Reviewed by: bz MFC after: 3 days Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D900	2014-10-12 15:49:52 +00:00
Bryan Venteicher	81d3ec1763	Add context pointer and source address to the UDP tunnel callback. These are needed for the forthcoming vxlan implementation. The context pointer means we do not have to use a spare pointer field in the inpcb, and the source address is required to populate vxlan's forwarding table. While I highly doubt there is an out of tree consumer of the UDP tunneling callback, this change may be difficult to eventually MFC. Phabricator: https://reviews.freebsd.org/D383 Reviewed by: gnn	2014-10-10 06:08:59 +00:00
Bryan Venteicher	a0a9e1b57c	Add missing UDP multicast receive dtrace probes Phabricator: https://reviews.freebsd.org/D924 Reviewed by: rpaulo markj MFC after: 1 month	2014-10-09 22:36:21 +00:00
Bryan Venteicher	514929b193	Move the calls to u_tun_func() into udp6_append(). A similar cleanup for UDPv4 was performed in r220620. Phabricator: https://reviews.freebsd.org/D383 Reviewed by: gnn MFC after: 1 month	2014-10-09 05:42:07 +00:00
Michael Tuexen	5558cc334d	Fix a bug introduced in https://svnweb.freebsd.org/base?view=revision&revision=272347 MFC after: 3 days	2014-10-07 16:01:17 +00:00
Michael Tuexen	4e1730b532	UDP and UDPLite require a checksum. So check for it. MFC after: 3 days	2014-10-03 08:46:49 +00:00
Michael Tuexen	5055cfcb4d	Check for UDP/IPv6 packets that the length in the UDP header is at least the minimum. Make the check similar to the one for UDPLite/IPv6. MFC after: 3 days	2014-10-02 10:49:01 +00:00
Michael Tuexen	76b96fbc9e	Fix the checksum computation for UDPLite/IPv6. This requires the usage of a function computing the checksum only over a part of the function. Therefore introduce in6_cksum_partial() and implement in6_cksum() based on that. While there, ensure that the UDPLite packet contains at least enough bytes to contain the header. Reviewed by: kevlo MFC after: 3 days	2014-10-02 10:32:24 +00:00
Hiroki Sato	9c57a5b630	Add an additional routing table lookup when m->m_pkthdr.fibnum is changed at a PFIL hook in ip{,6}_output(). IPFW setfib rule did not perform a routing table lookup when the destination address was not changed. CR: D805	2014-10-02 00:25:57 +00:00
Alexander V. Chernikov	31f0d081d8	Remove lock init from radix.c. Radix has never managed its locking itself. The only consumer using radix with embeded rwlock is system routing table. Move per-AF lock inits there.	2014-10-01 14:39:06 +00:00
Michael Tuexen	83e95fb30b	The default for UDPLITE_RECV_CSCOV is zero. RFC 3828 recommends that this means full checksum coverage for received packets. If an application is willing to accept packets with partial coverage, it is expected to use the socket option and provide the minimum coverage it accepts. Reviewed by: kevlo MFC after: 3 days	2014-10-01 05:43:29 +00:00
Michael Tuexen	0f4a03663b	If the checksum coverage field in the UDPLITE header is the length of the complete UDPLITE packet, the packet has full checksum coverage. So fix the condition. Reviewed by: kevlo MFC after: 3 days	2014-09-30 18:17:28 +00:00
Andrey V. Elsukov	d1729484d4	Remove redundant call to ipsec_getpolicybyaddr(). ipsec_hdrsiz() will call it internally. Sponsored by: Yandex LLC	2014-09-30 13:15:19 +00:00
Kevin Lo	0bc40ebf00	When plen != ulen, it should only be checked when this is UDP. Spotted by: bryanv	2014-09-30 07:28:31 +00:00
Alan Somers	4f8585e021	Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and ifa_ifwithdstaddr. For the sake of backwards compatibility, the new arguments were added to new functions named ifa_ifwithnet_fib and ifa_ifwithdstaddr_fib, while the old functions became wrappers around the new ones that passed RT_ALL_FIBS for the fib argument. However, the backwards compatibility is not desired for FreeBSD 11, because there are numerous other incompatible changes to the ifnet(9) API. We therefore decided to remove it from head but leave it in place for stable/9 and stable/10. In addition, this commit adds the fib argument to ifa_ifwithbroadaddr for consistency's sake. sys/sys/param.h Increment __FreeBSD_version sys/net/if.c sys/net/if_var.h sys/net/route.c Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute. sys/net/route.c sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_options.c sys/netinet/ip_output.c sys/netinet6/nd6.c Fixup calls of modified functions. share/man/man9/ifnet.9 Document changed API. CR: https://reviews.freebsd.org/D458 MFC after: Never Sponsored by: Spectra Logic	2014-09-11 20:21:03 +00:00
Andrey V. Elsukov	343e440f63	Add const qualifier to in6_addrhash() function. Add in6ifa_ifwithaddr() function. It is similar to ifa_ifwithaddr, but does fast lookup in the hash of inet6 addresses. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 13:18:41 +00:00
Andrey V. Elsukov	80803aa289	* use M_ZERO flag with malloc instead of explicit zeroing. * remove MULTI_SCOPE ifdef. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 12:54:17 +00:00
Andrey V. Elsukov	41874e85d6	Introduce new scope related functions. * new macro to remove magic number - IPV6_ADDR_SCOPES_COUNT; * sa6_checkzone() - this function checks sockaddr_in6 structure for correctness of sin6_scope_id. It also can fill correct value sometimes. * in6_getscopezone() - this function returns scope zone id for specified interface and scope. * in6_getlinkifnet() - this function returns struct ifnet for corresponding zone id of link-local scope. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 12:33:37 +00:00
Andrey V. Elsukov	573791d01c	* constify argument of in6_addrscope(); * use IN6_IS_ADDR_XXX() macro instead of hardcoded values; * for multicast addresses just return scope value, the only exception is addresses with 0x0F scope value (RFC 4291 p2.7.0); Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-11 10:27:59 +00:00
Andrey V. Elsukov	9196891fc9	Add additional checks for IPV6_PKTINFO handling (RFC 3542): * Return ENETDOWN when interface specified by ipi6_ifindex is not enabled for IPv6 use. * Return EADDRNOTAVAIL when ipi6_ifindex specifies an interface, but the address ipi6_addr is not available for use on that interface. * Return EINVAL when ipi6_addr is multicast address. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-10 14:32:07 +00:00
Andrey V. Elsukov	a7e201bbac	Make in6_pcblookup_hash_locked and in6_pcbladdr static. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-09-10 13:17:35 +00:00
Andrey V. Elsukov	1b44e5ffe3	Introduce INP6_PCBHASHKEY macro. Replace usage of hardcoded part of IPv6 address as hash key in all places. Obtained from: Yandex LLC	2014-09-10 12:35:42 +00:00
Andrey V. Elsukov	5dbfa43f65	Add the ability to set `prefer_source' flag to an IPv6 address. It affects the IPv6 source address selection algorithm (RFC 6724) and allows overriding the last rule ("longest matching prefix") for choosing among equivalent addresses. The address with `prefer_source' will be the preferred source address. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2014-09-09 10:52:50 +00:00
Adrian Chadd	a4d98bf442	Add basic RSS awareness for the UDPv6 send path. This doesn't include the same kind of userland overriding that the IPv4 path has; nor does it yet know about 2-tuple versus 4-tuple hashing. That'll come later. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 04:20:53 +00:00
Adrian Chadd	b174de323a	Add IP_NODEFAULTFLOWID awareness to ip6_output(). Differential Revision: https://reviews.freebsd.org/D527	2014-09-09 00:21:21 +00:00
Michael Tuexen	24aaac8d59	Use union sctp_sockstore instead of struct sockaddr_storage. This eliminates some warnings when building in userland. Thanks to Patrick Laimbock for reporting this issue. Remove also some unnecessary casts. There should be no functional change. MFC after: 1 week	2014-09-07 09:06:26 +00:00
Andrey V. Elsukov	ccc53de916	Add the reverse part to rule #9 . Also change its description in the netstat(8) output. MFC after: 1 week	2014-09-01 09:30:34 +00:00
Mark Johnston	5fc2632281	Add some missing checks for unsupported interfaces (e.g. pflog(4)) when handling ioctls. While here, remove duplicated checks for a NULL ifp in in6_control(): this check is already done near the beginning of the function. PR: 189117 Reviewed by: hrs MFC after: 2 weeks	2014-08-22 19:21:08 +00:00
Kevin Lo	73d76e77b6	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
Kevin Lo	8f5a8818f5	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
Andrey V. Elsukov	d6e6b9943b	Add new rule to source address selection algorithm. It prefers address with better virtual status. Use ifa_preferred() to choose better address. PR: 187341 Tested by: des MFC after: 1 week	2014-07-30 15:08:12 +00:00
Gleb Smirnoff	9753faf553	Garbage collect couple of unused fields from struct ifaddr: - ifa_claim_addr() unused since removal of NetAtalk - ifa_metric seems to be never utilized, always a copy of if_metric	2014-07-29 15:01:29 +00:00
Hiroki Sato	9be09a6e43	Fix EtherIP. TOS field must be initialized when the inner protocol is PF_LINK, and multicast/broadcast flag should always be dropped because the outer protocol uses unicast even when the inner address is not for unicast. It had been broken since r236951 when gif_output() started to use IFQ_HANDOFF().	2014-07-24 10:42:47 +00:00
Adrian Chadd	0ae3f42231	When it's time to do 4-tuple UDP IPv6 hashing, make sure this is a known type.	2014-07-20 07:39:54 +00:00
Adrian Chadd	c7c0d94874	Add IPv6 flowid, bindmulti and RSS awareness.	2014-07-12 05:46:33 +00:00
Adrian Chadd	a8a2d8003a	Add INP_RSS_BUCKET_SET awareness for IPv6 pcbgroup entries. This ensures that a listen socket with INP_RSS_BUCKET_SET set will use the pre-determined PCBGROUP rather than what the hashing path chooses.	2014-07-12 05:45:53 +00:00
Adrian Chadd	6e4405cee1	Add the IPv6 versions of the multi-bind, hash/hash type and RSS options.	2014-07-12 05:44:16 +00:00
Andrey V. Elsukov	ff899182ec	Fix condition. Sponsored by: Yandex LLC	2014-07-11 06:34:15 +00:00
Bryan Venteicher	6700a7d44b	Use the appropriate IPv6 hashtype defines when looking up the PCBGROUP Reviewed by: adrian@	2014-07-07 00:02:49 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Hajimu UMEMOTO	f4839cbc0a	Make nd6_gctimer tunable. MFC after: 1 week	2014-06-23 16:27:29 +00:00
Kevin Lo	ea93c6a613	Catch up with r186809, correct comments.	2014-06-23 05:17:39 +00:00
Andrey V. Elsukov	45b4fb0449	Remove unused variable. Sponsored by: Yandex LLC	2014-06-08 09:08:51 +00:00
Alan Somers	2f308a343f	Fix unintended KBI change from r264905. Add _fib versions of ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the _fib() versions with RT_ALL_FIBS, preserving legacy behavior. sys/net/if_var.h sys/net/if.c Add legacy-compatible functions as described above. Ensure legacy behavior when RT_ALL_FIBS is passed as fibnum. sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/net/route.c sys/net/rtsock.c sys/netinet6/nd6.c Call with _fib() functions if we must use a specific fib, or the legacy functions otherwise. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c Improve the udp_dontroute test. The bug that this test exercises is that ifa_ifwithnet() will return the wrong address, if multiple interfaces have addresses on the same subnet but with different fibs. The previous version of the test only considered one possible failure mode: that ifa_ifwithnet_fib() might fail to find any suitable address at all. The new version also checks whether ifa_ifwithnet_fib() finds the correct address by checking where the ARP request goes. Reported by: bz, hrs Reviewed by: hrs MFC after: 1 week X-MFC-with: 264905 Sponsored by: Spectra Logic	2014-05-29 21:03:49 +00:00
Hiroki Sato	82a9fa4a1d	Add rwlock to struct dadq. A panic could occur when a large number of addresses performed DAD at the same time.	2014-05-29 20:53:53 +00:00
VANHULLEBUS Yvan	aaf2cfc0d6	Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 Submitted by: emeric.poupon@netasq.com Reviewed by: fabient, ae Obtained from: NETASQ	2014-05-28 12:45:27 +00:00
Hiroki Sato	705bef548a	Cancel DAD for an ifa when the ifp has ND6_IFF_IFDISABLED as early as possible and do not clear IN6_IFF_TENTATIVE. If IFDISABLED was accidentally set after a DAD started, TENTATIVE could be cleared because no NA was received due to IFDISABLED, and as a result it could prevent DAD when manually clearing IFDISABLED after that.	2014-05-16 15:53:31 +00:00
Alexander V. Chernikov	b980262e63	Pass radix head ptr along with rte to rtexpunge(). Rename rtexpunge to rt_expunge().	2014-05-03 16:28:54 +00:00
Alexander V. Chernikov	cf58751a44	Use "hash" value in rtalloc_mpath_fib() instead of RTF_ANNOUNCE flag. Hashing method is the same as in in6_src.c. (Probably we need better one). MFC after: 2 weeks	2014-04-26 16:46:33 +00:00
Alexander V. Chernikov	36d55f0f9d	Unify sa_equal() macro usage. MFC after: 2 weeks	2014-04-26 14:52:03 +00:00
Alan Somers	0cfee0c223	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
Andrey V. Elsukov	52c57247d3	Remove unused variable. PR: 173521 MFC after: 1 week Sponsored by: Yandex LLC	2014-04-17 06:40:11 +00:00
Andrey V. Elsukov	4fd913364f	Properly release the in6_multi lock. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-12 02:05:31 +00:00
Kevin Lo	d1b18731d9	Minor style cleanups.	2014-04-07 01:55:53 +00:00
Kevin Lo	e06e816f67	Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian	2014-04-07 01:53:03 +00:00
Andrey V. Elsukov	cd71804c84	Remove unused label. MFC after: 1 week	2014-03-31 14:40:35 +00:00
Andrey V. Elsukov	27aa751c90	Don't generate an ICMPv6 error message if packet was consumed by filter. MFC after: 1 week Sponsored by: Yandex LLC	2014-03-31 14:27:22 +00:00
Robert Watson	7527624efa	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. 
If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
Gleb Smirnoff	aa69c61235	Since both netinet/ and netinet6/ call into netipsec/ and netpfil/, the protocol specific mbuf flags are shared between them. - Move all M_FOO definitions into a single place: netinet/in6.h, to avoid future clashes. - Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted in a failure of operation of IPSEC and packet filters. Thanks to Nicolas and Georgios for all the hard work on bisecting, testing and finally finding the root of the problem. PR: kern/186755 PR: kern/185876 In collaboration with: Georgios Amanakis <gamanakis gmail.com> In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com> Sponsored by: Nginx, Inc.	2014-03-12 14:29:08 +00:00
Gleb Smirnoff	e3a7aa6f56	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache thrashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Craig Rodrigues	47a79fadc6	Remove KASSERT from in6p_lookup_mcast_ifp(). When the devel/jenkins port, version 1.551 was started, the kernel would panic if INVARIANTS was enabled in the kernel config. Suggested by: bms	2014-02-23 01:27:22 +00:00
Gleb Smirnoff	0ff96b4f55	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
Alexander V. Chernikov	f6990c4e3e	Further simplify nd6_output_lle. Currently we have 3 usage patterns: 1) nd6_output (most traffic flow, no lle supplied, lle RLOCK sufficient) 2) corner cases for output (no lle, STALE lle, so on). lle WLOCK needed. 3) nd* internal machinery (WLOCK'ed lle provided, perform packet queueing). We separate case 1 and implement it inside its only customer - nd6_output. This leads to some code duplication (especially SEND stuff, which should be hooked to output in a different way), but simplifies locking and control flow logic for nd6_output_lle. Reviewed by: ae MFC after: 3 weeks Sponsored by: Yandex LLC	2014-02-13 19:09:04 +00:00
Andrey V. Elsukov	e4c77ca0c0	Drop packets to multicast address whose scop field contains the reserved value 0. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-13 14:10:44 +00:00
Christian Brueffer	d37872314f	Only count table lookups when we're actually processing packets. PR: 183462 Submitted by: Sven-Thorsten Dietrich <thebigcorporation at gmail.com> Reviewed by: bms MFC after: 1 month	2014-02-10 14:47:51 +00:00
Christian Brueffer	1b55364ed9	For IPv6, return the same error code as IPv4 when mrouter is not initialized. PR: 178472 Submitted by: Sven-Thorsten Dietrich <sven at vyatta.com> Reviewed by: bms	2014-02-10 14:36:51 +00:00
Alexander V. Chernikov	9dffa6a3f3	Simplify nd6_output_lle: * Check ND6_IFF_IFDISABLED before acquiring any locks * Assume m is always non-NULL * Remove 'bad' case not used anymore * Simplify if_output conditional MFC after: 2 weeks Sponsored by: Yandex LLC	2014-02-10 12:52:33 +00:00
Gleb Smirnoff	5d6d7e756b	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which is quite useless in the DTrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
Andrey V. Elsukov	74a976fffd	Unlock entry before retry. Submitted by: melifaro MFC after: 1 week	2014-02-07 10:58:46 +00:00
Andrey V. Elsukov	51eecdc35a	Take exclusive lock only when lle isn't NULL. We don't need write access to lle in most cases. MFC after: 1 week Sponsored by: Yandex LLC	2014-02-02 07:28:04 +00:00
Alexander V. Chernikov	f6b84910bb	Further rework netinet6 address handling code: * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attach. * Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code). * Do some renamings: in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here) in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code) in6_ifaddloop -> nd6_add_ifa_lle in6_ifremloop -> nd6_rem_ifa_lle * Split working with LLE and route announce code for last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour. * Call device SIOCSIFADDR handler IFF we're adding first address. In IPv4 we have to call it on every address change since ARP record is installed by arp_ifinit() which is called by given handler. IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so there is no reason to call SIOCSIFADDR often.	2014-01-19 16:07:27 +00:00
Alexander V. Chernikov	0c5d4bde90	Use in6_localip() instead of hand-rolled cycle. MFC after: 2 weeks	2014-01-18 20:54:55 +00:00
Alexander V. Chernikov	9080e7d023	Add in6_prepare_ifra() function to ease preparing in-kernel IPv6 address requests. MFC after: 2 weeks	2014-01-18 20:32:59 +00:00
Alexander V. Chernikov	b6a16fc853	Do some style(9) not done in r260851 to improve readability. MFC after: 2 weeks	2014-01-18 15:57:43 +00:00
Alexander V. Chernikov	60d7c722a5	Split in6_update_ifa() into smaller pieces leaving functionality intact. Discussed with: ae MFC after: 2 weeks	2014-01-18 15:52:52 +00:00
Andrey V. Elsukov	e74966f60b	Mechanically replace direct access to if_xname with the if_name() macro.	2014-01-10 12:33:28 +00:00
John-Mark Gurney	f2effe745c	revert part of r260485 which changes how part of the header gets included.. netstat uses -DKERNEL=1 to get these parts and breaks the build w/o it... melifaro@ says that ae@ is probably asleep, and the PR doesn't have this part of the patch... Probably a local change got in by accident.. PR: 185148 Pointy hat to: ae@	2014-01-09 22:41:18 +00:00
Andrey V. Elsukov	78415d1082	Remove extra nesting from X_ip6_mforward() function. Also remove disabled definitions from ip6_mroute.h. PR: 185148 Sponsored by: Yandex LLC	2014-01-09 15:38:28 +00:00
Andrey V. Elsukov	0a6b0ffa54	Add MRT6_DLOG() macro for debugging. Reduce number of MRT6DEBUG ifdefs and fix some broken format strings. MFC after: 1 week Sponsored by: Yandex LLC	2014-01-09 14:58:06 +00:00
Alexander V. Chernikov	1dc8f6a82c	Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code to do IPv6 addr & mask in different places. MFC after: 2 weeks	2014-01-08 22:13:32 +00:00
Andrey V. Elsukov	b88aef1dcf	Use pointer to struct sockaddr_in6 in lla_lookup() call. This prevents triggering the KASSERT in in6_lltable_lookup.	2014-01-03 02:40:56 +00:00
Andrey V. Elsukov	e2d14d9317	Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag. MFC after: 1 week	2014-01-03 02:32:05 +00:00
Andrey V. Elsukov	ea0c377602	lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag. Reviewed by: glebius, adrian MFC after: 1 week Sponsored by: Yandex LLC	2014-01-02 08:40:37 +00:00
Adrian Chadd	c445d2520d	Use an RLOCK here instead of an RWLOCK - matching all the other calls to lla_lookup(). This drastically reduces the very high lock contention when doing parallel TCP throughput tests (> 1024 sockets) with IPv6. Tested: * parallel IPv6 TCP bulk data exchange, 8192 sockets MFC after: 1 week Sponsored by: Netflix, Inc.	2014-01-01 00:56:26 +00:00
Bjoern A. Zeeb	010c2b8192	Correct warnings comparing unsigned variables < 0 constantly reported while building kernels. All instances removed are indeed unsigned so the expressions could not be true. MFC after: 1 week	2013-12-25 20:08:44 +00:00
Dimitry Andric	6c5a340e56	In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used whenever KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about an unused function if KTR is not enabled. MFC after: 3 days	2013-12-24 20:30:13 +00:00
Andrey V. Elsukov	569aad57d2	Free mbuf in case of error. MFC after: 1 week	2013-12-17 10:53:17 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Andrey V. Elsukov	ee674966f4	Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps simplify the code and remove several now unused variables. PR: 156283 MFC after: 2 weeks	2013-11-11 12:49:00 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Michael Tuexen	b54ddf225f	Changes from upstream to improve compilation when INET or INET6 or none of them is defined. MFC after: 3 days	2013-11-02 20:12:19 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare for this event by adding if_var.h to files that do need it. Also, include all headers that are currently pulled in through implicit pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Andrey V. Elsukov	baa09f1891	Initialize inc_fibnum for properly handling ICMP6_PACKET_TOO_BIG errors in multifib environment. PR: 183265 MFC after: 1 week	2013-10-25 01:02:25 +00:00
Gleb Smirnoff	7caf4ab7ac	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). This removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentally fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 11:37:57 +00:00
Gleb Smirnoff	4675896098	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
Gleb Smirnoff	6ed910fabe	Hide 'struct ifaddr' definition from userland. Two tools left that use it, namely ipftest(1) and ifmcstat(1). These sniff structure definition using _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:19:24 +00:00
Gleb Smirnoff	3fa98cf9ac	Remove unsigned < 0 check.	2013-10-15 10:12:19 +00:00
Gleb Smirnoff	ca695e0807	Remove useless check of ia6 against NULL, right after dereferencing it.	2013-10-15 10:11:23 +00:00
Gleb Smirnoff	0218539652	Now counter_u64_t is known to userland, thus remove hack from r253086. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:09:33 +00:00
Hiroki Sato	6378e1f369	Do not try to detach if the interface does not support IPv6. Tested by: hselasky PR: usb/182820 Approved by: re (glebius)	2013-10-10 09:43:15 +00:00
Gleb Smirnoff	491b520174	Fix mbuf leak. Submitted by: Loganaden Velvindron <logan elandsys.com> Obtained from: NetBSD Approved by: re (kib)	2013-10-07 12:07:40 +00:00
Bjoern A. Zeeb	fd291ae3ec	Update comment from draft to RFC number. Submitted by: Loganaden Velvindron (logan elandsys.com) Approved by: re (gjb) MFC after: 6 days	2013-09-22 14:53:07 +00:00
Mikolaj Golub	4d3dfd450a	Unregister inet/inet6 pfil hooks on vnet destroy. Discussed with: andre Approved by: re (rodrigc)	2013-09-13 18:45:10 +00:00
Dag-Erling Smørgrav	1a05c762b9	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [SA-13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
John Baldwin	fa302f207f	Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This matches the types used when computing hash indices and the type of the maximum size of mfchashtbl[]. PR: kern/181821 Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4) MFC after: 1 week	2013-09-05 14:16:37 +00:00
John Baldwin	fd77bbb967	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
Mark Johnston	57f6086735	Implement the ip, tcp, and udp DTrace providers. The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD. Tested by: gnn, hiren MFC after: 1 month	2013-08-25 21:54:41 +00:00
Michael Tuexen	1a94cdbea7	Provide human readable debug output.	2013-08-25 12:44:03 +00:00
Andre Oppermann	9850f95989	For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits. The upper 32bits are not occupied for now. Sponsored by: The FreeBSD Foundation	2013-08-25 09:49:00 +00:00
Andre Oppermann	1b4381afbb	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. 
The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
Xin LI	acde2476c4	Fix an integer overflow in computing the size of a temporary buffer, which can result in a buffer that is too small for the requested operation. Security: CVE-2013-3077 Security: FreeBSD-SA-13:09.ip_multicast	2013-08-22 00:51:37 +00:00
Andre Oppermann	86bd049144	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
Andre Oppermann	88388bdcbe	Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific flag instead. The flag is only used within the IP and IPv6 layer 3 protocols. Because some firewall packages treat IPv4 and IPv6 packets the same the flag should have the same value for both. Discussed with: trociny, glebius	2013-08-19 11:08:36 +00:00
Hiroki Sato	5a04191532	Return 0 in nbi->expire when la_expire == 0. Conversion from time_uptime to time_second should not be performed in this case.	2013-08-17 07:14:45 +00:00
Hiroki Sato	ffa0165ae0	Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6, SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl. These userland interfaces treat expiration times in time_second, not time_uptime.	2013-08-06 17:10:52 +00:00
Hiroki Sato	7d26db1792	- Use time_uptime instead of time_second in data structures for PF_INET6 in kernel. This fixes various malfunctions when the wall time clock is changed. Bump __FreeBSD_version to 1000041. - Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities. MFC after: 1 month	2013-08-05 20:13:02 +00:00

1680 Commits