freebsd-skq

Author	SHA1	Message	Date
tuexen	fcb532bf04	Use uint32_t consistently for mtu values. This does not change functionality, but this cleanup is needed for further improvements of ICMP handling. MFC after: 1 week	2017-04-26 19:26:40 +00:00
kp	c87b6306b6	Rename variable for clarity Rename the mtu variable in ip6_fragment(), because mtu is misleading. The variable actually holds the fragment length. No functional change. Suggested by: ae	2017-04-22 13:04:36 +00:00
kp	4f3397263b	pf: Fix possible incorrect IPv6 fragmentation When forwarding, pf tracks the size of the largest fragment in a fragmented packet, and refragments based on this size. It failed to ensure that this size was a multiple of 8 (as is required for all but the last fragment), so it could end up generating incorrect fragments. For example, if we received an 8-byte and a 12-byte fragment, pf would emit a first fragment with 12 bytes of payload and the final fragment would claim to be at offset 8 (not 12). We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so other users won't make the same mistake. Reported by: Antonios Atlasis <aatlasis at secfu net> MFC after: 3 days	2017-04-20 09:05:53 +00:00
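The multiple-of-8 rule, and how the wrong offset in the example arises, can be sketched in C. This is a standalone illustration, not the pf or ip6_fragment() code: frag_payload_len() and frag_offsets() are hypothetical helpers. The IPv6 Fragment Offset field counts 8-octet units, so a 12-byte offset is truncated to unit 1 (8 bytes), which is exactly the bogus offset described above; rounding the fragment size down to a multiple of 8 avoids it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: round the largest fragment payload seen on input
 * down to a size that is legal for every fragment except the last one.
 * A result of 0 means maxfrag was smaller than 8 and cannot be used.
 */
static uint32_t
frag_payload_len(uint32_t maxfrag)
{
	return (maxfrag & ~(uint32_t)7);
}

/*
 * Hypothetical helper: split `total` payload bytes into fragments of
 * `fraglen` bytes (the last one may be shorter) and store each fragment's
 * offset in 8-octet units, as carried in the Fragment header.
 * Returns the number of offsets written to offs[].
 */
static size_t
frag_offsets(uint32_t total, uint32_t fraglen, uint16_t *offs, size_t max)
{
	size_t n = 0;

	/* The same kind of invariant the commit asserts in ip6_fragment(). */
	assert(fraglen > 0 && fraglen % 8 == 0);
	for (uint32_t off = 0; off < total && n < max; off += fraglen)
		offs[n++] = (uint16_t)(off / 8);
	return (n);
}
```

With the 8-byte and 12-byte fragments from the example (20 bytes in total, largest fragment 12), frag_payload_len(12) is 8, and frag_offsets(20, 8, ...) produces three fragments at offsets 0, 1 and 2 in 8-octet units (0, 8 and 16 bytes).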
ae	3433e16f91	Rework r316770 to make it protocol-independent and general, as is done for stream sockets. Also do more cleanup in sbappendaddr_locked_internal() to prevent leaking information from the existing mbuf to one that may be created later by netgraph. Suggested by: glebius Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week	2017-04-14 09:00:48 +00:00
ae	629029d020	Clear h/w csum flags on mbuf handled by UDP. When the checksums of the received IP and UDP headers have already been checked, UDP uses sbappendaddr_locked() to pass the received data to the socket. sbappendaddr_locked() uses the given mbuf as is, and if the NIC supports checksum offloading, the mbuf contains csum_data and csum_flags that were calculated for the already stripped headers. Some NICs support only limited checksum offloading and do not use the CSUM_PSEUDO_HDR flag, and csum_data contains some value that UDP/TCP should use for the pseudo header checksum calculation. When L2TP is used for tunneling with mpd5, ng_ksocket receives an mbuf with filled csum_flags and csum_data that were calculated for the outer headers. When the L2TP header is stripped, the tunneled packet goes to the IP layer and, due to the presence of csum_flags (without CSUM_PSEUDO_HDR) and csum_data, the UDP/TCP checksum check fails for this packet. Reported by: Irina Liakh <spell at itl ua> Tested by: Irina Liakh <spell at itl ua> MFC after: 1 week	2017-04-13 17:03:57 +00:00
smh	688581b7e7	Allow explicitly assigned IPv6 loopback address to be used in jails If a jail has an explicitly assigned IPv6 loopback address then allow it to be used instead of remapping requests for the loopback address to the first IPv6 address assigned to the jail. This fixes issues where applications attempt to detect their bound port where they requested a loopback address, which was available, but instead the kernel remapped it to the jail's first address. This is the same fix that was applied to IPv4 by r316313. Also: * Correct the description of prison_check_ip6_locked to match the code. MFC after: 2 weeks Relnotes: Yes Sponsored by: Multiplay	2017-03-31 09:10:05 +00:00
karels	2563deee1a	Fix reference count leak with L2 caching. ip_forward, TCP/IPv6, and probably SCTP leaked references to L2 cache entries because they used their own routes on the stack, not in_pcb routes. The original model for route caching was that callers that provided a route structure to ip{,6}input() would keep the route, and this model was used for L2 caching as well. Instead, change L2 caching to be done by default only when using a route structure in the in_pcb; the pcb deallocation code frees L2 as well as L3 caches. A separate change will add route caching to TCP/IPv6. Another suggestion was to have the transport protocols indicate willingness to use L2 caching, but this approach keeps the changes at the network level. Reviewed by: ae gnn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10059	2017-03-25 15:06:28 +00:00
asomers	d1b74add9a	Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0 sys/netinet6/icmp6.c Use the interface's FIB for source address selection in ICMPv6 error responses. sys/netinet6/in6.c In in6_newaddrmsg, announce arrival of local addresses on the interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6 cache instead of a single cache. sys/netinet6/in6_src.c In in6_selectsrc, use the caller's fib instead of the default fib. In in6_selectsrc_socket, remove a superfluous check. sys/netinet6/nd6.c In nd6_lle_event, use the interface's fib for routing socket messages. In nd6_is_new_addr_neighbor, check all FIBs when trying to determine whether an address is a neighbor. Also, simplify the code for point to point interfaces. sys/netinet6/nd6.h sys/netinet6/nd6.c sys/netinet6/nd6_rtr.c Make defrouter_select fib-aware, and make all of its callers pass in the interface fib. sys/netinet6/nd6_nbr.c When inputting a Neighbor Solicitation packet, consider the interface fib instead of the default fib for DAD. Output NS and Neighbor Advertisement packets on the correct fib. sys/netinet6/nd6_rtr.c Allow installing the same host route on different interfaces in different FIBs. If rt_add_addr_allfibs=0, only install or delete the prefix route on the interface fib. tests/sys/netinet/fibs_test.sh Clear some expected failures, but add a skip for the newly revealed BUG217871. PR: 196361 Submitted by: Erick Turnquist <jhujhiti@adjectivism.org> Reported by: Jason Healy <jhealy@logn.net> Reviewed by: asomers MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D9451	2017-03-17 16:50:37 +00:00
eri	f2a480c25c	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately, the two will have different integer values, because the Linux value is already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves the implementation of transparent proxies on UDP sockets by providing complete information on forwarded packets. Reviewed by: adrian, aw Approved by: ae (mentor) Sponsored by: rsync.net Differential Revision: D9235	2017-03-06 04:01:58 +00:00
imp	7e6cabd06e	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
ae	0969d8271c	When IPv6 fragments reassembly is complete, update mbuf's csum_data and csum_flags using information from all fragments. This fixes dropping of reassembled packets due to wrong checksum when the IPv6 checksum offloading is enabled on a network card. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-02-28 22:58:19 +00:00
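Merging per-fragment checksum data works because the Internet checksum is a one's complement sum, which can be computed piecewise and the pieces added with end-around carry, as long as each piece starts on a 16-bit boundary (non-final IPv6 fragments are multiples of 8 bytes, so they do). A minimal C sketch of that property, assuming each fragment's csum_data is a folded partial sum over just that fragment's payload; ocsum() and ocsum_add() are hypothetical names, not the FreeBSD in_cksum routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement sum of a buffer taken as big-endian 16-bit words,
 * folded to 16 bits. An odd trailing byte is padded with a zero byte.
 */
static uint16_t
ocsum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/* Combine two folded partial sums with end-around carry. */
static uint16_t
ocsum_add(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	return ((uint16_t)((sum & 0xffff) + (sum >> 16)));
}
```

For instance, the byte sequence 00 01 f2 03 f4 f5 f6 f7 used in RFC 1071 folds to 0xddf2, and summing a buffer in 8-byte pieces with ocsum_add() gives the same result as summing it in one pass; only the final piece may have an odd length.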
ae	5a443cfa6c	Remove IPsec related PCB code from SCTP. The inpcb structure has an inp_sp pointer that is initialized by the ipsec_init_pcbpolicy() function. This pointer keeps storage for IPsec security policies associated with a specific socket. An application can use the IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options to configure these security policies. Then ip[6]_output() uses the inpcb pointer to specify that an outgoing packet is associated with some socket, and the IPSEC_OUTPUT() method can use a security policy stored in inp_sp. For inbound packets the protocol-specific input routine uses the IPSEC_CHECK_POLICY() method to check that a packet conforms to the inbound security policy configured in the inpcb. The SCTP protocol doesn't specify an inpcb for ip[6]_output() when it sends packets. Thus the IPSEC_OUTPUT() method does not consider such packets as associated with any socket and cannot apply security policies from the inpcb, even if they are configured. Since the IPSEC_CHECK_POLICY() method is called from the protocol-specific input routine, it can specify the inpcb pointer, and the inbound policy associated with the socket will be checked. But there are two problems: 1. Such a check is asymmetric, because we cannot apply the security policy from the inpcb to outgoing packets. 2. IPSEC_CHECK_POLICY() expects that the caller holds the INPCB lock and that access to inp_sp is protected. But for SCTP this is not the case, because SCTP uses its own locks to protect the inpcb. To fix these problems, remove IPsec related PCB code from SCTP. This implies that the IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options will not be applicable to SCTP sockets. To be able to correctly check inbound security policies for SCTP, mark its protocol header with the PR_LASTHDR flag. Reported by: tuexen Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D9538	2017-02-13 11:37:52 +00:00
eri	bb348ba959	Committed without approval from mentor. Reported by: gnn	2017-02-12 06:56:33 +00:00
eri	8e401abc42	Use proper value for socket option on IPv6 Reported-by: ohartmann@walstatt.org	2017-02-10 06:20:27 +00:00
eri	6898c4334b	Revert r313527 Heh svn is not git	2017-02-10 05:58:16 +00:00
eri	b429db62bc	Correct missed variable name. Reported-by: ohartmann@walstatt.org	2017-02-10 05:51:39 +00:00
eri	ed45b31494	The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately, the two will have different integer values, because the Linux value is already assigned in FreeBSD. The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application. This allows/improves the implementation of transparent proxies on UDP sockets by providing complete information on forwarded packets. Sponsored-by: rsync.net Differential Revision: D9235 Reviewed-by: adrian	2017-02-10 05:16:14 +00:00
ae	0fb6ad528e	Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec related code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is built as part of the ipsec.ko module (or with an IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. Only one header file, <netipsec/ipsec_support.h>, needs to be included to declare everything needed to work with IPsec. o All IPsec protocol handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be closer to the RFC. o PF_KEY SADB was reworked: - now all security associations are stored in a single SPI namespace, and all SAs MUST have a unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups at the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by an IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were substantially changed to avoid locking protection for ipsecrequest. Now we support only a limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid an SP lookup for each packet. o For inbound security policies, a mode was added in which the kernel checks the full history of applied IPsec transforms. o Reference counting rules for security policies and security associations were changed. Proper SA locking was added to the xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352	2017-02-06 08:49:57 +00:00
avos	91fc509b91	Garbage collect IFT_IEEE80211 (but leave the define for possible reuse) This interface type ("a parent interface of wlanX") is not used since r287197 Reviewed by: adrian, glebius Differential Revision: https://reviews.freebsd.org/D9308	2017-01-28 17:08:40 +00:00
loos	5bd3158a3b	After the in_control() changes in r257692, an existing address is (intentionally) deleted first and then completely added again (so all the events, announces and hooks are given a chance to run). This causes an issue with CARP where the existing CARP data structure is removed together with the last address for a given VHID, which will cause a subsequent failure when the address is later re-added. This change fixes this issue by adding a new flag to keep the CARP data structure when an address is not being removed. There was an additional issue with IPv6 CARP addresses, where the CARP data structure would never be removed after a change and lead to VHIDs which cannot be destroyed. Reviewed by: glebius Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-25 19:04:08 +00:00
hselasky	efa6326974	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API supports rates in the range from 1 to 4Gbytes/s, which is suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells whether a network driver supports rate limiting. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate, which is then stored in the socket structure in the kernel. Later on, when packets are transmitted, a check is made in the transmit path for rate changes. A rate change implies that a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees that "m->m_pkthdr.snd_tag" is not NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag does not match that of the network interface. The network adapter frees the mbuf and returns EAGAIN, which causes ip_output() to release and clear the send tag. On the next ip_output() call, allocation of a new "snd_tag" will be attempted. 4) When the PCB is detached, the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months	2017-01-18 13:31:17 +00:00
markj	85f660ec50	Improve some of the sysctl descriptions added in r299827. Submitted by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> (original version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5336	2017-01-16 19:35:19 +00:00
sobomax	701697521c	Add a new socket option SO_TS_CLOCK to pick from several different clock sources to return timestamps when SO_TIMESTAMP is enabled. Two additional clock sources are: o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME); o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC). In addition to this, this option provides a unified interface to get bintime (equivalent of using SO_BINTIME), except that it is also supported with IPv6, where SO_BINTIME has never been supported. The long term plan is to deprecate SO_BINTIME and move everything to using SO_TS_CLOCK. The idea for this enhancement was briefly discussed at the Net session during the dev summit in Ottawa last June and the general input was positive. This change is believed to benefit network benchmarks/profiling as well as other scenarios where precise time of arrival measurement is necessary. There are two regression test cases as part of this commit: one extends unix domain test code (unix_cmsg) to test new SCM_XXX types and another one implements a totally new test case which exchanges UDP packets between two processes using both conventional methods (i.e. calling clock_gettime(2) before recv(2) and after send(2)), as well as using setsockopt()+recv() in the receive path. The resulting delays are checked for sanity for all supported clock types. Reviewed by: adrian, gnn Differential Revision: https://reviews.freebsd.org/D9171	2017-01-16 17:46:38 +00:00
markj	9c324f0735	Release the ND6 list lock before making a prefix off-link in nd6_timer(). Reported by: Jim <BM-2cWfdfG5CJsquqkJyry7hZT9LypbSEWEkQ@bitmessage.ch> X-MFC With: r306829	2017-01-08 18:46:00 +00:00
tuexen	6266aedc70	Whitespace changes. The toolchain for processing the sources has been updated. No functional change. MFC after: 3 days	2016-12-26 11:06:41 +00:00
markj	dfc508adea	Remove a bogus KASSERT from nd6_prefix_unlink(). The caller may unlink a prefix before purging referencing addresses. An identical assertion in nd6_prefix_del() verifies that the addresses are purged before the prefix is freed. PR: 215372 X-MFC With: r306829	2016-12-19 19:21:28 +00:00
ae	2c8179e485	ip[6]_tryforward does inbound and outbound packet firewall processing. This can lead to a change of the mbuf pointer (a packet filter could do m_pullup(), NAT, etc). Also, in case of a change of the destination address, tryforward can decide that the packet should be handled by the local system. In this case the modified mbuf can be returned to ip[6]_input(). To handle this correctly, check the M_FASTFWD_OURS flag after returning from ip[6]_tryforward. If it is present, update the variables that depend on the mbuf pointer and skip a second pass of inbound firewall processing. No objection from: #network MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D8764	2016-12-19 11:02:49 +00:00
ae	55258bdaf6	Modify IPv6 statistic accounting in ip6_input(). Add rcvif local variable to keep inbound interface pointer. Count ifs6_in_discard errors in all "goto bad" cases. Now it will count errors even if mbuf was freed. Modify all places where m->m_pkthdr.rcvif is used to use local rcvif variable. Obtained from: Yandex LLC MFC after: 1 month	2016-12-12 11:26:59 +00:00
ae	d9654156fd	Add ip6_tryforward() - a run to completion forwarding implementation for IPv6. It gets performance benefits from a reduced number of checks. It doesn't copy the mbuf to be able to send an ICMPv6 error message, because it keeps the mbuf unchanged until the routing decision has been made. It doesn't do IPsec checks, and when some IPsec security policies are present, ip6_input() uses the normal slow path. Reviewed by: bz, gnn Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D8527	2016-12-12 10:57:32 +00:00
tuexen	ae1856036a	Whitespace changes. The tools used to generate the sources have been updated and produce different whitespace. Commit this separately to avoid intermixing these with real code changes. MFC after: 3 days	2016-12-06 10:21:25 +00:00
tuexen	219874f57e	Make ICMPv6 hard error handling for TCP consistent with the ICMPv4 handling. Ensure that: * Protocol unreachable errors are handled by indicating ECONNREFUSED to the TCP user for both IPv4 and IPv6. These were ignored for IPv6. * Communication prohibited errors are handled by indicating ECONNREFUSED to the TCP user for both IPv4 and IPv6. These were ignored for IPv6. * Hop Limit exceeded errors are handled by indicating EHOSTUNREACH to the TCP user for both IPv4 and IPv6. For IPv6 the TCP connection was dropped but errno wasn't set. Reviewed by: gallatin, rrs MFC after: 1 month Sponsored by: Netflix Differential Revision: 7904	2016-10-21 10:32:57 +00:00
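The three cases above can be summarized as a small lookup from ICMPv6 type and code to the errno the TCP user sees. This is a hypothetical standalone function, not the FreeBSD TCP or ICMPv6 input code. It uses the type and code constants from <netinet/icmp6.h>, and it treats Parameter Problem with code "unrecognized Next Header" as the IPv6 counterpart of protocol unreachable, which is an assumption about how the stack classifies that message. Anything not listed in the commit returns 0 here; that says nothing about how FreeBSD handles other codes.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>

/*
 * Hypothetical mapping of the ICMPv6 hard errors named in the commit to
 * the errno reported to the TCP user. Returns 0 for anything else.
 */
static int
icmp6_hard_error_errno(int type, int code)
{
	switch (type) {
	case ICMP6_DST_UNREACH:
		if (code == ICMP6_DST_UNREACH_ADMIN)
			return (ECONNREFUSED);	/* communication prohibited */
		break;
	case ICMP6_PARAM_PROB:
		if (code == ICMP6_PARAMPROB_NEXTHEADER)
			return (ECONNREFUSED);	/* treated as protocol unreachable */
		break;
	case ICMP6_TIME_EXCEEDED:
		if (code == ICMP6_TIME_EXCEED_TRANSIT)
			return (EHOSTUNREACH);	/* hop limit exceeded in transit */
		break;
	}
	return (0);
}
```

Keeping the table in one place makes the IPv4/IPv6 symmetry the commit aims for easy to check side by side with the ICMPv4 mapping.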
gnn	8c492572ae	Limit the number of mbufs that can be allocated for IPV6_2292PKTOPTIONS (and IPV6_PKTOPTIONS). PR: 100219 Submitted by: Joseph Kong MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D5157	2016-10-17 23:25:31 +00:00
glebius	444f6777c7	- Revert r300854, r303657 which tried to fix regression from r297225. - Fix the regression proper way using RO_RTFREE(). Submitted by: ae	2016-10-13 20:15:47 +00:00
markj	0e9bbe2171	Lock the ND prefix list and add refcounting for prefixes. This change extends the nd6 lock to protect the ND prefix list as well as the list of advertising routers associated with each prefix. To handle cases where the nd6 lock must be dropped while iterating over either the prefix or default router lists, a generation counter is used to track modifications to the lists. Additionally, a new mutex is used to serialize prefix on-link/off-link transitions. This mutex must be acquired before the nd6 lock and is held while updating the routing table in nd6_prefix_onlink() and nd6_prefix_offlink(). Reviewed by: ae, tuexen (SCTP bits) Tested by: Jason Wolfe <jason@llnw.com>, Larry Rosenman <ler@lerctr.org> MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D8125	2016-10-07 21:10:53 +00:00
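The generation-counter technique for iterating a list while the lock is dropped can be sketched as follows. Everything here is hypothetical: the names are made up, a plain flag stands in for the nd6 lock so the sketch runs single-threaded, and the prefix refcounting the commit also adds (which keeps an entry alive while the lock is dropped) is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct prefix {
	struct prefix	*next;
	int		 visited;	/* set once the callback has run */
};

struct prefix_list {
	int		 locked;	/* stand-in for the nd6 lock */
	unsigned long	 gen;		/* bumped on every list change */
	struct prefix	*head;
};

static void
list_lock(struct prefix_list *pl)
{
	assert(!pl->locked);
	pl->locked = 1;
}

static void
list_unlock(struct prefix_list *pl)
{
	assert(pl->locked);
	pl->locked = 0;
}

/*
 * Run fn once on each prefix still on the list when it is reached,
 * dropping the lock around each call. If the generation changed while
 * the lock was dropped, pr->next may be stale, so restart from the head;
 * the visited flag keeps handled entries from being processed twice.
 */
static void
list_foreach_unlocked(struct prefix_list *pl,
    void (*fn)(struct prefix_list *, struct prefix *))
{
	struct prefix *pr;
	unsigned long gen;

	list_lock(pl);
restart:
	for (pr = pl->head; pr != NULL; pr = pr->next) {
		if (pr->visited)
			continue;
		pr->visited = 1;
		gen = pl->gen;
		list_unlock(pl);
		fn(pl, pr);		/* may take the lock and modify the list */
		list_lock(pl);
		if (gen != pl->gen)
			goto restart;
	}
	list_unlock(pl);
}

/* Example callback: on its first call, unlinks the current list head. */
static int demo_calls;

static void
demo_cb(struct prefix_list *pl, struct prefix *pr)
{
	(void)pr;
	if (demo_calls++ == 0) {
		list_lock(pl);
		pl->head = pl->head->next;	/* unlink the head ... */
		pl->gen++;			/* ... and record the change */
		list_unlock(pl);
	}
}
```

With a three-entry list, demo_cb removes the first entry while the lock is dropped, the loop notices the generation change and restarts, and each of the three entries is still handled exactly once.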
markj	671e7fcca9	Reduce the number of conditional statements in nd6_prefix_onlink(). MFC after: 1 week	2016-10-07 21:03:18 +00:00
markj	d443c7f48a	Combine several checks in nd6_prefix_offlink() into one. MFC after: 1 week	2016-10-07 21:02:30 +00:00
markj	87c0df02e3	Fix whitespace around prototypes in nd6_rtr.c. MFC after: 1 week	2016-10-07 00:36:18 +00:00
markj	9346a8b630	Fix a typo. MFC after: 1 week	2016-10-07 00:35:28 +00:00
markj	71534a1a0c	Shorten and simplify some of the loops in pfxlist_onlink_check(). No functional change intended. MFC after: 1 week	2016-10-07 00:34:57 +00:00
markj	149b5ce5db	Use a const reference to prefixes in nd6_is_new_addr_neighbor(). MFC after: 1 week	2016-10-07 00:26:36 +00:00
markj	b0fce0526c	nd6_dad_timer(): don't assert that the address is tentative. It appears that this assertion can be tripped in some cases when multiple interfaces are on the same link. Until this is resolved, revert a part of r306305 and simply log a message if the DAD timer fires on a non-tentative address. Reported by: jhb X-MFC With: r306305	2016-10-01 01:30:34 +00:00
ae	f7637dcb59	Fix bug introduced in r274300. In icmp6_reflect() use original source address of erroneous packet as destination address for source selection algorithm when original destination address is not one of our own. Reported by: Mark Kamichoff <prox at prolixium com> Tested by: Mark Kamichoff <prox at prolixium com> MFC after: 1 week	2016-09-29 19:57:37 +00:00
markj	67747e91f9	Convert checks in nd6_dad_start() and nd6_dad_timer() to assertions. In particular, these functions can assume they are operating on tentative addresses. MFC after: 2 weeks	2016-09-24 21:40:24 +00:00
markj	e89e3efaa6	Rename ndpr_refcnt to ndpr_addrcnt. This field counts derived addresses and is not a true refcount for prefix objects, so the previous name was misleading. MFC after: 1 week	2016-09-24 01:14:25 +00:00
markj	bc48ae0071	Reduce code duplication around NDP message handlers in icmp6_input(). No functional change intended. MFC after: 2 weeks	2016-09-20 18:08:17 +00:00
kevlo	518bc28463	Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878	2016-09-15 07:41:48 +00:00
karels	821d0f98a3	Fix L2 caching for UDP over IPv6 ip6_output() was missing cache invalidation code analogous to ip_output.c. r304545 disabled L2 caching for UDP/IPv6 as a workaround. This change adds the missing cache invalidation code and reverts r304545. Reviewed by: gnn Approved by: gnn (mentor) Tested by: peter@, Mike Andrews MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D7591	2016-08-24 00:52:30 +00:00
bz	55cbdc7ad3	Remove the kernel option for IPSEC_FILTERTUNNEL, which was deprecated more than 7 years ago in favour of a sysctl in r192648.	2016-08-21 18:55:30 +00:00
karels	2fb476f05a	Disable L2 caching for UDP over IPv6 The ip6_output routine is missing L2 cache invalidation as done in ip_output. Even with that code, some problems with UDP over IPv6 have been reported. Disabling the L2 cache for UDP over IPv6 works around the problem for now. PR: 211872 211926 Reviewed by: gnn Approved by: gnn (mentor) MFC after: immediate	2016-08-20 20:46:53 +00:00
ae	8c03d2551f	Add ipfw_nat64 module that implements stateless and stateful NAT64. The module works together with ipfw(4) and is implemented as its external action module. Stateless NAT64 registers an external action with the name nat64stl. This keyword should be used to create a NAT64 instance and to address this instance in rules. Stateless NAT64 uses two lookup tables with mapped IPv4->IPv6 and IPv6->IPv4 addresses to perform translation. A configuration of an instance should look like this: 1. Create lookup tables: # ipfw table T46 create type addr valtype ipv6 # ipfw table T64 create type addr valtype ipv4 2. Fill T46 and T64 tables. 3. Add rule to allow neighbor solicitation and advertisement: # ipfw add allow icmp6 from any to any icmp6types 135,136 4. Create NAT64 instance: # ipfw nat64stl NAT create table4 T46 table6 T64 5. Add rules that match the traffic: # ipfw add nat64stl NAT ip from any to table(T46) # ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96 6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96 via NAT64 host. Stateful NAT64 registers an external action with the name nat64lsn. The only option required to create a nat64lsn instance is prefix4. It defines the pool of IPv4 addresses used for translation. A configuration of an instance should look like this: 1. Add rule to allow neighbor solicitation and advertisement: # ipfw add allow icmp6 from any to any icmp6types 135,136 2. Create NAT64 instance: # ipfw nat64lsn NAT create prefix4 A.B.C.D/28 3. Add rules that match the traffic: # ipfw add nat64lsn NAT ip from any to A.B.C.D/28 # ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96 4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96 via NAT64 host. Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D6434	2016-08-13 16:09:49 +00:00
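The address mapping behind the 64:ff9b::/96 prefix used in both configurations can be illustrated in user space. This is a sketch, not the ipfw_nat64 code: nat64_synth() and nat64_extract() are hypothetical helpers that follow the /96 layout of RFC 6052, where the IPv4 address occupies the last 32 bits of the IPv6 address. It is roughly what a DNS64 server does when synthesizing AAAA records, and what the translator reverses to find the IPv4 destination.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix
 * (RFC 6052, /96 case). Returns 1 on success, 0 on a parse error.
 */
static int
nat64_synth(const char *prefix96, const char *v4, char *out, socklen_t outlen)
{
	struct in6_addr a6;
	struct in_addr a4;

	if (inet_pton(AF_INET6, prefix96, &a6) != 1 ||
	    inet_pton(AF_INET, v4, &a4) != 1)
		return (0);
	/* Both addresses are already in network byte order. */
	memcpy(&a6.s6_addr[12], &a4.s_addr, sizeof(a4.s_addr));
	return (inet_ntop(AF_INET6, &a6, out, outlen) != NULL);
}

/* Reverse direction: recover the IPv4 address from a /96-mapped one. */
static int
nat64_extract(const char *v6, char *out, socklen_t outlen)
{
	struct in6_addr a6;
	struct in_addr a4;

	if (inet_pton(AF_INET6, v6, &a6) != 1)
		return (0);
	memcpy(&a4.s_addr, &a6.s6_addr[12], sizeof(a4.s_addr));
	return (inet_ntop(AF_INET, &a4, out, outlen) != NULL);
}
```

For example, 192.0.2.33 maps to 64:ff9b::c000:221, which is the same address as 64:ff9b::192.0.2.33, and nat64_extract() turns it back into 192.0.2.33.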
stevek	12aa5d6087	Move IPv4-specific jail functions to new file netinet/in_jail.c _prison_check_ip4 renamed to prison_check_ip4_locked Move IPv6-specific jail functions to new file netinet6/in6_jail.c _prison_check_ip6 renamed to prison_check_ip6_locked Add appropriate prototypes to sys/sys/jail.h Adjust kern_jail.c to call prison_check_ip4_locked and prison_check_ip6_locked accordingly. Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that need to be built when INET and INET6, respectively, are configured in the kernel configuration file. Reviewed by: jtl Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D6799	2016-08-09 02:16:21 +00:00
ae	a937127683	Fix NULL pointer dereference. ro pointer can be NULL when IPSec consumes mbuf. PR: 211486 MFC after: 3 days	2016-08-02 12:18:06 +00:00
gallatin	11f6fcfd28	Rework IPV6 TCP path MTU discovery to match IPv4 - Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput() - Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output() to send TCP packets without looking at the tcp host cache for every single transmit. - Make the icmp6 code mimic the IPv4 code & avoid returning PRC_HOSTDEAD because it is so expensive. Without these changes in place, every TCP6 pmtu discovery or host unreachable ICMP resulted in a call to in6_pcbnotify() which walks the tcbinfo table with the write lock held. Because the tcbinfo table is shared between IPv4 and IPv6, this causes huge scalability issues on servers with lots of (~100K) TCP connections, to the point where even a small percent of IPv6 traffic had a disproportionate impact on overall throughput. Reviewed by: bz, rrs, ae (all earlier versions), lstewart (in Netflix's tree) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D7272	2016-08-01 17:02:21 +00:00
stevek	3acd4a25e6	Prepare for network stack as a module - Move cr_canseeinpcb to sys/netinet/in_prot.c in order to separate the INET and INET6-specific code from the rest of the prot code (It is only used by the network stack, so it makes sense for it to live with the other network stack code.) - Move cr_canseeinpcb prototype from sys/systm.h to netinet/in_systm.h - Rename cr_seeotheruids to cr_canseeotheruids and cr_seeothergids to cr_canseeothergids, make them non-static, and add prototypes (so they can be seen/called by in_prot.c functions.) - Remove sw_csum variable from ip6_forward in ip6_forward.c, as it is an unused variable. Reviewed by: gnn, jtl Approved by: sjg (mentor) Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2901	2016-07-27 20:34:09 +00:00
karels	e31cf93024	Fix per-connection L2 caching in fast path r301217 re-added per-connection L2 caching from a previous change, but it omitted caching in the fast path. Add it. Reviewed By: gallatin Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D7239	2016-07-22 02:11:49 +00:00
ae	2c47439b3f	Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6 as defined in RFC 6296. The module works together with ipfw(4) and is implemented as its external action module. When it is loaded, it registers as an eaction and can be used in rules. The usage pattern is similar to ipfw_nat(4). All traffic matched by the rule goes to the NPT module. Reviewed by: hrs Obtained from: Yandex LLC MFC after: 1 month Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D6420	2016-07-18 19:46:31 +00:00
ae	7a18a4b316	Add net.inet6.ip6.intr_queue_maxlen sysctl. It can be used to change netisr queue limit for IPv6 at runtime. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2016-07-15 17:09:30 +00:00
dim	8a8ea0466a	Fix a page fault in ip6_setpktopt(), occurring when the pflog module is loaded, and syncthing is started, which uses setsockopt(IPV6_PKTINFO). This is because pflog interfaces do not normally have an IPv6 address, causing the ND_IFINFO() macro to dereference a NULL pointer. Reviewed by: ae PR: 210943 MFC after: 3 days	2016-07-13 19:41:19 +00:00
tuexen	c2c8b26056	Don't consider the socket when processing an incoming ICMP/ICMP6 packet, which was triggered by an SCTP packet. Whether a socket exists is just not relevant. Approved by: re (kib) MFC after: 1 week	2016-06-23 09:13:15 +00:00
ae	7cdbaef028	Fix the NULL pointer dereference for unresolved link layer entries in the netinet6 code. Copy link layer address only when corresponding entry has LLE_VALID flag. PR: 210379 Approved by: re (kib)	2016-06-22 11:29:21 +00:00
bz	7a1c0b1ad1	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather large and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along with (or before) the higher layer protocol when it is shut down. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enhance helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
pfg	0f12e1993e	Remove the SIOCSIFALIFETIME_IN6 ioctl. The SIOCSIFALIFETIME_IN6 provided by the kame project is unused, it can't really be used safely and has been completely removed from NetBSD and OpenBSD. Obtained from: NetBSD (kern/35897) PR: 210148 (exp-run) Reviewed by: ae, hrs Relnotes: yes Approved by: re (glebius) Differential Revision: https://reviews.freebsd.org/D5491	2016-06-13 22:31:16 +00:00
ae	48b268cd67	Clean up unneeded include "opt_ipfw.h". It was used to conditionally build IPFIREWALL_FORWARD support, but the IPFIREWALL_FORWARD option was removed a long time ago.	2016-06-09 05:48:34 +00:00
bz	5baf25edd0	Make KASSERT message more useful by printing the variables on which we assert. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 22:34:12 +00:00
bz	aaac6c5a13	Move the callout_reset() to the end of the work rather than having it before we do anything. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 14:01:09 +00:00
bz	69cdb2137c	Introduce a per-VNET flag to enable/disable netisr processing on that VNET. Add accessor functions to toggle the state per VNET. The base system (vnet0) will always enable itself with the normal registration. We will share the registered protocol handlers in all VNETs minimising duplication and management. Upon disabling netisr processing for a VNET drain the netisr queue of packets for that VNET. Update netisr consumers to (de)register on a per-VNET start/teardown using VNET_SYS(UN)INIT functionality. The change should be transparent for non-VIMAGE kernels. Reviewed by: gnn (, hiren) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6691	2016-06-03 13:57:10 +00:00
gnn	d75e0c471e	This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262	2016-06-02 17:51:29 +00:00
markj	8b17712ca6	Exploit r301213 to fix in6 ifaddr locking in pfxlist_onlink_check(). Reviewed by: ae, hrs MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6639	2016-06-02 17:21:57 +00:00
markj	71ff51c027	Always start IPv6 DAD asynchronously. Otherwise we transmit the first neighbour solicitation in the context of the caller of nd6_dad_start(), which can easily result in lock recursion. When DAD is to be started after some delay, we send the first NS from the DAD callout handler, so just change the implementation to do this in the non-delayed case as well. Reviewed by: ae, hrs MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6639	2016-06-02 17:17:15 +00:00
bz	fac944a70a	The pr_destroy field does not allow us to run the teardown code in a specific order. VNET_SYSUNINITs however are doing exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use internally. Slightly reshuffle the SI_SUB_* fields in kernel.h and add new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come. Reviewed by: gnn, tuexen (sctp), jhb (kernel.h) Obtained from: projects/vnet MFC after: 2 weeks X-MFC: do not remove pr_destroy Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6652	2016-06-01 10:14:04 +00:00
tuexen	525f16040f	Add PR_CONNREQUIRED for SOCK_STREAM sockets using SCTP. This is required to signal connection setup on non-blocking sockets via becoming writable. This still allows for implicit connection setup. MFC after: 1 week	2016-05-30 18:24:23 +00:00
glebius	bbfa6d2853	Plug route reference underleak that happens with FLOWTABLE after r297225. Submitted by: Mike Karels <mike karels.net>	2016-05-27 17:31:02 +00:00
markj	9cb221bdb0	Mark the prefix and default router list sysctl handlers MPSAFE. MFC after: 2 weeks	2016-05-23 20:18:11 +00:00
markj	86c15ee95b	Acquire the nd6 lock in the prefix list sysctl handler. The nd6 lock will be used to synchronize access to the NDP prefix list. MFC after: 2 weeks Tested by: Jason Wolfe (as part of a larger change)	2016-05-23 20:15:08 +00:00
ae	78161c3462	Remove ip6 adjusting from the place where pointer couldn't be changed. And add comment after calling PFIL hooks, where it could be changed.	2016-05-20 12:17:40 +00:00
ae	eef5384953	Remove ip6 pointer initialization and strange check from the beginning of ip6_output(). It isn't used until the first time adjusted. Remove the comment about adjusting where it is actually initialized.	2016-05-20 12:09:10 +00:00
markj	a09fd6097b	Move IPv6 malloc tag definitions into the IPv6 code.	2016-05-20 04:45:08 +00:00
ae	0412106b46	Since PFIL can change the destination address, always use its current value from the mbuf when calculating path mtu. Remove now unused finaldst variable. Also constify dst argument in ip6_getpmtu() and ip6_getpmtu_ctl(). Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-05-19 12:45:20 +00:00
ae	d1f53cbfea	Call RO_RTFREE() when we have detected the change of destination address, otherwise the old route will be used with new destination. MFC after: 1 week	2016-05-17 14:06:55 +00:00
markj	a0640b8262	Use Node Information flag names instead of hard-coding their values. MFC after: 1 week	2016-05-15 03:22:13 +00:00
markj	43beb421e5	Add sysctl descriptions for net.inet6.ip6 and net.inet6.icmp6. icmp6.redirtimeout, icmp6.nd6_maxnudhint and ip6.rr_prune are left undocumented as they appear to have no effect. Some existing sysctl descriptions were modified for consistency and style, and the ip6.tempvltime and ip6.temppltime handlers were rewritten to be a bit simpler and to avoid setting the sysctl value before validating it. MFC after: 3 weeks	2016-05-15 03:18:03 +00:00
markj	c5c6630f07	Remove an always-false error check in the AIFADDR_IN6 handler. CID: 1250792 MFC after: 1 week	2016-05-15 03:01:40 +00:00
markj	4b9a93dfb3	Remove obsolescent comments from nd6_purge(). MFC after: 1 week	2016-05-09 23:43:12 +00:00
markj	94a1c25725	Clean up callers of nd6_prelist_add(). nd6_prelist_add() sets newp if and only if it is successful, so there's no need for code that handles the case where the return value is 0 and newp == NULL. Fix some style bugs in nd6_prelist_add() while here. MFC after: 1 week	2016-05-07 03:41:29 +00:00
markj	557551b31f	Remove two useless local variables from prelist_update(). MFC after: 1 week	2016-05-07 03:32:29 +00:00
pfg	d9c9113377	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
tuexen	a750782f5b	When a client uses UDP encapsulation and lists IP addresses in the INIT chunk, enable UDP encapsulation for all those addresses. This helps clients using a userland stack to support multihoming if they are not behind a NAT. MFC after: 1 week	2016-05-01 21:48:55 +00:00
tuexen	3e7292aa0b	Use correct order of source and destination address and port.	2016-04-29 20:13:35 +00:00
rrs	64e463c093	Complete the UDP tunneling of ICMP msgs to those protocols interested in having tunneled UDP and finding out about the ICMP (tested by Michael Tuexen with SCTP.. soon to be using this feature). Differential Revision: http://reviews.freebsd.org/D5875	2016-04-28 15:53:10 +00:00
cem	23a478288f	in_lltable_alloc and in6 copy: Don't leak LLE in error path Fix a memory leak in error conditions introduced in r292978. Reported by: Coverity CIDs: 1347009, 1347010 Sponsored by: EMC / Isilon Storage Division	2016-04-26 23:13:48 +00:00
loos	cfc8d71705	Fixes the comment to reflect the code. Sponsored by: Rubicon Communications (Netgate)	2016-04-25 23:12:39 +00:00
pfg	32dcf3933a	Indentation issues. Contract some lines leftover from r298310. Mea culpa.	2016-04-20 16:19:44 +00:00
pfg	a7d40a88c9	kernel: use our nitems() macro when it is available through param.h. No functional change; only trivial cases are done in this sweep. Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
tuexen	f78898772a	Address issues found by the XCode code analyzer.	2016-04-18 20:16:41 +00:00
tuexen	42159e8af3	Fix the ICMP6 handling for SCTP. Keep the IPv4 code in sync. MFC after: 1 week	2016-04-16 21:34:49 +00:00
pfg	12232f8463	sys/net* : for pointers replace 0 with NULL. Mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-15 17:30:33 +00:00
ae	3f81fe2ce0	Fix regression introduced in r296986. Currently we don't keep zoneid in in6_ifaddr structure, because there is still some code, that doesn't properly initialize sin6_scope_id, but some functions use sa_equal() for addresses comparison. sa_equal() compares full sockaddr_in6 structures and such comparison will fail. For now use zero zoneid in in6ifa_ifwithaddr(). It is safe, because used address is in embedded form. In future we will use zoneid, so mark it with XXX comment. Reported by: kp Tested by: kp	2016-04-08 11:13:24 +00:00
gnn	43026a8c5f	Unbreak the RSS/PCBGROUP build.	2016-03-31 00:53:23 +00:00
markj	4081472216	Fix the lladdr copy in in6_lltable_dump_entry() after r292978. This bug caused "ndp -a" to show the wrong link layer address for neighbour cache entries. PR: 208067	2016-03-30 00:03:59 +00:00
markj	db6b45eb6c	Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock. When expiring a neighbour cache entry we may need to look up the associated default router, which requires the nd6 read lock. To avoid an LOR, the nd6 lock should be acquired first. X-MFC-With: r296063 Tested by: Larry Rosenman <ler@lerctr.org> (previous revision)	2016-03-29 19:23:00 +00:00
gnn	c3d5404bbe	FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306	2016-03-24 07:54:56 +00:00
bz	d4cf530887	Mfp4 @180378: Factor out nd6 and in6_attach initialization to their own files. Also move destruction into those files though still called from the central initialization. Sponsored by: CK Software GmbH Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D5033	2016-03-22 15:43:47 +00:00
markj	8c873a2b1b	Modify defrouter_remove() to perform the router lookup before removal. This allows some simplification of its callers. No functional change intended. Tested by: Larry Rosenman (as part of a larger change) MFC after: 1 month	2016-03-17 19:01:44 +00:00
ae	2d399eaa6e	Reduce the number of local variables. Remove redundant check that inp pointer isn't NULL; this is safe, because we are handling IPV6_PKTINFO socket option in this block of code. Also, use in6ifa_ifwithaddr() instead of ifa_ifwithaddr().	2016-03-17 11:10:44 +00:00
ae	a3a4fe0039	Change in6_selectsrc() to allow usage of non-local IPv6 addresses in IPV6_PKTINFO ancillary data when IPV6_BINDANY socket option is set. Submitted by: n_hibma MFC after: 2 weeks	2016-03-17 10:59:30 +00:00
glebius	163857deb4	New way to manage reference counting of mbuf external storage. The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used. The first mbuf to attach a cluster stores the refcount. The further mbufs to reference the cluster point at refcount in the first mbuf. The first mbuf is freed only when the last reference is freed. The benefit over refcounts stored in separate slabs is that now refcounts of different, unrelated mbufs do not share a cache line. For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd() becomes void, making widely used M_EXTADD macro safe. For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization exactly against the cache aliasing problem with regular refcounting. Discussed with: rrs, rwatson, gnn, hiren, sbruno, np Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D5396 Sponsored by: Netflix	2016-03-01 00:17:14 +00:00
markj	9524b1b571	Lock the NDP default router list and count defrouter references. This addresses a number of race conditions that can cause crashes as a result of unsynchronized access to the list. PR: 206904 Tested by: Larry Rosenman <ler@lerctr.org>, Kevin Bowling <kevin.bowling@kev009.com> MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D5315	2016-02-25 20:12:05 +00:00
tuexen	63d9199ac6	Don't leak an address in an error path. CID: 1351729 MFC after: 3 days	2016-02-23 18:50:34 +00:00
tuexen	3c0af23540	Fix reporting of mapped addresses in getpeername() and getsockname() for IPv6 SCTP sockets. These bugs were found because of an issue reported by PVS / D5245.	2016-02-18 21:05:04 +00:00
markj	2bf18fe55d	Release the ref acquired in nd6_dad_find() if DAD is already in progress. MFC after: 1 week	2016-02-18 00:00:51 +00:00
markj	bbc632b86e	Use pfxrtr_del() instead of freeing advertising routers directly. MFC after: 1 week	2016-02-17 23:55:24 +00:00
markj	7f716b8a59	Remove a prototype for the non-existent prelist_del(). MFC after: 1 week	2016-02-17 23:53:24 +00:00
glebius	c905b9e75c	Ternary operator has lower priority than OR. Found by: PVS-Studio	2016-02-17 21:17:14 +00:00
markj	67a2e23340	Add a missing newline to a log message. MFC after: 1 week	2016-02-12 21:17:00 +00:00
markj	9ec0a4438f	Rename the flags field of struct nd_defrouter to "raflags". This field contains the flags inherited from the corresponding router advertisement message and is not for storing private state. MFC after: 1 week	2016-02-12 21:15:57 +00:00
markj	e659a689f9	Simplify defrtrlist_update() slightly in preparation for future changes. No functional change intended. MFC after: 1 week	2016-02-12 21:06:48 +00:00
markj	151140ce44	Remove a bogus comment from nd6_na_input(). The splnet() call that it refers to has been removed, and a lock for the default router list is in fact needed. MFC after: 1 week	2016-02-12 21:01:53 +00:00
markj	88e4f89c6d	Remove superfluous return statements from the neighbour discovery code. MFC after: 1 week	2016-02-12 20:55:22 +00:00
markj	0309ab8781	Fix style around allocations from M_IP6NDP. - Don't cast the return value of malloc(9). - Use M_ZERO instead of explicitly calling bzero(9). MFC after: 1 week	2016-02-12 20:52:53 +00:00
markj	7326876626	Remove some unreferenced NDP debug variable definitions. MFC after: 1 week	2016-02-12 20:46:53 +00:00
dteske	51b30e8967	Merge SVN r295220 (bz) from projects/vnet/ Fix a panic that occurs when a vnet interface is unavailable at the time the vnet jail referencing said interface is stopped. Sponsored by: FIS Global, Inc.	2016-02-11 17:07:19 +00:00
glebius	306a6faf84	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
melifaro	23582454c7	MFP r287070,r287073: split radix implementation and route table structure. There are a number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, the first 3 don't have _any_ requirements and the first 2 do not use radix locking. On the other hand, the routing structure does have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not know anything about its consumers' internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still use the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
melifaro	962b1b7134	Fix rte refcount leak in ip6_forward(). Reviewed by: ae MFC after: 2 weeks Sponsored by: Yandex LLC	2016-01-20 11:25:30 +00:00
glebius	51f55053b6	Verify the packet length in sctp6_input(). The sctp6_ctlinput() function does not properly check the length of the packet it receives from the ICMP6 input routine. This means that an attacker can craft a packet that will cause a kernel panic. When the kernel receives an ICMP6 error message with one of the types/codes it handles, it calls icmp6_notify_error() to deliver it to the upper-level protocol. icmp6_notify_error() cycles through the extension headers (if any) to find the protocol number of the first non-extension header. It does NOT verify the length of the non-extension header. It passes information about the packet (including the actual packet) to the upper-level protocol's pr_ctlinput function. In the case of SCTP for IPv6, icmp6_notify_error() calls sctp6_ctlinput(). sctp6_ctlinput() assumes that the incoming packet contains a sufficiently-long SCTP header and calls m_copydata() to extract a copy of that header. In turn, m_copydata() assumes that the caller has already verified that the offset and length parameters are correct. If they are incorrect, it will dereference a NULL pointer and cause a kernel panic. In short, no one is sufficiently verifying the input, and the result is a kernel panic. Submitted by: jtl Security: SA-16:01.sctp	2016-01-14 10:11:10 +00:00
melifaro	7cc47d54cd	Bring RADIX_MPATH support to new routing KPI to ease migration. Move actual rte selection process from rtalloc_mpath_fib() to the rt_path_selectrte() function. Add public rt_mpath_select() to use in fibX_lookup_ functions.	2016-01-11 08:45:28 +00:00
melifaro	21632a9bd9	Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket(). in6_selectsrc() has 2 classes of users: socket-based ones (raw/udp/pcb/etc) and socket-less ones (ND code). The main reason for that change is the inability to specify a non-default FIB for callers w/o a socket, since (internally) the inpcb is used to determine the fib. As a result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc() static): 1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check along with returning hop limit when needed. 2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and pass IPv6 address w/ explicitly specified scope as separate argument. Reviewed by: ae (previous version)	2016-01-10 13:40:29 +00:00
melifaro	62e557108e	Do not hold ifaddr reference for the whole icmp6_reflect() exec time. Copy source address, calculate hlim and release refcount instead.	2016-01-10 11:59:55 +00:00
melifaro	fda8b5679f	Remove prefix check from in6_addroute(). This check was added in the initial(?) netinet6/ import back in 1999 (r53541). It effectively became unnecessary after 'address/prefix clean-ups' KAME commit 90ff8792e676132096a440dd787f99a5a5860ee8 (github) in 2001 (merged to FreeBSD in r78064) where prefix check was added to nd6_prefix_onlink(). Similar IPv4 check (in_addroute()) was added in r137628. Additionally, the right place for this (or similar) check is the prefix addition code (nd6_prefix_onlink(), nd6_prefix_onlink_rtrequest(), in_addprefix() or rtinit()), but not the generic radix insert routine.	2016-01-09 11:41:37 +00:00
melifaro	14cf7637d1	Remove sys/eventhandler.h from net/route.h Reviewed by: ae	2016-01-09 09:34:39 +00:00
melifaro	4fe868c921	Finish r293098: make ip6_getpmtu() and ip6_getpmtu_ctl() use new routing API	2016-01-04 18:32:24 +00:00
melifaro	31d78f6810	Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use rt_addrinfo structure to store most rte fields. If caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficiently sized sockaddr structures w/ their pointers saved in passed rt_addrinfo. Convert: * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor()). * rtsock pre-add/change route check. * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).	2016-01-04 15:03:20 +00:00
melifaro	113d546f8e	Remove 'struct route_in6' argument from in6_selectsrc() and in6_selectif(). The main task of in6_selectsrc() is to return IPv6 SAS (along with output interface used for scope checks). No data-path code uses route argument for caching. The only users are icmp6 (reflect code) and ND6 ns/na generation code. All these functions are control-plane, so there is no reason to try to 'optimize' something by passing cached route into ip6_output(). Given that, simplify code by eliminating in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is used only by in6_selectsrc(), eliminate its 'struct route_in6' argument, too. While here, reshape rte-related code inside in6_selectif() to free lookup result immediately after saving all the needed fields.	2016-01-03 10:43:23 +00:00
melifaro	c0fd3127f0	Handle IPV6_PATHMTU option by splitting ip6_getpmtu_ctl() from ip6_getpmtu(). Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to the caller. Currently, ip6_getpmtu() has 2 totally different use cases: 1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU and return it, w/o any reusability. 2) Actual ip6_output() data path where we (nearly) always use the provided route lookup data. If this data is not 'valid' we need to perform another lookup and save the result (which cannot be re-used by ip6_output()). Given that, handle 1) by calling separate function doing rte lookup itself. Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both ip6_getpmtu_ctl() and ip6_getpmtu(). For 2) instead of storing ref'ed rte, store mtu (the only needed data from the lookup result) inside newly-added ro_mtu field. 'struct route' was shrunk by 8 (or 4) bytes in r292978. Grow it again by 4 bytes. New ro_mtu field will be used in other places like ip/tcp_output (EMSGSIZE handling from output routines). Reviewed by: ae	2016-01-03 09:54:03 +00:00
melifaro	eceeaeddc0	Use lltable_get_ifp() instead of direct access to lltable fields.	2016-01-01 12:35:33 +00:00
melifaro	93152c67c9	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with its length) to prepend to mbuf. These two changes permit routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address changes are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which either returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of these have their header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
jtl	3504bdc019	Add the appropriate case statement for IPV6_BINDMULTI so the option can be retrieved with getsockopt(). CID: 1229928 Differential Revision: https://reviews.freebsd.org/D4737 Reviewed by: adrian Sponsored by: Juniper Networks	2015-12-30 18:08:05 +00:00
bz	be33cc799d	This code is not in modules that need KPI stability so no need to use the wrapper functions as used in r252511. We can directly use the locking macros. Reviewed by: jtl, rwatson MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4731	2015-12-30 17:10:03 +00:00
wollman	fc57e5b82d	in6_if2idlen: treat bridge(4) interfaces like other Ethernet interfaces bridge(4) interfaces have an if_type of IFT_BRIDGE, rather than IFT_ETHER, even though they only support Ethernet-style links. This caused in6_if2idlen to emit an "unknown link type (209)" warning to the console every time it was called. Add IFT_BRIDGE to the case statement in the appropriate place, indicating that it uses the same IPv6 address format as other Ethernet-like interfaces. MFC after: 1 week	2015-12-28 18:29:47 +00:00
bz	bd04375ea4	Remove superfluous return (1) missed in r292601. Reported by: Matthew D. Fuller (fullermd over-yonder.net), Kevin Bowling (kevin.bowling kev009.com) MFC after: 13 days X-MFC with: r292601 Sponsored by: The FreeBSD Foundation	2015-12-23 10:23:47 +00:00
bz	982684552b	Since r256624 we've been leaking routing table allocations on vnet enabled jail shutdown. Call the provided cleanup routines for IP versions 4 and 6 to plug these leaks. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4530	2015-12-22 14:53:19 +00:00
smh	45d5617154	Revert r292275 & r292379 glebius has concerns about these changes so reverting those can be discussed and addressed. Sponsored by: Multiplay	2015-12-17 14:41:30 +00:00
smh	0813e34dbc	Fix issues introduced by r292275 * Fix panic for etherswitches which don't have a LLADDR. * Disabled DELAY in unsolicited NDA, which needs further work. * Fixed missing DELAY in carp_send_na. * style(9) fix. Reported by: kp & melifaro X-MFC-With: r292275 MFC after: 1 month Sponsored by: Multiplay	2015-12-16 22:26:28 +00:00
melifaro	fb12a509fe	Provide additional lle data in IPv6 lltable dump used by ndp(8). Before the change, things like lle state were queried via SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump. This ioctl was added in 1999, probably to avoid touching rtsock code. This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the following way: expire (already) maps to rtm_rmx.rmx_expire isrouter -> rtm_flags & RTF_GATEWAY asked -> rtm_rmx.rmx_pksent state -> rtm_rmx.rmx_state (maps to rmx_weight via define) Reviewed by: ae	2015-12-16 10:14:16 +00:00
smh	864cf18128	Fix lagg failover due to missing notifications. When using lagg failover mode, neither Gratuitous ARP (IPv4) nor Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results in slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire an ifnet_link_event, which is now monitored by rip and nd6. Upon receiving these events each protocol triggers the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behaviour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111	2015-12-15 16:02:11 +00:00
kp	30ed5ce1ff	inet6: Do not assume every interface has ip6 enabled. Certain interfaces (e.g. pfsync0) do not have ip6 addresses (in other words, ifp->if_afdata[AF_INET6] is NULL). Ensure we don't panic when the MTU is updated. pfsync interfaces will never have ip6 support, because it's explicitly disabled in in6_domifattach(). PR: 205194 Reviewed by: melifaro, hrs Differential Revision: https://reviews.freebsd.org/D4522	2015-12-14 19:44:49 +00:00
melifaro	5acc305ae0	Remove LLE read lock from IPv6 fast path. LLE structure is mostly unchanged during its lifecycle: there are only 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we send NS to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: Special r_skip_req (introduced in D3688) value is used for fast path feedback. It is read lockless by fast path, but updated under req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. After transitioning to STALE state, callout timer is armed to run each V_nd6_delay seconds to make sure that if packet was transmitted at the start of given interval, we would be able to switch to PROBE state in V_nd6_delay seconds as user expects. (in STALE state) timer is rescheduled until original V_nd6_gctimer expires keeping lle in STALE state (remaining timer value stored in lle_remtime). (in STALE state) timer is rescheduled if packet was transmitted less than V_nd6_delay seconds ago to make sure we transition to PROBE state exactly after V_nd6_delay seconds. As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled by fast path without acquiring lle read lock. Differential Revision: https://reviews.freebsd.org/D3780	2015-12-13 07:39:49 +00:00
melifaro	802824b70a	Use correct lookup key for gif route lookups. This fixes r291993 change.	2015-12-09 22:09:33 +00:00
melifaro	2bb0e924cc	Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(), ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(), in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api. Eliminate now-unused ip_rtaddr(). Fix lookup key in fib6_lookup_nh_basic(), which was lost during merge. Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always return IPv6 destination address with embedded scope. Currently rw_gateway has its scope embedded, do the same for non-gatewayed destinations. Sponsored by: Yandex LLC	2015-12-09 11:14:27 +00:00
melifaro	ca13483a3c	Merge helper fib* functions used for basic lookups. The vast majority of rtalloc(9) users require only basic info from route table (e.g. "does the rtentry interface match with the interface I have?", "what is the MTU?", "Give me the IPv4 source address to use", etc.). Instead of hand-rolling lookups, checking if rtentry is up, valid, dealing with IPv6 mtu, finding "address" ifp (almost never done right), provide easy-to-use API hiding all the complexity and returning the needed info in a small on-stack structure. This change also helps hide route subsystem internals (locking, direct rtentry accesses). Additionally, using this API improves lookup performance since rtentry is not locked. (This is safe, since all the rtentry changes happen under both radix WLOCK and rtentry WLOCK). Sponsored by: Yandex LLC	2015-12-08 10:50:03 +00:00
tuexen	23770ab942	Fix the allocation of outgoing streams: * When processing a cookie, use the number of streams announced in the INIT-ACK. * When sending an INIT-ACK for an existing association, use the value from the association, not from the end-point. MFC after: 1 week	2015-12-06 16:17:57 +00:00
ae	e9c29fe907	mld_v2_dispatch_general_query() is used by mld_fasttimo_vnet() to send a reply to the MLDv2 General Query. When the router has a lot of multicast groups, the reply can take several packets due to MTU limitation. Also we have a limit MLD_MAX_RESPONSE_BURST == 4, that limits the number of packets we send in one shot. Then we recalculate the timer value and schedule the remaining packets for sending. The problem is that when we call mld_v2_dispatch_general_query() to send remaining packets, we queue a new reply in the same mbuf queue. And when the number of packets is bigger than MLD_MAX_RESPONSE_BURST, we get an endless stream of MLDv2 reports. To fix this, add the check for remaining packets in the queue. PR: 204831 MFC after: 1 week Sponsored by: Yandex LLC	2015-12-01 11:17:41 +00:00
melifaro	e198456483	Add new rt_foreach_fib_walk_del() function for deleting route entries by filter function instead of peeking into routing table details in each consumer. Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED user). This simplifies future nexthops/multipath changes and rtrequest1_fib() locking refactoring. Actual changes: Add "rt_chain" field to permit rte grouping while doing batched delete from routing table (thus growing rte 200->208 on amd64). Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo to pass filter function to various routing subsystems in standard way. Convert all rt_expunge() customers to new rt_addrinfo-based api and eliminate rt_expunge().	2015-11-30 05:51:14 +00:00
ae	d81208c948	Overhaul if_enc(4) and make it loadable in run-time. Use hhook(9) framework to achieve ability of loading and unloading if_enc(4) kernel module. INET and INET6 code on initialization registers two helper hook points in the kernel. if_enc(4) module uses these helper hook points and registers its hooks. IPSEC code uses these hhook points to call helper hooks implemented in if_enc(4).	2015-11-25 07:31:59 +00:00
cem	8922a29059	in6_mc_get: Fix recursion on if_addr_lock on malloc failure Analogously to r291040, in6_mc_get recurses on if_addr_lock if the M_NOWAIT allocation fails. The fix is the same. Suggested by: Andrey V. Elsukov Reviewed by: jhb (ip4 version) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4138 (ip4 version)	2015-11-19 00:27:26 +00:00
melifaro	2bf2184989	Bring back the ability of passing cached route via nd6_output_ifp().	2015-11-15 16:02:22 +00:00
rrs	dc494194a2	This fixes several places where callout_stop's return value is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
melifaro	595bcb4ce1	Unify setting lladdr for AF_INET[6].	2015-11-07 11:12:00 +00:00
adrian	7ba24ae636	[netinet6]: Create a new IPv6 netisr which expects the frames to have been verified. This is required for fragments and encapsulated data (eg tunneling) to be redistributed to the RSS bucket based on the eventual IPv6 header and protocol (TCP, UDP, etc) header. * Add an mbuf tag with the state of IPv6 options parsing before the frame is queued into the direct dispatch handler; * Continue processing and complete the frame reception in the correct RSS bucket / netisr context. Testing results are in the phabricator review. Differential Revision: https://reviews.freebsd.org/D3563 Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn>	2015-11-06 23:07:43 +00:00
melifaro	4ea35c2935	Use m_cat() to reassemble IPv6 packets. Submitted by: jonloony_gmail.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3863	2015-10-27 22:11:09 +00:00
melifaro	e2681931e4	Invoke lle_event for new entry iff it has lladdr set.	2015-10-04 19:10:27 +00:00
melifaro	7867205355	Simplify if (lladdr) condition in nd6_cache_lladdr(): For case (7) (new entry) nothing has to be done except lle_event. Invoke this event directly from "create new lle" code block. For case (4) (existing entry, same mac) useless mac update was performed, along with LLENTRY_RESOLVED lle_event. There was no sense in doing that, since nothing really had changed. Simply avoid this condition instead. Given that, condition was simplified to (3),(5) states which can be merged with previous block.	2015-10-04 12:42:07 +00:00
melifaro	0dee60f8c8	Eliminate nd6_llinfo_settimer(). All consumers were converted to use nd6_llinfo_settimer_locked() in r216022. Make nd6_llinfo_settimer_locked() static: last external consumer was converted in r288124.	2015-10-04 08:33:16 +00:00
melifaro	02d9938404	Add __noinline attribute to several functions to ease dtrace instrumentation	2015-10-04 08:21:15 +00:00
melifaro	15bc65f144	Fix condition for nd6_llinfo_getholdsrc() introduced in r287484. Effectively it always returned NULL so SAS was always performed and sometimes the result might have been different. Fix state machine change accidentally introduced in r287985: state (4) inside nd6_cache_lladdr() (existing entry got nd message with the same lladdress) started to cause lle state transition to STALE instead of no-action.	2015-10-04 07:02:17 +00:00
hrs	defd45464b	- Schedule DAD for IN6_IFF_TENTATIVE addresses in nd6_timer(). This catches cases in which DAD probes cannot be sent because of IFF_UP && !IFF_DRV_RUNNING. - nd6_dad_starttimer() now calls nd6_dad_ns_output(), instead of calling it before nd6_dad_starttimer(). - Do not release an entry in dadq when a duplicate entry is being added.	2015-10-03 12:09:12 +00:00
ae	c3f8d46dc4	Take extra reference to security policy before calling crypto_dispatch(). Currently we perform crypto requests for IPSEC synchronously for most crypto providers (software, aesni) and only VIA padlock calls the crypto callback asynchronously. In synchronous mode it is possible that the security policy will be removed while the crypto request is being processed, and the crypto callback will release the last reference to the SP. Then upon return into ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called on an already freed request. To prevent this we will take extra reference to SP. PR: 201876 Sponsored by: Yandex LLC	2015-09-30 08:16:33 +00:00
melifaro	91b3356875	Eliminate nd6_nud_hint() and its TCP bindings. Initially function was introduced in r53541 (KAME initial commit) to "provide hints from upper layer protocols that indicate a connection is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability Confirmation). However, it was converted to do nothing (e.g. just return) in r122922 (tcp_hostcache implementation) back in 2003. Some defines were moved to tcp_var.h in r169541. Then, it was broken (for non-corner cases) by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So, right now this code is broken and has no "real" base users. Differential Revision: https://reviews.freebsd.org/D3699	2015-09-27 05:29:34 +00:00
melifaro	4fed811000	rtsock requests for deleting interface address lles started to return EPERM instead of old "ignore-and-return 0" in r287789. This broke arp -da / ndp -cn behavior (they exit on rtsock command failure). Fix this by translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and making arp/ndp ignore these entries in batched delete. MFC after: 2 weeks	2015-09-27 04:54:29 +00:00
melifaro	88b54de46b	Use standard lle LLE_EXCLUSIVE request flags instead of its redefined version.	2015-09-22 20:45:04 +00:00
bz	34885c7609	Compare mbuf pointer to NULL rather than to 0. No functional change. MFC after: 2 weeks	2015-09-21 12:53:26 +00:00
bz	3e1ba7cafd	In the UDP over IPv6 implementation several cases are using the wrong protocol, e.g., based on wrong "next header" assumptions (which does not have to point to the upper layer protocol), or using hard-coded UDP instead of UDP or UDP-Lite possibly switching protocols. Fix those cases for UDP-Lite to work correctly. PR: 202788 Submitted by: Tiwei Bie (btw mail.ustc.edu.cn) [parts] Reviewed by: gnn, Tiwei Bie (btw mail.ustc.edu.cn), kevlo (earlier version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3686	2015-09-21 12:32:36 +00:00
melifaro	536c20752f	Unify nd6 state switching by using newly-created nd6_llinfo_setstate() function. The change is mostly mechanical with the following exception: Last piece of nd6_resolve_slow() was refactored: ND6_LLINFO_PERMANENT condition was removed as always-true, explicit ND6_LLINFO_NOSTATE -> ND6_LLINFO_INCOMPLETE state transition was removed as duplicate. Reviewed by: ae Sponsored by: Yandex LLC	2015-09-21 11:19:53 +00:00
melifaro	009169304f	Add "stale" timer back to nd6_cache_lladdr(). Setting timer was accidentally removed in r276844 due to a misleading comment about its meaninglessness. Add it back to restore proper behaviour.	2015-09-21 10:24:34 +00:00
melifaro	d6bd5ed2ba	Cleanup nd6_cache_lladdr(). No functional changes. * Since new entries are now allocated explicitly, fill in all the necessary fields for lle _before_ attaching it to the table. * Remove ND6_LLINFO_INCOMPLETE check which was unused even in first KAME merge (r53541). * After that, the only new state that function can set, was ND6_LLINFO_STALE. Given everything above, simplify logic besides do_update and is_newentry. * Fix nd_resolve() comment.	2015-09-19 11:50:02 +00:00
melifaro	8da907f42b	* Simplify logic besides llchange variable. * Refresh nd6_is_router() comment.	2015-09-18 07:18:10 +00:00
melifaro	493325342d	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether|atm|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_output() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performs lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output|forward> -> nd6_output() -> nd6_output_lle() which returns 0. In the new scenario the packet is propagated to <ether|whatever>_output() -> nd6_resolve(), which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469	2015-09-16 14:26:28 +00:00
melifaro	daa0d6c584	Constantify lookup key in several nd6_* functions.	2015-09-16 11:06:07 +00:00
melifaro	f6e576f283	Simplify nd6_cache_lladdr: * Move isRouter calculation code to separate nd6_is_router() function. * Make nd6_cache_lladdr() return void: its return value hasn't been used since r53541 KAME import in 1999. Sponsored by: Yandex LLC	2015-09-15 17:16:31 +00:00
melifaro	b42c7af3b5	* Require explicit lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
vangyzen	75d72d4482	Fix the handling of IPv6 On-Link Redirects. On receipt of a redirect message, install an interface route for the redirected destination. On removal of the corresponding Neighbor Cache entry, remove the interface route. This requires changes in rtredirect_fib() to cope with an AF_LINK address for the gateway and with the absence of RTF_GATEWAY. This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo Phase 2 test suite. Unrelated to the above, fix a recursion on the radix node head lock triggered by the Tahi Redirected to Alternate Router test cases. When I first wrote this patch in October 2012, all Section 2 (Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE, and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported that it passes Tahi. (Thanks!) These other test cases also passed in 2012: * the RTF_MODIFIED case, with IPv4 and IPv6 (using a RTF_HOST|RTF_GATEWAY route for the destination) * the redirected-to-self case, with IPv4 and IPv6 * a valid IPv4 redirect All testing in 2012 was done with WITNESS and INVARIANTS. Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015, Mark Kelley <mark_kelley@dell.com> in 2012, TC Telkamp <terence_telkamp@dell.com> in 2012 PR: 152791 Reviewed by: melifaro (current rev), bz (earlier rev) Approved by: kib (mentor) MFC after: 1 month Relnotes: yes Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D3602	2015-09-14 19:17:25 +00:00
melifaro	5ad1f2444d	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
hrs	0529f0c80f	Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6 forgotten in the previous commit. MFC after: 3 days	2015-09-10 08:37:03 +00:00
hrs	e16dfdb9ef	- Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6. These are quite old APIs and there is no consumer now. MFC after: 3 days	2015-09-10 06:31:24 +00:00
hrs	e5a6c91e16	- Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6. These are quite old APIs and there is no consumer now. - Simplify first and duplicate LLA check. MFC after: 3 days	2015-09-10 06:29:18 +00:00
hrs	e161347f7d	Do not add IN6_IFF_TENTATIVE when ND6_IFF_NO_DAD. MFC after: 3 days	2015-09-10 06:10:30 +00:00
hrs	e5cef93a39	Remove IN6_IFF_NOPFX. This flag was no longer used. MFC after: 3 days	2015-09-10 06:08:42 +00:00
adrian	43407a0ac4	Add support for receiving flowtype, flowid and RSS bucket information as part of recvmsg(). Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3562	2015-09-06 20:57:57 +00:00
melifaro	e31cc5ffc6	Do not pass lle to nd6_ns_output(). Use newly-added nd6_llinfo_get_holdsrc() to extract desired IPv6 source from holdchain and pass it to the nd6_ns_output().	2015-09-05 14:14:03 +00:00
melifaro	d013cff635	Do not skip entries without LLE_VALID flag. This one fixes showing incomplete entries in ndp -an. MFC after: 2 weeks	2015-09-05 06:24:00 +00:00
melifaro	3e1524c83e	Make in6ifa_ifpwithaddr() take const param. Remove unneeded DECONST from in6_lltable_rtcheck().	2015-09-05 05:54:09 +00:00
melifaro	3f9699bcfe	Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in alloc handler, based on flags.	2015-08-31 05:03:36 +00:00
adrian	a3c3341951	Implement RSS hashing/re-hashing for IPv6 ingress packets. This mirrors the basic IPv4 implementation - IPv6 packets under RSS now are checked for a correct RSS hash and if one isn't provided, it's done in software. This only handles the initial receive - it doesn't yet handle reinjecting / rehashing packets after being decapsulated from various tunneling setups. That'll come in some follow-up work. For non-RSS users, this is almost a giant no-op. It does change a couple of ipv6 methods to use const mbuf * instead of mbuf * but it doesn't have any functional changes. So, the following now occurs: * If the NIC doesn't do any RSS hashing, it's all done in software. Single-queue, non-RSS NICs will now have the RX path distributed into multiple receive netisr queues. * If the NIC provides the wrong hash (eg only IPv6 hash when we needed an IPv6 TCP hash, or IPv6 UDP hash when we expected IPv6 hash) then the hash is recalculated. * .. if the hash is recalculated, it'll end up being injected into the correct netisr queue for v6 processing. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3504	2015-08-29 07:14:29 +00:00
bz	7bff80af22	remove a left-over after r220463 empty #ifdef INET check. MFC after: 1 week	2015-08-28 09:38:18 +00:00
adrian	2d6b12b499	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
bz	6d4420afb7	get_inpcbinfo() and get_pcblist() are UDP local functions and do not do what one would expect by name. Prefix them with "udp_" to at least obviously limit the scope. This is a non-functional change. Reviewed by: gnn, rwatson MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3505	2015-08-27 15:27:41 +00:00
adrian	abcfdaaaf0	Call the new RSS hash calculation function to correctly calculate a hash based on the configured requirements for the protocol. Tested: * UDP IPv6 TX/RX testing, w/ RSS enabled, 82599 ixgbe(4) hardware	2015-08-25 06:12:59 +00:00
adrian	830cda9ae5	Implement the IPv6 RSS software hash function. This isn't linked into the receive/transmit paths anywhere just yet. This is part of a GSoC 2015 project. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Reviewed by: hiren, gnn Differential Revision: https://reviews.freebsd.org/D3423	2015-08-24 05:36:08 +00:00
hrs	d27954934d	- Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback interface but in6if_do_dad() already had a check for IFF_LOOPBACK. - Remove in6if_do_dad() check in in6_broadcast_ifa(). An address which needs DAD always has IN6_IFF_TENTATIVE there. - in6if_do_dad() now returns EAGAIN when the interface is not ready since DAD callout handler ignores such an interface. - In DAD callout handler, mark an address as IN6_IFF_TENTATIVE when the interface has ND6_IFF_IFDISABLED. And Do IFF_UP and IFF_DRV_RUNNING check consistently when DAD is required. - draft-ietf-6man-enhanced-dad is now published as RFC 7527. - Fix some typos.	2015-08-24 05:21:49 +00:00
melifaro	54b3b78856	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes a bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m | awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
melifaro	0c24547a66	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
melifaro	d8f92ce2cf	Store addresses instead of sockaddrs inside llentry. This permits us to have all (not fully true yet) the info needed in the lookup process in the first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not prevent custom lltable consumers with long keys from using the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
melifaro	8e6b3a8d59	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
melifaro	ba06112c24	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
melifaro	4f240a9c31	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt argument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update descriptions of these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
marius	fbf037b786	Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.	2015-08-08 21:41:59 +00:00
melifaro	33c52eed18	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
melifaro	20bb5966e2	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
melifaro	a915efe931	Simplify ip[6] simploop: Do not pass 'dst' sockaddr to ip[6]_mloopback: - We have explicit check for AF_INET in ip_output() - We assume ip header inside passed mbuf in ip_mloopback - We assume ip6 header inside passed mbuf in ip6_mloopback	2015-08-08 15:58:35 +00:00
jch	67927a7a7c	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.	2015-08-03 12:13:54 +00:00
ae	55f7ded2fa	Properly handle IPV6_NEXTHOP socket option in selectroute(). o remove disabled code; o if nexthop address is link-local, use embedded scope zone id to determine outgoing interface; o properly fill ro_dst before doing route lookup; o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit. Sponsored by: Yandex LLC	2015-08-02 12:40:56 +00:00
ae	99ebe8411a	Remove redundant check.	2015-08-02 11:58:24 +00:00
ae	271b2043d8	Eliminate the use of m_copydata() in gif_encapcheck(). ip_encap already has inspected mbuf's data, at least an IP header. And it is safe to use mtod() and do direct access to needed fields. Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that mbuf has a packet header. Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also remove "martian filters" checks. According to RFC 4213 it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 14:07:43 +00:00
ae	75425458ac	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
tuexen	c0e1a0d3a9	Move including netinet/icmp6.h around to avoid a problem when including netinet/icmp6.h and net/netmap.h. Both use ni_flags... This allows to build multistack with SCTP support. MFC after: 1 week	2015-07-25 18:26:09 +00:00
rrs	e8d0638773	Fix inverted logic bug that David Wolfskill found (thanks David!) MFC after: 3 Weeks	2015-07-22 09:29:50 +00:00
rrs	e4870a15db	When a tunneling protocol is being used with UDP we must release the lock on the INP before calling the tunnel protocol, else a LOR may occur (it does with SCTP for sure). Instead we must acquire a ref count and release the lock, taking care to allow for the case where the UDP socket has gone away and not unlocking since the refcnt decrement on the inp will do the unlock in that case. Reviewed by: tuexen MFC after: 3 weeks	2015-07-21 09:54:31 +00:00
ae	8ada02a437	Add LLE event handler to report ND6 events to userland via rtsock. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-07-20 06:58:32 +00:00
ae	291abe56fe	Invoke LLE event handler when entry is deleted. MFC after: 2 weeks Sponsored by: Yandex LLC	2015-07-20 06:54:50 +00:00
ae	cce3941676	Keep IPv6 address specified by IPV6_PKTINFO socket option in kernel internal form to be able handle link-local IPv6 addresses. Reported by: kp Tested by: kp	2015-07-03 19:01:38 +00:00
bz	4c7c9799ed	Move comment to the right position. PR: 152791 Submitted by: vangyzen (as part of the functional change) MFC after: 3 days	2015-07-03 09:53:56 +00:00
tuexen	2af840e2ac	Add FIB support for SCTP. This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379 MFC after: 3 days	2015-06-17 15:20:14 +00:00
ae	a1e13e9e59	Move RTM announces into generic code to be independent from Layer2 code. This fixes a bug introduced in 274988, where announcements about new addresses were not sent for tunneling interfaces. Reported by: tuexen@ MFC after: 1 week	2015-05-29 10:24:16 +00:00
tuexen	a82f33e60c	Fix and cleanup the debug information. This has no user-visible changes. Thanks to Irene Ruengeler for providing a patch. MFC after: 3 days	2015-05-28 16:00:23 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
ae	cbc4e577f0	Add an ability accept encapsulated packets from different sources by one gif(4) interface. Add new option "ignore_source" for gif(4) interface. When it is enabled, gif's encapcheck function requires match only for packet's destination address. Differential Revision: https://reviews.freebsd.org/D2004 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-15 12:19:45 +00:00
hrs	64bd31eb61	- Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice because a link where looped back NS messages are permanently observed does not work with either NDP or ARP for IPv4. - draft-ietf-6man-enhanced-dad is now RFC 7527. Discussed with: hiren MFC after: 3 days	2015-05-12 03:31:57 +00:00
ae	85230adb3c	Mark data checksum as valid for multicast packets, that we send back to myself via simloop. Also remove duplicate check under #ifdef DIAGNOSTIC. PR: 180065 MFC after: 1 week	2015-05-07 14:17:43 +00:00
ae	efaf13e547	Remove unneeded #ifdef INET6 and IPSEC. This file compiled only when both options are defined. Include opt_sctp.h and sctp_crc32.h to enable #ifdef SCTP code block and delayed checksum calculation for SCTP.	2015-05-07 12:15:45 +00:00
glebius	1ca5d173ab	Remove #ifdef IFT_FOO. Submitted by: Guy Yur <guyyur gmail.com>	2015-05-02 20:31:27 +00:00
ae	b38774ffc9	Remove now unneeded KEY_FREESP() for case when ipsec[46]_process_packet() returns EJUSTRETURN. Sponsored by: Yandex LLC	2015-04-27 01:11:09 +00:00
ae	5a6412a276	Fix possible use after free due to security policy deletion. When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(), we hold one reference to security policy and release it just after return from this function. But IPSec processing can be deferred and when we release reference to security policy after ipsec[46]_process_packet(), user can delete this security policy from SPDB. And when IPSec processing will be done, xform's callback function will do access to already freed memory. To fix this move KEY_FREESP() into callback function. Now IPSec code will release reference to SP after processing will be finished. Differential Revision: https://reviews.freebsd.org/D2324 No objections from: #network Sponsored by: Yandex LLC	2015-04-27 00:55:56 +00:00
glebius	37c56fd2f8	Fix r281649: don't call in6_clearscope() twice. Submitted by: ae	2015-04-17 15:26:08 +00:00
glebius	14b7122d6d	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
markj	47f557e75d	Fix a possible refcount leak in regen_tmpaddr(). public_ifa6 may be set to NULL after taking a reference to a previous address list element. Instead, only take the reference after leaving the loop but before releasing the address list lock. Differential Revision: https://reviews.freebsd.org/D2253 Reviewed by: ae MFC after: 2 weeks	2015-04-13 01:55:42 +00:00
ae	77d9811f99	Fix the IPV6_MULTICAST_IF sockopt handling. RFC 3493 says when the interface index is specified as zero, the system should select the interface to use for outgoing multicast packets. Even the comment for the in6p_set_multicast_if() function says about index of zero. But in fact for zero index the function just returns EADDRNOTAVAIL. I.e. if you first set some interface and then will try reset it with zero ifindex, you will get EADDRNOTAVAIL. Reset im6o_multicast_ifp to NULL when interface index specified as zero. Also return EINVAL in case when ifnet_byindex() returns NULL. This will be the same behaviour as when ifindex is bigger than V_if_index. And return EADDRNOTAVAIL only when interface is not multicast capable. Reported by: Olivier Cochard-Labbé MFC after: 2 weeks Sponsored by: Yandex LLC	2015-04-10 19:09:51 +00:00
ae	1af0fcc87b	Fix the check for maximum mbuf's size needed to send ND6 NA and NS. It is acceptable that the size can be equal to MCLBYTES. In the later KAME's code this check has been moved under DIAGNOSTIC ifdef, because the size of NA and NS is much smaller than MCLBYTES. So, it is safe to replace the check with KASSERT. PR: 199304 Discussed with: glebius MFC after: 1 week	2015-04-09 12:57:58 +00:00
kp	86419259e3	Evaluate packet size after the firewall had its chance Defer the packet size check until after the firewall has had a look at it. This means that the firewall now has the opportunity to (re-)fragment an oversized packet. Differential Revision: https://reviews.freebsd.org/D1815 Reviewed by: ae Approved by: gnn (mentor)	2015-04-07 20:29:03 +00:00
delphij	250c8d6f6d	Mitigate Local Denial of Service with IPv6 Router Advertisements and log attack attempts. Submitted by: hrs Security: FreeBSD-SA-15:09.nd6 Security: CVE-2015-2923	2015-04-07 20:20:09 +00:00
glebius	8ea08edd44	o Make net.inet6.ip6.mif6table return special API structure, that doesn't contain kernel pointers, and instead has interface index. Bump __FreeBSD_version for that change. o Now, netstat/mroute6.c no longer needs to kvm_read(3) struct ifnet, and no longer needs to include if_var.h Note that this change is far from being a complete move of IPv6 multicast routing to a proper API. Other structures are still dumped into their sysctls as is, requiring userland application to #define _KERNEL when including ip6_mroute.h and then call kvm_read(3) to gather all bits and pieces. But fixing this is out of scope of the opaque ifnet project. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-04-06 22:12:18 +00:00
kp	b57f3509a8	Remove duplicate code We'll just fall into the same local delivery block under the 'if (m->m_flags & M_FASTFWD_OURS)'. Suggested by: ae Differential Revision: https://reviews.freebsd.org/D2225 Approved by: gnn (mentor)	2015-04-06 19:08:44 +00:00
kp	86dedea3cb	Preserve IPv6 fragment IDs across reassembly and refragmentation When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs so we can do the reassembly so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2188 Approved by: gnn (mentor)	2015-04-01 12:15:01 +00:00
glebius	09c0e42529	Move ip6_sprintf() declaration from in6_var.h to in6.h. This is a simple function that works with in6_addr and it is not related to the INET6 stack implementation. Sponsored by: Nginx, Inc.	2015-03-24 16:45:50 +00:00
ae	a41131da37	To avoid a possible race, release the reference to ifa after return from nd6_dad_na_input(). Submitted by: Alexandre Martins MFC after: 1 week	2015-03-19 00:04:25 +00:00
ae	3169d96832	tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify(). Check cmdarg isn't NULL before dereference, this check was in the ip6_notify_pmtu() before r279588. Reported by: Florian Smeets MFC after: 1 week	2015-03-06 05:50:39 +00:00
hrs	218de7f1d6	- Implement loopback probing state in enhanced DAD algorithm. - Add no_dad and ignoreloop per-IF knob. no_dad disables DAD completely, and ignoreloop is to prevent infinite loop in loopback probing state when loopback is permanently expected.	2015-03-05 21:27:49 +00:00
ae	a312c1bedf	Fix deadlock in IPv6 PCB code. When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message. Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received. PR: 197059 Differential Revision: https://reviews.freebsd.org/D1949 Reviewed by: no objection from #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2015-03-04 11:20:01 +00:00
ae	aadc4e1de2	Create nd6_ns_output_fib() function with extra argument fibnum. Use it to initialize mbuf's fibnum. Uninitialized fibnum value can lead to panic in the routing code. Currently we use only RT_DEFAULT_FIB value for initialization. Differential Revision: https://reviews.freebsd.org/D1998 Reviewed by: hrs (previous version) Sponsored by: Yandex LLC	2015-03-03 10:50:03 +00:00
hrs	09859de9fa	Nonce has to be non-NULL for DAD even if net.inet6.ip6.dad_enhanced=0.	2015-03-03 04:28:19 +00:00
hrs	9d87f6a3c3	Implement Enhanced DAD algorithm for IPv6 described in draft-ietf-6man-enhanced-dad-13. This basically adds a random nonce option (RFC 3971) to NS messages for DAD probe to detect a looped back packet. This looped back packet prevented DAD on some pseudo-interfaces which aggregates multiple L2 links such as lagg(4). The length of the nonce is set to 6 bytes. This algorithm can be disabled by setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis. Reported by: hiren Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D1835	2015-03-02 17:30:26 +00:00
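The Enhanced DAD commit (hrs 9d87f6a3c3) adds a 6-byte random nonce option (RFC 3971) to DAD Neighbor Solicitations so that a node can tell its own looped-back probe from a genuine duplicate. The sketch below is a minimal user-space illustration of that idea, not FreeBSD's nd6 code. It assumes the RFC 3971 layout: option type 14 and a length field counted in 8-octet units, which makes a 6-byte nonce an exactly 8-byte option. The struct and helper names are hypothetical, and a fixed nonce would be replaced by random bytes in real use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the type value and length units follow RFC 3971. */
#define ND_OPT_NONCE_TYPE 14   /* Nonce option type */
#define NONCE_LEN 6            /* nonce length chosen by the commit */

struct nd_nonce_opt {
    uint8_t type;
    uint8_t len;               /* whole option length in units of 8 octets */
    uint8_t nonce[NONCE_LEN];
};

/* Fill in a nonce option and return its size in bytes.
 * 2 header bytes + 6 nonce bytes = 8 bytes, so len == 1 and no padding. */
static size_t build_nonce_opt(struct nd_nonce_opt *opt,
                              const uint8_t nonce[NONCE_LEN])
{
    size_t total = 2 + NONCE_LEN;

    assert(total % 8 == 0);    /* ND options must end on an 8-octet boundary */
    opt->type = ND_OPT_NONCE_TYPE;
    opt->len = (uint8_t)(total / 8);
    memcpy(opt->nonce, nonce, NONCE_LEN);
    return total;
}

/* Return 1 if a received DAD NS carries the nonce we sent ourselves,
 * i.e. it is our own probe looped back and must not count as a duplicate. */
static int dad_ns_is_looped_back(const struct nd_nonce_opt *rx,
                                 const uint8_t sent[NONCE_LEN])
{
    return rx->type == ND_OPT_NONCE_TYPE && rx->len == 1 &&
           memcmp(rx->nonce, sent, NONCE_LEN) == 0;
}
```

A solicitation whose nonce differs from the one we sent came from another node probing the same address, so it still signals a real duplicate. This is why the looped-back NS messages that broke DAD on aggregating interfaces such as lagg(4) no longer do.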
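The commit that preserves IPv6 fragment IDs across pf reassembly and refragmentation (kp 86dedea3cb) reuses the incoming Identification value on every refragmented packet. The sketch below is a hypothetical user-space illustration of planning such a refragmentation, not pf or ip6_fragment() code. Every output fragment copies the original 32-bit ID. The Fragment Offset field counts 8-octet units, so every fragment except the last must carry a payload that is a multiple of 8 bytes. The struct and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one outgoing IPv6 fragment. */
struct frag_desc {
    uint32_t id;        /* Identification copied from the original fragments */
    uint16_t offset8;   /* Fragment Offset field, in 8-octet units */
    uint16_t len;       /* payload bytes carried by this fragment */
    int more;           /* M flag: 1 for every fragment except the last */
};

/* Split 'total' payload bytes into fragments of at most 'maxlen' bytes,
 * reusing 'orig_id' on each one. Returns the number of descriptors
 * written to 'out', or 0 if maxlen < 8 or 'cap' is too small. */
static size_t plan_fragments(uint32_t orig_id, size_t total, size_t maxlen,
                             struct frag_desc *out, size_t cap)
{
    size_t chunk = maxlen & ~(size_t)7;   /* round down to a multiple of 8 */
    size_t off = 0, n = 0;

    if (chunk == 0)
        return 0;
    while (off < total) {
        size_t len = total - off < chunk ? total - off : chunk;

        if (n == cap)
            return 0;
        out[n].id = orig_id;                 /* same ID on every fragment */
        out[n].offset8 = (uint16_t)(off / 8);
        out[n].len = (uint16_t)len;
        out[n].more = off + len < total;
        assert(!out[n].more || len % 8 == 0); /* non-final: multiple of 8 */
        off += len;
        n++;
    }
    return n;
}
```

For example, a 20-byte payload with a 12-byte size limit is split into 8 + 8 + 4 bytes at offsets 0, 1 and 2 (in 8-octet units), all carrying the original ID. Using 12 bytes for the first fragment would have produced an offset that cannot be encoded.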

1858 Commits