freebsd-dev

Author	SHA1	Message	Date
Alexander V. Chernikov	8268d82cff	Remove per-packet ifa refcounting from IPv6 fast path. Currently ip6_input() calls in6ifa_ifwithaddr() for every local packet, in order to check if the target ip belongs to the local ifa in proper state and increase its counters. in6ifa_ifwithaddr() references found ifa. With epoch changes, both `ip6_input()` and all other current callers of `in6ifa_ifwithaddr()` do not need this reference anymore, as epoch provides stability guarantee. Given that, update `in6ifa_ifwithaddr()` to allow it to return ifa without referencing it, while preserving option for getting referenced ifa if so desired. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28648	2021-02-15 22:33:12 +00:00
Alexander V. Chernikov	605284b894	Enforce net epoch in in6_selectsrc(). in6_selectsrc() may call fib6_lookup() in some cases, which requires epoch. Wrap in6_selectsrc* calls into epoch inside its users. Mark it as requiring epoch by adding NET_EPOCH_ASSERT(). MFC after: 1 weeek Differential Revision: https://reviews.freebsd.org/D28647	2021-02-15 22:33:12 +00:00
Alexander V. Chernikov	924d1c9a05	Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." Wrong version of the change was pushed inadvertenly. This reverts commit `4a01b854ca`.	2021-02-08 22:32:32 +00:00
Alexander V. Chernikov	4a01b854ca	SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.	2021-02-08 21:42:20 +00:00
Alexander V. Chernikov	d9999ae9ca	Fix use-after-free in icmp6_notify_error(). Reported by: Maxime Villard <max at m00nbsd.net> Reviewed by: markj MFC after: 3 days	2020-10-28 20:22:20 +00:00
Mark Johnston	4caea9b169	icmp6: Count packets dropped due to an invalid hop limit Pad the icmp6stat structure so that we can add more counters in the future without breaking compatibility again, last done in r358620. Annotate the rarely executed error paths with __predict_false while here. Reviewed by: bz, melifaro Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26578	2020-10-19 17:07:19 +00:00
Alexander V. Chernikov	eddfb2e86f	Fix IPv6 regression introduced by r362900. PR: kern/247729	2020-07-03 08:06:26 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4\|6>_lookup_nh_<basic\|ext> to fib<4\|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Alexander V. Chernikov	a37a5246ca	Use fib[46]_lookup() in mtu calculations. fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Conversion is straight-forwarded, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Differential Revision: https://reviews.freebsd.org/D24974	2020-05-28 08:00:08 +00:00
Alexander V. Chernikov	9ac7c6cfed	Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to the new routing KPI. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24245	2020-04-14 23:06:25 +00:00
Alexander V. Chernikov	34a5582c47	Bring back redirect route expiration. Redirect (and temporal) route expiration was broken a while ago. This change brings route expiration back, with unified IPv4/IPv6 handling code. It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout. Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions. Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present. Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run. The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal. This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23075	2020-01-22 13:53:18 +00:00
Bjoern A. Zeeb	0700d2c3f0	Make icmp6_reflect() static. icmp6_reflect() is not used anywhere outside icmp6.c, no reason to export it. Sponsored by: Netflix	2019-12-03 14:46:38 +00:00
Bjoern A. Zeeb	a4adf6cc65	Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_*macros. r354748-354750 replaced the KAME macros with m_pulldown() calls. Contrary to the rest of the network stack m_len checks before m_pulldown() were not put in placed (see r354748). Put these m_len checks in place for now (to go along with the style of the network stack since the initial commits). These are not put in for performance but to avoid an error scenario (even though it also will help performance at the moment as it avoid allocating an extra mbuf; not because of the unconditional function call). The observed error case went like this: (1) an mbuf with M_EXT arrives and we call m_pullup() unconditionally on it. (2) m_pullup() will call m_get() unless the requested length is larger than MHLEN (in which case it'll m_freem() the perfectly fine mbuf) and migrate the requested length of data and pkthdr into the new mbuf. (3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail. This was observed with failing auto-configuration as an RA packet of 200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input() dropped the mbuf. (Re-)adding the m_len checks before m_pullup() calls avoids this problems with mbufs using external storage for now. MFC after: 3 weeks Sponsored by: Netflix	2019-12-01 00:22:04 +00:00
Bjoern A. Zeeb	32af08ecad	icmpv6: Fix mbuf change in mld After r354748 mld_input() can change the mbuf. The new pointer is never returned to icmp6_input() and when passed to icmp6_rip6_input() the mbuf may no longer valid leading to a panic. Pass a pointer to the mbuf to mld_input() so we can return an updated version in the non-error case. Add a test sending an MLD packet case which will trigger this bug. Pointyhat to: bz Reported by: gallatin, thj MFC After: 2 weeks X-MFC with: r354748 Sponsored by: Netflix	2019-11-18 21:59:47 +00:00
Bjoern A. Zeeb	a61b5cfbbf	netinet6: Remove PULLDOWN_TESTs. Remove the KAME introduced PULLDOWN_TESTs which did not even have a compile-time option in sys/conf to turn them on for a custom kernel build. They made the code a lot harder to read or more complicated in a few cases. Convert the IP6_EXTHDR_CHECK() calls into FreeBSD looking code. Rather than throwing the packet away if it would not fit the KAME mbuf expectations, convert the macros to m_pullup() calls. Do not do any extra manual conditional checks upfront as to whether the m_len would suffice (), simply let m_pullup() do its work (incl. an early check). Remove extra m_pullup() calls where earlier in the function or the only caller has already done the pullup. Discussed with: rwatson () Reviewed by: ae MFC after: 8 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22334	2019-11-15 21:40:40 +00:00
Bjoern A. Zeeb	a8fe77d877	netinet: update mp to pass the proper value back In ip6_[direct_]input() we are looping over the extension headers to deal with the next header. We pass a pointer to an mbuf pointer to the handling functions. In certain cases the mbuf can be updated there and we need to pass the new one back. That missing in dest6_input() and route6_input(). In tcp6_input() we should also update it before we call tcp_input(). In addition to that mark the mbuf NULL all the times when we return that we are done with handling the packet and no next header should be checked (IPPROTO_DONE). This will eventually allow us to assert proper behaviour and catch the above kind of errors more easily, expecting *mp to always be set. This change is extracted from a larger patch and not an exhaustive change across the entire stack yet. PR: 240135 Reported by: prabhakar.lakhera gmail.com MFC after: 3 weeks Sponsored by: Netflix	2019-11-12 15:46:28 +00:00
Gleb Smirnoff	cf377af6e2	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in icmp6_rip6_input(). It shall always run in the network epoch.	2019-11-07 20:43:12 +00:00
Bjoern A. Zeeb	503f4e4736	netinet*: variable cleanup In preparation for another change factor out various variable cleanups. These mainly include: (1) do not assign values to variables during declaration: this makes the code more readable and does allow for better grouping of variable declarations, (2) do not assign values to variables before need; e.g., if a variable is only used in the 2nd half of a function and we have multiple return paths before that, then do not set it before it is needed, and (3) try to avoid assigning the same value multiple times. MFC after: 3 weeks Sponsored by: Netflix	2019-11-07 18:29:51 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Bjoern A. Zeeb	0ecd976e80	IPv6 cleanup: kernel Finish what was started a few years ago and harmonize IPv6 and IPv4 kernel names. We are down to very few places now that it is feasible to do the change for everything remaining with causing too much disturbance. Remove "aliases" for IPv6 names which confusingly could indicate that we are talking about a different data structure or field or have two fields, one for each address family. Try to follow common conventions used in FreeBSD. * Rename sin6p to sin6 as that is how it is spelt in most places. * Remove "aliases" (#defines) for: - in6pcb which really is an inpcb and nothing separate - sotoin6pcb which is sotoinpcb (as per above) - in6p_sp which is inp_sp - in6p_flowinfo which is inp_flow * Try to use ia6 for in6_addr rather than in6p. * With all these gone also rename the in6p variables to inp as that is what we call it in most of the network stack including parts of netinet6. The reasons behind this cleanup are that we try to further unify netinet and netinet6 code where possible and that people will less ignore one or the other protocol family when doing code changes as they may not have spotted places due to different names for the same thing. No functional changes. Discussed with: tuexen (SCTP changes) MFC after: 3 months Sponsored by: Netflix	2019-08-02 07:41:36 +00:00
Hiroki Sato	7460ef5d7a	Fix hostname to be returned in an ICMPv6 NI Reply message defined in RFC 4620, ICMPv6 Node Information Queries. A vnet jail with an IPv6 address sent a hostname of the host environment, not the jail, even if another hostname was set to the jail. This change can be tested by the following commands: # ifconfig epair0 create # jail -c -n j1 vnet host.hostname=vnetjail path=/ persist # ifconfig epair0b vnet j1 # ifconfig epair0a inet6 -ifdisabled auto_linklocal up # jexec j1 ifconfig epair0b inet6 -ifdisabled auto_linklocal up # ping6 -w ff02::1%epair0a Differential Revision: https://reviews.freebsd.org/D20207 MFC after: 1 week	2019-05-16 19:09:41 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Bjoern A. Zeeb	6675bee81a	In icmp6_rip6_input(), once we have a lock, make sure the inp is not freed. This can happen since the list traversal and locking was converted to epoch(9). If the inp is marked "freed", skip it. This prevents a NULL pointer deref panic in ip6_savecontrol_v4() trying to access the socket hanging off the inp, which was gone by the time we got there. Reported by: andrew Tested by: andrew Approved by: re (gjb)	2018-09-20 15:45:53 +00:00
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Matt Macy	6573d7580b	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
Matt Macy	b872626dbe	mechanical CK macro conversion of inpcbinfo lists This is a dependency for converting the inpcbinfo hash and info rlocks to epoch.	2018-06-12 22:18:20 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Andrey V. Elsukov	849eeaa592	icmp6_reflect() sends ICMPv6 message with new IPv6 header. So, it is considered as originated by our host packet. And thus rcvif should be NULL, since it is used by ipfw(4) to determine that packet was originated from this host. Some of icmp6_reflect() consumers reuse mbuf and m_pkthdr without resetting rcvif pointer. To avoid this always reset m_pkthdr.rcvif pointer to NULL in icmp6_reflect(). Also remove such line and comment describing this from icmp6_error(), since it does not longer matters. PR: 227674 Reported by: eugen MFC after: 1 week	2018-04-23 12:20:07 +00:00
Jonathan T. Looney	c187c03466	Remove some unneccessary variable sets in IPv6 code, as detected by clang's static analyzer. Reviewed by: bz MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10940	2018-03-24 12:43:34 +00:00
Eric van Gyzen	43105e589a	Fix ICMPv6 redirects icmp6_redirect_input() validates that a redirect packet came from the current gateway for the respective destination. To do this, it compares the source address, which has an embedded scope zone id, to the next-hop address, which does not. If the address is link-local, which should be the case, the comparison fails and the redirect is ignored. Insert the scope zone id into the next-hop address so the comparison is accurate. Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases. Submitted by: Farrell Woods <Farrell_Woods@Dell.com> (initial revision) Reviewed by: ae melifaro dab MFC after: 1 week Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D14254	2018-02-09 00:13:05 +00:00
Andrey V. Elsukov	a406128960	Follow the RFC6980 and silently ignore following IPv6 NDP messages that had the IPv6 fragmentation header: o Neighbor Solicitation o Neighbor Advertisement o Router Solicitation o Router Advertisement o Redirect Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly is completed. Then check the presence of this flag in correspondig ND6 handling routines. PR: 224247 MFC after: 2 weeks	2017-12-15 12:37:32 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Jonathan T. Looney	fb04394554	Fix two places in the ICMP6 code where we could dereference a NULL pointer in the icmp6_input() function. When processing an ICMP6_ECHO_REQUEST, if IP6_EXTHDR_GET fails, it will set nicmp6 and n to NULL. Therefore, we should condition our modification to nicmp6 on n being not NULL. And, when processing an ICMP6_WRUREQUEST in the (mode != FQDN) case, if m_dup_pkthdr() fails, the code will set n to NULL. However, the very next line dereferences n. Therefore, when m_dup_pkthdr() fails, we should discontinue further processing and follow the same path as when m_gethdr() fails. Reported by: clang static analyzer Reviewed by: ae MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D10941	2017-05-30 14:41:31 +00:00
Alan Somers	559b42968c	Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0 sys/netinet6/icmp6.c Use the interface's FIB for source address selection in ICMPv6 error responses. sys/netinet6/in6.c In in6_newaddrmsg, announce arrival of local addresses on the interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6 cache instead of a single cache. sys/netinet6/in6_src.c In in6_selectsrc, use the caller's fib instead of the default fib. In in6_selectsrc_socket, remove a superfluous check. sys/netinet6/nd6.c In nd6_lle_event, use the interface's fib for routing socket messages. In nd6_is_new_addr_neighbor, check all FIBs when trying to determine whether an address is a neighbor. Also, simplify the code for point to point interfaces. sys/netinet6/nd6.h sys/netinet6/nd6.c sys/netinet6/nd6_rtr.c Make defrouter_select fib-aware, and make all of its callers pass in the interface fib. sys/netinet6/nd6_nbr.c When inputting a Neighbor Solicitation packet, consider the interface fib instead of the default fib for DAD. Output NS and Neighbor Advertisement packets on the correct fib. sys/netinet6/nd6_rtr.c Allow installing the same host route on different interfaces in different FIBs. If rt_add_addr_allfibs=0, only install or delete the prefix route on the interface fib. tests/sys/netinet/fibs_test.sh Clear some expected failures, but add a skip for the newly revealed BUG217871. PR: 196361 Submitted by: Erick Turnquist <jhujhiti@adjectivism.org> Reported by: Jason Healy <jhealy@logn.net> Reviewed by: asomers MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D9451	2017-03-17 16:50:37 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Michael Tuexen	3e1465754f	Make ICMPv6 hard error handling for TCP consistent with the ICMPv4 handling. Ensure that: * Protocol unreachable errors are handled by indicating ECONNREFUSED to the TCP user for both IPv4 and IPv6. These were ignored for IPv6. * Communication prohibited errors are handled by indicating ECONNREFUSED to the TCP user for both IPv4 and IPv6. These were ignored for IPv6. * Hop Limited exceeded errors are handled by indicating EHOSTUNREACH to the TCP user for both IPv4 and IPv6. For IPv6 the TCP connected was dropped but errno wasn't set. Reviewed by: gallatin, rrs MFC after: 1 month Sponsored by: Netflix Differential Revision: 7904	2016-10-21 10:32:57 +00:00
Andrey V. Elsukov	5a03e7819a	Fix bug introduced in r274300. In icmp6_reflect() use original source address of erroneous packet as destination address for source selection algorithm when original destination address is not one of our own. Reported by: Mark Kamichoff <prox at prolixium com> Tested by: Mark Kamichoff <prox at prolixium com> MFC after: 1 week	2016-09-29 19:57:37 +00:00
Mark Johnston	2d12d25c6a	Reduce code duplication around NDP message handlers in icmp6_input(). No functional change intended. MFC after: 2 weeks	2016-09-20 18:08:17 +00:00
Kevin Lo	c3bef61e58	Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878	2016-09-15 07:41:48 +00:00
Andrew Gallatin	d4c22202e6	Rework IPV6 TCP path MTU discovery to match IPv4 - Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput() - Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output() to send TCP packets without looking at the tcp host cache for every single transmit. - Make the icmp6 code mimic the IPv4 code & avoid returning PRC_HOSTDEAD because it is so expensive. Without these changes in place, every TCP6 pmtu discovery or host unreachable ICMP resulted in a call to in6_pcbnotify() which walks the tcbinfo table with the write lock held. Because the tcbinfo table is shared between IPv4 and IPv6, this causes huge scalabilty issues on servers with lots of (~100K) TCP connections, to the point where even a small percent of IPv6 traffic had a disproportionate impact on overall throughput. Reviewed by: bz, rrs, ae (all earlier versions), lstewart (in Netflix's tree) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D7272	2016-08-01 17:02:21 +00:00
Mark Johnston	c81c183dae	Use Node Information flag names instead of hard-coding their values. MFC after: 1 week	2016-05-15 03:22:13 +00:00
Alexander V. Chernikov	601c0b8bcc	Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket(). in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and socket-less (ND code). The main reason for that change is inability to specify non-default FIB for callers w/o socket since (internally) inpcb is used to determine fib. As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc() static): 1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check along with returning hop limit when needed. 2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and pass IPv6 address w/ explicitly specified scope as separate argument. Reviewed by: ae (previous version)	2016-01-10 13:40:29 +00:00
Alexander V. Chernikov	ab861e6c06	Do not hold ifaddr reference for the whole icmp6_reflect() exec time. Copy source address, calculate hlim and release refcount instead.	2016-01-10 11:59:55 +00:00
Alexander V. Chernikov	357ce739b9	Remove 'struct route_int6' argument from in6_selectsrc() and in6_selectif(). The main task of in6_selectsrc() is to return IPv6 SAS (along with output interface used for scope checks). No data-path code uses route argument for caching. The only users are icmp6 (reflect code), ND6 ns/na generation code. All this fucntions are control-plane, so there is no reason to try to 'optimize' something by passing cached route into to ip6_output(). Given that, simplify code by eliminating in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is used only by in6_selectsrc(), eliminate its 'struct route_in6' argument, too. While here, reshape rte-related code inside in6_selectif() to free lookup result immediately after saving all the needed fields.	2016-01-03 10:43:23 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp\|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Alexander V. Chernikov	9977be4a64	Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(), ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(), in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api. Eliminate now-unused ip_rtaddr(). Fix lookup key fib6_lookup_nh_basic() which was lost diring merge. Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always return IPv6 destination address with embedded scope. Currently rw_gateway has it scope embedded, do the same for non-gatewayed destinations. Sponsored by: Yandex LLC	2015-12-09 11:14:27 +00:00

1 2 3 4 5

203 Commits