freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	aa71d6b4a2	netinet: Disallow unspecified addresses in ICMP-embedded packets Reported by: glebius Reported by: syzbot+981c528ccb5c5534dffc@syzkaller.appspotmail.com Reviewed by: tuexen, glebius MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D38936	2023-03-13 10:45:56 -04:00
Justin Hibbits	3d0d5b21c9	IfAPI: Explicitly include <net/if_private.h> in netstack Summary: In preparation of making if_t completely opaque outside of the netstack, explicitly include the header. <net/if_var.h> will stop including the header in the future. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, melifaro Differential Revision: https://reviews.freebsd.org/D38200	2023-01-31 15:02:16 -05:00
Kristof Provost	a974702e27	pf: apply the network stack's ICMP rate limiting to ICMP errors sent by pf PR: 266477 Event: Aberdeen Hackathon 2022 Differential Revision: https://reviews.freebsd.org/D36903	2022-10-14 10:36:16 +02:00
Gleb Smirnoff	fcb3f813f3	netinet: remove PRC_ constants and streamline ICMP processing In the original design of the network stack from the protocol control input method pr_ctlinput was used notify the protocols about two very different kinds of events: internal system events and receival of an ICMP messages from outside. These events were coded with PRC_ codes. Today these methods are removed from the protosw(9) and are isolated to IPv4 and IPv6 stacks and are called only from icmp_input(). The PRC_ codes now just create a shim layer between ICMP codes and errors or actions taken by protocols. - Change ipproto_ctlinput_t to pass just pointer to ICMP header. This allows protocols to not deduct it from the internal IP header. - Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer. It has all the information needed to the protocols. In the structure, change ip6c_finaldst fields to sockaddr_in6. The reason is that icmp6_input() already has this address wrapped in sockaddr, and the protocols want this address as sockaddr. - For UDP tunneling control input, as well as for IPSEC control input, change the prototypes to accept a transparent union of either ICMP header pointer or struct ip6ctlparam pointer. - In icmp_input() and icmp6_input() do only validation of ICMP header and count bad packets. The translation of ICMP codes to errors/actions is done by protocols. - Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap, inet6ctlerrmap arrays. - In protocol ctlinput methods either trust what icmp_errmap() recommend, or do our own logic based on the ICMP header. Differential revision: https://reviews.freebsd.org/D36731	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	43d39ca7e5	netinet*: de-void control input IP protocol methods After decoupling of protosw(9) and IP wire protocols in `78b1fc05b2` for IPv4 we got vector ip_ctlprotox[] that is executed only and only from icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed only and only from icmp6_input(). This allows to use protocol specific argument types in these methods instead of struct sockaddr and void. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36727	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	46ddeb6be8	netinet6: retire ip6protosw.h The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with the former removed in `f0ffb944d2` in 2001 and the latter survived until today. It has been reduced down to only one useful declaration that moves to ip6_var.h Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36726	2022-10-03 20:53:04 -07:00
Gleb Smirnoff	b730de8bad	mld6: use callout(9) directly instead of pr_slowtimo, pr_fasttimo While here remove recursive network epoch entry in mld_fasttimo_vnet(), as this function is already in epoch. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36161	2022-08-17 11:50:31 -07:00
Gleb Smirnoff	78b1fc05b2	protosw: separate pr_input and pr_ctlinput out of protosw The protosw KPI historically has implemented two quite orthogonal things: protocols that implement a certain kind of socket, and protocols that are IPv4/IPv6 protocol. These two things do not make one-to-one correspondence. The pr_input and pr_ctlinput methods were utilized only in IP protocols. This strange duality required IP protocols that doesn't have a socket to declare protosw, e.g. carp(4). On the other hand developers of socket protocols thought that they need to define pr_input/pr_ctlinput always, which lead to strange dead code, e.g. div_input() or sdp_ctlinput(). With this change pr_input and pr_ctlinput as part of protosw disappear and IPv4/IPv6 get their private single level protocol switch table ip_protox[] and ip6_protox[] respectively, pointing at array of ipproto_input_t functions. The pr_ctlinput that was used for control input coming from the network (ICMP, ICMPv6) is now represented by ip_ctlprotox[] and ip6_ctlprotox[]. ipproto_register() becomes the only official way to register in the table. Those protocols that were always static and unlikely anybody is interested in making them loadable, are now registered by ip_init(), ip6_init(). An IP protocol that considers itself unloadable shall register itself within its own private SYSINIT(). Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36157	2022-08-17 11:50:31 -07:00
Gleb Smirnoff	948f31d7b0	netinet: do not broadcast PRC_REDIRECT_HOST on ICMP redirect This is expensive and useless call. It has been useless since Alexander melifaro@ moved the forwarding table to nexthops with passive invalidation. What happens now is that cached route in a inpcb would get invalidated on next ip_output(). These were the last users of pfctlinput(), so garbage collect it. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36156	2022-08-12 08:31:29 -07:00
Kornel Dulęba	82042465c3	icmp6: Improve validation of PMTU Currently we accept any pmtu between IPV6_MMTU(1280B) and the link mtu. In some network topologies could allow a bad actor to perform a DOS attack. Contrary to IPv4 in IPv6 oversized packets are dropped, and a ICMP PACKET_TOO_BIG message is sent back to the sender. After receiving an ICMPv6 packet with pmtu bigger than the current one the victim will start sending frames that will be dropped a router with reduced MTU. Although it will eventually receive another message with correct pmtu, an attacker can still just inject their spoofed packets frequently enough to overwrite the correct value. This issue is described in detail in RFC8201, section 6. Fix this by checking the current pmtu, and accepting the new one only if it's smaller. Approved by: mw(mentor) Reviewed by: tuexen MFC after: 1 week Sponsored by: Stormshield Obtained from: Semihalf Differential Revision: https://reviews.freebsd.org/D35871	2022-07-27 16:09:56 +02:00
Cy Schubert	db0ac6ded6	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit `266f97b5e9`, reversing changes made to `a10253cffe`. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.	2021-12-02 14:45:04 -08:00
Cy Schubert	266f97b5e9	wpa: Import wpa_supplicant/hostapd commit 14ab4a816 This is the November update to vendor/wpa committed upstream 2021-11-26. MFC after: 1 month	2021-12-02 13:35:14 -08:00
Gleb Smirnoff	de2d47842e	SMR protection for inpcbs With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector. Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details: - The database is already build on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some compexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *). Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022	2021-12-02 10:48:48 -08:00
Alexander V. Chernikov	c541bd368f	lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries. Currently we use pre-calculated headers inside LLE entries as prepend data for `if_output` functions. Using these headers allows saving some CPU cycles/memory accesses on the fast path. However, this approach makes adding L2 header for IPv4 traffic with IPv6 nexthops more complex, as it is not possible to store multiple pre-calculated headers inside lle. Additionally, the solution space is limited by the fact that PCB caching saves LLEs in addition to the nexthop. Thus, add support for creating special "child" LLEs for the purpose of holding custom family encaps and store mbufs pending resolution. To simplify handling of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE. Such LLEs are not visible when iterating LLE table. Their lifecycle is bound to the "parent" LLE - it is not possible to delete "child" when parent is alive. Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state machine used by the standard LLEs. nd6_lookup() and nd6_resolve() now accepts an additional argument, family, allowing to return such child LLEs. This change uses `LLE_SF()` macro which packs family and flags in a single int field. This is done to simplify merging back to stable/. Once this code lands, most of the cases will be converted to use a dedicated `family` parameter. Differential Revision: https://reviews.freebsd.org/D31379 MFC after: 2 weeks	2021-08-21 17:34:35 +00:00
Roy Marples	7045b1603b	socket: Implement SO_RERROR SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports. Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652	2021-07-28 09:35:09 -07:00
Alexander V. Chernikov	8268d82cff	Remove per-packet ifa refcounting from IPv6 fast path. Currently ip6_input() calls in6ifa_ifwithaddr() for every local packet, in order to check if the target ip belongs to the local ifa in proper state and increase its counters. in6ifa_ifwithaddr() references found ifa. With epoch changes, both `ip6_input()` and all other current callers of `in6ifa_ifwithaddr()` do not need this reference anymore, as epoch provides stability guarantee. Given that, update `in6ifa_ifwithaddr()` to allow it to return ifa without referencing it, while preserving option for getting referenced ifa if so desired. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28648	2021-02-15 22:33:12 +00:00
Alexander V. Chernikov	605284b894	Enforce net epoch in in6_selectsrc(). in6_selectsrc() may call fib6_lookup() in some cases, which requires epoch. Wrap in6_selectsrc* calls into epoch inside its users. Mark it as requiring epoch by adding NET_EPOCH_ASSERT(). MFC after: 1 weeek Differential Revision: https://reviews.freebsd.org/D28647	2021-02-15 22:33:12 +00:00
Alexander V. Chernikov	924d1c9a05	Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." Wrong version of the change was pushed inadvertenly. This reverts commit `4a01b854ca`.	2021-02-08 22:32:32 +00:00
Alexander V. Chernikov	4a01b854ca	SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.	2021-02-08 21:42:20 +00:00
Alexander V. Chernikov	d9999ae9ca	Fix use-after-free in icmp6_notify_error(). Reported by: Maxime Villard <max at m00nbsd.net> Reviewed by: markj MFC after: 3 days	2020-10-28 20:22:20 +00:00
Mark Johnston	4caea9b169	icmp6: Count packets dropped due to an invalid hop limit Pad the icmp6stat structure so that we can add more counters in the future without breaking compatibility again, last done in r358620. Annotate the rarely executed error paths with __predict_false while here. Reviewed by: bz, melifaro Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26578	2020-10-19 17:07:19 +00:00
Alexander V. Chernikov	eddfb2e86f	Fix IPv6 regression introduced by r362900. PR: kern/247729	2020-07-03 08:06:26 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4\|6>_lookup_nh_<basic\|ext> to fib<4\|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Alexander V. Chernikov	a37a5246ca	Use fib[46]_lookup() in mtu calculations. fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Conversion is straight-forwarded, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Differential Revision: https://reviews.freebsd.org/D24974	2020-05-28 08:00:08 +00:00
Alexander V. Chernikov	9ac7c6cfed	Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to the new routing KPI. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24245	2020-04-14 23:06:25 +00:00
Alexander V. Chernikov	34a5582c47	Bring back redirect route expiration. Redirect (and temporal) route expiration was broken a while ago. This change brings route expiration back, with unified IPv4/IPv6 handling code. It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout. Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions. Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present. Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run. The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal. This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23075	2020-01-22 13:53:18 +00:00
Bjoern A. Zeeb	0700d2c3f0	Make icmp6_reflect() static. icmp6_reflect() is not used anywhere outside icmp6.c, no reason to export it. Sponsored by: Netflix	2019-12-03 14:46:38 +00:00
Bjoern A. Zeeb	a4adf6cc65	Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_*macros. r354748-354750 replaced the KAME macros with m_pulldown() calls. Contrary to the rest of the network stack m_len checks before m_pulldown() were not put in placed (see r354748). Put these m_len checks in place for now (to go along with the style of the network stack since the initial commits). These are not put in for performance but to avoid an error scenario (even though it also will help performance at the moment as it avoid allocating an extra mbuf; not because of the unconditional function call). The observed error case went like this: (1) an mbuf with M_EXT arrives and we call m_pullup() unconditionally on it. (2) m_pullup() will call m_get() unless the requested length is larger than MHLEN (in which case it'll m_freem() the perfectly fine mbuf) and migrate the requested length of data and pkthdr into the new mbuf. (3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail. This was observed with failing auto-configuration as an RA packet of 200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input() dropped the mbuf. (Re-)adding the m_len checks before m_pullup() calls avoids this problems with mbufs using external storage for now. MFC after: 3 weeks Sponsored by: Netflix	2019-12-01 00:22:04 +00:00
Bjoern A. Zeeb	32af08ecad	icmpv6: Fix mbuf change in mld After r354748 mld_input() can change the mbuf. The new pointer is never returned to icmp6_input() and when passed to icmp6_rip6_input() the mbuf may no longer valid leading to a panic. Pass a pointer to the mbuf to mld_input() so we can return an updated version in the non-error case. Add a test sending an MLD packet case which will trigger this bug. Pointyhat to: bz Reported by: gallatin, thj MFC After: 2 weeks X-MFC with: r354748 Sponsored by: Netflix	2019-11-18 21:59:47 +00:00
Bjoern A. Zeeb	a61b5cfbbf	netinet6: Remove PULLDOWN_TESTs. Remove the KAME introduced PULLDOWN_TESTs which did not even have a compile-time option in sys/conf to turn them on for a custom kernel build. They made the code a lot harder to read or more complicated in a few cases. Convert the IP6_EXTHDR_CHECK() calls into FreeBSD looking code. Rather than throwing the packet away if it would not fit the KAME mbuf expectations, convert the macros to m_pullup() calls. Do not do any extra manual conditional checks upfront as to whether the m_len would suffice (), simply let m_pullup() do its work (incl. an early check). Remove extra m_pullup() calls where earlier in the function or the only caller has already done the pullup. Discussed with: rwatson () Reviewed by: ae MFC after: 8 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22334	2019-11-15 21:40:40 +00:00
Bjoern A. Zeeb	a8fe77d877	netinet: update mp to pass the proper value back In ip6_[direct_]input() we are looping over the extension headers to deal with the next header. We pass a pointer to an mbuf pointer to the handling functions. In certain cases the mbuf can be updated there and we need to pass the new one back. That missing in dest6_input() and route6_input(). In tcp6_input() we should also update it before we call tcp_input(). In addition to that mark the mbuf NULL all the times when we return that we are done with handling the packet and no next header should be checked (IPPROTO_DONE). This will eventually allow us to assert proper behaviour and catch the above kind of errors more easily, expecting *mp to always be set. This change is extracted from a larger patch and not an exhaustive change across the entire stack yet. PR: 240135 Reported by: prabhakar.lakhera gmail.com MFC after: 3 weeks Sponsored by: Netflix	2019-11-12 15:46:28 +00:00
Gleb Smirnoff	cf377af6e2	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in icmp6_rip6_input(). It shall always run in the network epoch.	2019-11-07 20:43:12 +00:00
Bjoern A. Zeeb	503f4e4736	netinet*: variable cleanup In preparation for another change factor out various variable cleanups. These mainly include: (1) do not assign values to variables during declaration: this makes the code more readable and does allow for better grouping of variable declarations, (2) do not assign values to variables before need; e.g., if a variable is only used in the 2nd half of a function and we have multiple return paths before that, then do not set it before it is needed, and (3) try to avoid assigning the same value multiple times. MFC after: 3 weeks Sponsored by: Netflix	2019-11-07 18:29:51 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Bjoern A. Zeeb	0ecd976e80	IPv6 cleanup: kernel Finish what was started a few years ago and harmonize IPv6 and IPv4 kernel names. We are down to very few places now that it is feasible to do the change for everything remaining with causing too much disturbance. Remove "aliases" for IPv6 names which confusingly could indicate that we are talking about a different data structure or field or have two fields, one for each address family. Try to follow common conventions used in FreeBSD. * Rename sin6p to sin6 as that is how it is spelt in most places. * Remove "aliases" (#defines) for: - in6pcb which really is an inpcb and nothing separate - sotoin6pcb which is sotoinpcb (as per above) - in6p_sp which is inp_sp - in6p_flowinfo which is inp_flow * Try to use ia6 for in6_addr rather than in6p. * With all these gone also rename the in6p variables to inp as that is what we call it in most of the network stack including parts of netinet6. The reasons behind this cleanup are that we try to further unify netinet and netinet6 code where possible and that people will less ignore one or the other protocol family when doing code changes as they may not have spotted places due to different names for the same thing. No functional changes. Discussed with: tuexen (SCTP changes) MFC after: 3 months Sponsored by: Netflix	2019-08-02 07:41:36 +00:00
Hiroki Sato	7460ef5d7a	Fix hostname to be returned in an ICMPv6 NI Reply message defined in RFC 4620, ICMPv6 Node Information Queries. A vnet jail with an IPv6 address sent a hostname of the host environment, not the jail, even if another hostname was set to the jail. This change can be tested by the following commands: # ifconfig epair0 create # jail -c -n j1 vnet host.hostname=vnetjail path=/ persist # ifconfig epair0b vnet j1 # ifconfig epair0a inet6 -ifdisabled auto_linklocal up # jexec j1 ifconfig epair0b inet6 -ifdisabled auto_linklocal up # ping6 -w ff02::1%epair0a Differential Revision: https://reviews.freebsd.org/D20207 MFC after: 1 week	2019-05-16 19:09:41 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Bjoern A. Zeeb	6675bee81a	In icmp6_rip6_input(), once we have a lock, make sure the inp is not freed. This can happen since the list traversal and locking was converted to epoch(9). If the inp is marked "freed", skip it. This prevents a NULL pointer deref panic in ip6_savecontrol_v4() trying to access the socket hanging off the inp, which was gone by the time we got there. Reported by: andrew Tested by: andrew Approved by: re (gjb)	2018-09-20 15:45:53 +00:00
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Matt Macy	6573d7580b	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
Matt Macy	b872626dbe	mechanical CK macro conversion of inpcbinfo lists This is a dependency for converting the inpcbinfo hash and info rlocks to epoch.	2018-06-12 22:18:20 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Andrey V. Elsukov	849eeaa592	icmp6_reflect() sends ICMPv6 message with new IPv6 header. So, it is considered as originated by our host packet. And thus rcvif should be NULL, since it is used by ipfw(4) to determine that packet was originated from this host. Some of icmp6_reflect() consumers reuse mbuf and m_pkthdr without resetting rcvif pointer. To avoid this always reset m_pkthdr.rcvif pointer to NULL in icmp6_reflect(). Also remove such line and comment describing this from icmp6_error(), since it does not longer matters. PR: 227674 Reported by: eugen MFC after: 1 week	2018-04-23 12:20:07 +00:00
Jonathan T. Looney	c187c03466	Remove some unneccessary variable sets in IPv6 code, as detected by clang's static analyzer. Reviewed by: bz MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10940	2018-03-24 12:43:34 +00:00
Eric van Gyzen	43105e589a	Fix ICMPv6 redirects icmp6_redirect_input() validates that a redirect packet came from the current gateway for the respective destination. To do this, it compares the source address, which has an embedded scope zone id, to the next-hop address, which does not. If the address is link-local, which should be the case, the comparison fails and the redirect is ignored. Insert the scope zone id into the next-hop address so the comparison is accurate. Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases. Submitted by: Farrell Woods <Farrell_Woods@Dell.com> (initial revision) Reviewed by: ae melifaro dab MFC after: 1 week Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D14254	2018-02-09 00:13:05 +00:00
Andrey V. Elsukov	a406128960	Follow the RFC6980 and silently ignore following IPv6 NDP messages that had the IPv6 fragmentation header: o Neighbor Solicitation o Neighbor Advertisement o Router Solicitation o Router Advertisement o Redirect Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly is completed. Then check the presence of this flag in correspondig ND6 handling routines. PR: 224247 MFC after: 2 weeks	2017-12-15 12:37:32 +00:00

1 2 3 4 5

218 Commits