freebsd-nq

Author	SHA1	Message	Date
Alexander V. Chernikov	9e02229580	Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields. After converting routing subsystem customers to use nexthop objects defined in r359823, some fields in struct rtentry became unused. This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry along with the code initializing and updating these fields. Cleanup of the remaining fields will be addressed by D24669. This commit also changes the implementation of the RTM_CHANGE handling. Old implementation tried to perform the whole operation under radix WLOCK, resulting in slow performance and hacks like using RTF_RNH_LOCKED flag. New implementation looks up the route nexthop under radix RLOCK, creates new nexthop and tries to update rte nhop pointer. Only last part is done under WLOCK. In the hypothetical scenarious where multiple rtsock clients repeatedly issue RTM_CHANGE requests for the same route, route may get updated between read and update operation. This is addressed by retrying the operation multiple (3) times before returning failure back to the caller. Differential Revision: https://reviews.freebsd.org/D24666	2020-05-04 14:31:45 +00:00
Gleb Smirnoff	7b6c99d08d	Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed: s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:12:56 +00:00
Alexander V. Chernikov	74787ef47b	Add nhop to the ifa_rtrequest() callback. With the upcoming multipath changes described in D24141, rt->rt_nhop can potentially point to a nexthop group instead of an individual nhop. To simplify caller handling of such cases, change ifa_rtrequest() callback to pass changed nhop directly. Differential Revision: https://reviews.freebsd.org/D24604	2020-04-29 19:28:56 +00:00
Alexander V. Chernikov	e7d8af4f65	Move route_temporal.c and route_var.h to net/route. Nexthop objects implementation, defined in r359823, introduced sys/net/route directory intended to hold all routing-related code. Move recently-introduced route_temporal.c and private route_var.h header there. Differential Revision: https://reviews.freebsd.org/D24597	2020-04-28 19:14:09 +00:00
Alexander V. Chernikov	fe6da72759	Move struct rtentry definition to nhop_var.h. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver features much faster. This is one of the last changes, effectively moving struct rtentry definition to a net/route_var.h header, internal to the routing subsystem. Differential Revision: https://reviews.freebsd.org/D24580	2020-04-28 18:42:30 +00:00
Alexander V. Chernikov	1b0051bada	Eliminate now-unused parts of old routing KPI. r360292 switched most of the remaining routing customers to a new KPI, leaving a bunch of wrappers for old routing lookup functions unused. Remove them from the tree as a part of routing cleanup. Differential Revision: https://reviews.freebsd.org/D24569	2020-04-28 07:25:34 +00:00
Alexander V. Chernikov	55f57ca9ac	Convert debugnet to the new routing KPI. Introduce new fib[46]_lookup_debugnet() functions serving as a special interface for the crash-time operations. Underlying implementation will try to return lookup result if datastructures are not corrupted, avoding locking. Convert debugnet to use fib4_lookup_debugnet() and switch it to use nexthops instead of rtentries. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D24555	2020-04-26 18:42:38 +00:00
Alexander V. Chernikov	49c9f84f54	Fix IPv6 link-local operations with RADIX_MPATH. It was broken by r360292 as fib6_lookup() assumes de-embedded addresses while rtalloc_mpath_fib() requires sockaddr with embedded ones. New fib6_lookup() transparently supports multipath, hence remove old RADIX_MPATH condition.	2020-04-26 18:07:35 +00:00
Alexander V. Chernikov	983066f05b	Convert route caching to nexthop caching. This change is build on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same. Differential Revision: https://reviews.freebsd.org/D24340	2020-04-25 09:06:11 +00:00
Alexander V. Chernikov	aaad3c4fca	Convert rtentry field accesses into nhop field accesses. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This commit is mostly mechanical change to eliminate direct struct rtentry field accesses. The only notable difference is AF_LINK gateway encoding. AF_LINK gw is used in routing stack for operations with interface routes and host loopback routes. In the former case it indicates _some_ non-NULL gateway, as the interface is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting. In the latter case the interface index inside gateway was used by the IPv6 datapath to verify address scope for link-local interfaces. Kernel uses struct sockaddr_dl for this type of gateway. This structure allows for specifying rich interface data, such as mac address and interface name. However, this results in relatively large structure size - 52 bytes. Routing stack fils in only 2 fields - sdl_index and sdl_type, which reside in the first 8 bytes of the structure. In the new KPI, struct nhop_object tries to be cache-efficient, hence embodies gateway address inside the structure. In the AF_LINK case it stores stortened version of the structure - struct sockaddr_dl_short, which occupies 16 bytes. After D24340 changes, the data inside AF_LINK gateway will not be used in the kernel at all, leaving rtsock as the only potential concern. The difference in rtsock reporting: (old) got message of size 240 on Thu Apr 16 03:12:13 2020 RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 link#5 255.255.255.0 (new) got message of size 200 on Sun Apr 19 09:46:32 2020 RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 link#5 255.255.255.0 Note 40 bytes different (52-16 + alignment). However, gateway is still a valid AF_LINK gateway with proper data filled in. It is worth noting that these particular messages (interface routes) are mostly ignored by routing daemons: * bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages. * quagga/frr ignores routes without gateway More detailed overview on how rtsock messages are used by the routing daemons to reconstruct the kernel view, can be found in D22974. Differential Revision: https://reviews.freebsd.org/D24519	2020-04-23 08:04:20 +00:00
Alexander V. Chernikov	d98351e13c	Fix lookup key generation in fib6_check_urpf(). The version introduced in r359823 assumed D23051 had been in tree already. As this is not the case yet, revert to sockaddr.	2020-04-19 07:27:12 +00:00
Jonathan T. Looney	5d6e356cb0	Avoid calling protocol drain routines more than once per reclamation event. mb_reclaim() calls the protocol drain routines for each protocol in each domain. Some protocols exist in more than one domain and share drain routines. In the case of SCTP, it also uses the same drain routine for its SOCK_SEQPACKET and SOCK_STREAM entries in the same domain. On systems with INET, INET6, and SCTP all defined, mb_reclaim() calls sctp_drain() four times. On systems with INET and INET6 defined, mb_reclaim() calls tcp_drain() twice. mb_reclaim() is the only in-tree caller of the pr_drain protocol entry. Eliminate this duplication by ensuring that each pr_drain routine is only specified for one protocol entry in one domain. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24418	2020-04-16 20:17:24 +00:00
Alexander V. Chernikov	539642a29d	Add nhop parameter to rti_filter callback. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. Additionally, with the followup multipath changes, single rtentry can point to multiple nexthops. With that in mind, convert rti_filter callback used when traversing the routing table to accept pair (rt, nhop) instead of nexthop. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24440	2020-04-16 17:20:18 +00:00
Alexander V. Chernikov	53a4886d5d	Convert ip6_forward() to the new routing KPI. Update ip6_forward() internals to use deembedded IPv6 addresses to simplify calls to the new KPI and prepare for the future scope-embedding cleanup. Add in6_get_unicast_scopeid() and in6_set_unicast_scopeid() scopeid operation functions tailored for unicast processing. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24334	2020-04-15 12:56:05 +00:00
Alexander V. Chernikov	9ac7c6cfed	Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to the new routing KPI. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24245	2020-04-14 23:06:25 +00:00
Alexander V. Chernikov	dd4776f0cc	Reorganise nd6 notification code to avoid direct rtentry field access. One of the goals of the new routing KPI defined in r359823 is to entirely hide `struct rtentry` from the consumers. Doing so will allow to improve routing subsystem internals and deliver features more easily. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification code to avoid accessing most of the rtentry fields. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24404	2020-04-14 22:48:33 +00:00
Andrew Gallatin	23feb56348	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone, This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid access ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213	2020-04-14 14:46:06 +00:00
Alexander V. Chernikov	6722086045	Plug netmask NULL check during route addition causing kernel panic. This bug was introduced by the r359823. Reported by: hselasky	2020-04-14 13:12:22 +00:00
Alexander V. Chernikov	3133002560	Remove tcp_rtlookup6() function signature. The function itself was removed in r122922 16 years ago.	2020-04-13 08:26:11 +00:00
Alexander V. Chernikov	a666325282	Introduce nexthop objects and new routing KPI. This is the foundational change for the routing subsytem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object fib6_lookup(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 function are intended to replace all all flavours of <in_\|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexists. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) has to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232	2020-04-12 14:30:00 +00:00
Alexander V. Chernikov	c80b717f71	Remove RADIX_MPATH headers, they were unused since r293159. MFC after: 2 weeks	2020-04-11 07:56:11 +00:00
Alexander V. Chernikov	4684d3cbcb	Remove per-AF radix_mpath initializtion functions. Split their functionality by moving random seed allocation to SYSINIT and calling (new) generic multipath function from standard IPv4/IPv5 RIB init handlers. Differential Revision: https://reviews.freebsd.org/D24356	2020-04-11 07:37:08 +00:00
Andrey V. Elsukov	cfad769689	Ignore ND6 neighbor advertisement received for static link-layer entries. Previously such NA could override manually created LLE. Reported by: Martin Beran <martin at mber cz> Reviewed by: melifaro MFC after: 10 days	2020-04-01 02:13:01 +00:00
Mark Johnston	431f2b8712	Use a dedicated taskqueue thread for in6m_release_task(). Interfaces may be detached from a taskqueue_thread task, for example by prison_complete(), so after r359438, when draining the queue we may end up deadlocking. Reported by: Jenkins via lwhsu MFC with: r359438	2020-03-31 02:25:53 +00:00
Mark Johnston	9b1d850be8	Remove the "config" taskqgroup and its KPIs. Equivalent functionality is already provided by taskqueue(9), just use that instead. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-03-30 14:24:03 +00:00
Mark Johnston	e02582d1ae	Fix synchronization in the IPV6_2292PKTOPTIONS set handler. The inpcb needs to be locked when we update output packet options. Otherwise it is possible for the IPV6_2292PKTOPTIONS handler to free packet option structures while another thread is reading or updating them. Note that the option handler is still kind of broken. For instance it frees all options before performing privilege checks for individual options. However, this can be fixed separately. Reported by: syzbot+52eb0fd4ddc119787f9d@syzkaller.appspotmail.com Reviewed by: bz, tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24125	2020-03-19 21:38:52 +00:00
Bjoern A. Zeeb	8483fce695	ip6: retire in6_selectroute_fib() as promised 8 years ago In r231852 I added in6_selectroute_fib() as a compat function with the fibnum as an extra argument compared to in6_selectroute() to keep the KPI stable. Way too late retire this function again and add the fib to in6_selectroute() which also only has a single consumer now and was an orphan function before.	2020-03-03 13:48:12 +00:00
Bjoern A. Zeeb	000c42faf3	ip6_output: use new routing KPI when not passed a cached route Implement the equivalent of r347375 (IPv4) for the IPv6 output path. In IPv6 we get passed a cached route (and inp) by udp6_output() depending on whether we acquired a write lock on the INP. In case we neither bind nor connect a first UDP packet would come in with a cached route (wlocked) and all further packets would not. In case we bind and do not connect we never write-lock the inp. When we do not pass in a cached route, rather than providing the storage for a route locally and pass it over the old lookup code and down the stack, use the new route lookup KPI and acquire all details we need to send the packet. Compared to the IPv4 code the IPv6 code has a couple of possible complications: given an option with a routing hdr/caching route there, and path mtu (ro_pmtu) case which now equally has to deal with the possibility of having a route which is NULL passed in, and the fwd_tag in case a firewall changes the next hop (something to factor out in the future). Sponsored by: Netflix Reviewed by: glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D23886	2020-03-03 11:32:47 +00:00
Bjoern A. Zeeb	5f3e375ed8	in6_fib: return nh_ia in the ext interface as we do for IPv4 Like for IPv4 add nh_ia to the ext interface and return rt_ifa in order to be used for, e.g., packet/octets accounting purposes. Reviewed by: melifaro MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23873	2020-03-03 09:50:33 +00:00
Bjoern A. Zeeb	f6428cdb1f	fib6_rte_to_nh_: return a link-local gw address with scope embedded In fib6_rte_to_nh_ when returning a link-local gateway address currently we do clear the scope. That could be recovered using the ifp returned as well, but the code in general seems to expect a link-local address with scope embeedded as otherwise the "dst" (gw) passed to the output routines will not include scope and not send the packet out (the right interface). Do not clear the scope when returning a link-local address and allow packets to go out (the right interface). Remove the (now) extra scope recovery in the IPv6 fast-fwd code. Sponsored by: Netflix Reviewed by: melifaro, ae MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23872	2020-03-03 09:45:16 +00:00
Bjoern A. Zeeb	f1db666a61	mld6: initialize oifp to avoid bogus results/panics in edge cases In certain cases (probably not during normal operation but observed in the lab during development) ip6_ouput() could return without error and ifpp (&oifp) not updated. Given oifp was never initialized we would take the later branch as oifp was not NULL, and when calling icmp6_ifstat_inc() we would panic dereferencing a garbage pointer. For code stability initialize oifp to NULL before first use to always have a deterministic value and not rely on a called function to behave and always and for ever do the work for us as we hope for. MFC after: 3 days Sponsored by: Netflix	2020-02-28 11:16:41 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Bjoern A. Zeeb	3db6053160	ip6_output: fix regression introduced in r358167 for ipv6 fragmentation When moving the calculations for the optlen into the if (opt) block which deals with possible extension headers I failed to initialise unfragpartlen to the ipv6 header length if there were no extension headers present. Correct that mistake to make IPv6 fragment length calculcations work again. Reported by: hselasky, kp OKed by: hselasky, kp MFC after: 3 days X-MFC with: r358167 PR: 244393	2020-02-25 15:03:41 +00:00
Bjoern A. Zeeb	3459050c9a	Fix IPv6 checksums when exthdrs are present. In two places in ip6_output we are doing (delayed) checksum calculations. The initial logic came from SCTP in r205075,205104 and later I copied and adjusted it for the TCP\|UDP case in r235958. The problem was that the original SCTP offsets were already wrong for any case with extension headers present given IPv6 extension headers are not part of the pseudo checksum calculations. The later changes do not help in case there is checksum offloading as for extension headers (incl. fragments) we do currrently never offload as we have no infrastructure to know whether the NIC can handle these cases. Correct the offsets for delayed checksum calculations and properly handle mbuf flags. In addition harmonize the almost identical duplicate code. While here eliminate the now unneeded variable hlen and add an always missing mtod() call in the 1-b and 3 cases after the introduction of the mb_unmapped_to_ext() calls. Reported by: Francis Dupont (fdupont isc.org) PR: 243675 MFC after: 6 days Reviewed by: markj (earlier version), gallatin Differential Revision: https://reviews.freebsd.org/D23760	2020-02-24 19:12:20 +00:00
Pawel Biernacki	295a18d184	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (14 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23639	2020-02-24 10:47:18 +00:00
Bjoern A. Zeeb	a1a6c01e41	ip6_output: improve extension header handling Move IPv6 source address checks from after extension header heandling to the top of the function. If we do not pass these checks there is no reason to do a lot of work upfront. Fold extension header preparations and length calculations together into a single branch and macro rather than doing them sequentially. Likewise move extension header concatination into a single branch block only doing it if we recorded any extension header length length. Reviewed by: melifaro (earlier version), markj, gallatin Sponsored by: Netflix (partially, originally) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23740	2020-02-20 10:56:12 +00:00
Michael Tuexen	868b51f234	Epochify SCTP.	2020-02-18 21:25:17 +00:00
Bjoern A. Zeeb	7c1daefe2c	ip6_output: update comments. Clear up some comments and improve to panic messages. No functional changes. MFC after: 3 days	2020-02-18 11:28:00 +00:00
Hans Petter Selasky	bacb11c9ed	Fix kernel panic while trying to read multicast stream. When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set for all mbufs being input by the IGMP/MLD6 code. Else there will be a NULL-pointer dereference in the netisr code when trying to set the VNET based on the incoming mbuf. Add an assert to catch this when queueing mbufs on a netisr to make debugging of similar cases easier. Found by: Vladislav V. Prodan PR: 244002 Reviewed by: bz@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-02-17 09:46:32 +00:00
Navdeep Parhar	c53c867eb3	Fix NOINET builds.	2020-01-31 02:23:48 +00:00
Gleb Smirnoff	e617b21d2f	Enter network epoch when calling in_pcbconnect() for IPv6 mapped to IPv4 UDP sockets. This is miss from r356983. Reported by: https://syzkaller.appspot.com/bug?id=73c7a2e3f0783f9947459065e5c2f25fe8f82f54	2020-01-22 17:06:55 +00:00
Alexander V. Chernikov	34a5582c47	Bring back redirect route expiration. Redirect (and temporal) route expiration was broken a while ago. This change brings route expiration back, with unified IPv4/IPv6 handling code. It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout. Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions. Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present. Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run. The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal. This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23075	2020-01-22 13:53:18 +00:00
Gleb Smirnoff	b955545386	Make ip6_output() and ip_output() require network epoch. All callers that before may called into these functions without network epoch now must enter it.	2020-01-22 05:51:22 +00:00
Gleb Smirnoff	bab98355f9	Add some documenting NET_EPOCH_ASSERTs.	2020-01-22 02:37:47 +00:00
Gleb Smirnoff	f6a2a6b163	Unroll macro that is used just once. Not a functional change.	2020-01-22 02:35:39 +00:00
Alexander V. Chernikov	16c2f24169	Document requirements for the 'struct route' variations. MFC after: 2 weeks	2020-01-21 12:00:34 +00:00
Gleb Smirnoff	2a4bd982d0	Introduce NET_EPOCH_CALL() macro and use it everywhere where we free data based on the network epoch. The macro reverses the argument order of epoch_call(9) - first function, then its argument. NFC	2020-01-15 06:05:20 +00:00
Gleb Smirnoff	97168be809	Mechanically substitute assertion of in_epoch(net_epoch_preempt) to NET_EPOCH_ASSERT(). NFC	2020-01-15 05:45:27 +00:00
Michael Tuexen	fe1274ee39	Fix race when accepting TCP connections. When expanding a SYN-cache entry to a socket/inp a two step approach was taken: 1) The local address was filled in, then the inp was added to the hash table. 2) The remote address was filled in and the inp was relocated in the hash table. Before the epoch changes, a write lock was held when this happens and the code looking up entries was holding a corresponding read lock. Since the read lock is gone away after the introduction of the epochs, the half populated inp was found during lookup. This resulted in processing TCP segments in the context of the wrong TCP connection. This patch changes the above procedure in a way that the inp is fully populated before inserted into the hash table. Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@ mailing list and for testing the patch! Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D22971	2020-01-12 17:52:32 +00:00
Bjoern A. Zeeb	c6feea3b89	nd6_rtr: constantly use __func__ for nd6log() Over time one or two hard coded function names did not match the actual function anymore. Consistently use __func__ for nd6log() calls and re-wrap/re-format some messages for consitency. MFC after: 2 weeks	2020-01-12 17:41:09 +00:00

1 2 3 4 5 ...

2033 Commits