freebsd-dev

Author	SHA1	Message	Date
Kristof Provost	a7f20faa07	netinet6: fix panic on kldunload pfsync Commit `d6cd20cc5` ("netinet6: fix ndp proxying") caused us to panic when unloading pfsync: Fatal trap 12: page fault while in kernel mode cpuid = 19; apic id = 38 fault virtual address = 0x20 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80dfe7f4 stack pointer = 0x28:0xfffffe015d4f8ac0 frame pointer = 0x28:0xfffffe015d4f8ae0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 5477 (kldunload) trap number = 12 panic: page fault cpuid = 19 time = 1654023100 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d4f8880 vpanic() at vpanic+0x17f/frame 0xfffffe015d4f88d0 panic() at panic+0x43/frame 0xfffffe015d4f8930 trap_fatal() at trap_fatal+0x387/frame 0xfffffe015d4f8990 trap_pfault() at trap_pfault+0xab/frame 0xfffffe015d4f89f0 calltrap() at calltrap+0x8/frame 0xfffffe015d4f89f0 --- trap 0xc, rip = 0xffffffff80dfe7f4, rsp = 0xfffffe015d4f8ac0, rbp = 0xfffffe015d4f8ae0 --- in6_purge_proxy_ndp() at in6_purge_proxy_ndp+0x14/frame 0xfffffe015d4f8ae0 if_purgeaddrs() at if_purgeaddrs+0x24/frame 0xfffffe015d4f8b90 if_detach_internal() at if_detach_internal+0x1c2/frame 0xfffffe015d4f8bf0 if_detach() at if_detach+0x71/frame 0xfffffe015d4f8c20 pfsync_clone_destroy() at pfsync_clone_destroy+0x1dd/frame 0xfffffe015d4f8c70 if_clone_destroyif() at if_clone_destroyif+0x239/frame 0xfffffe015d4f8cc0 if_clone_detach() at if_clone_detach+0xc8/frame 0xfffffe015d4f8cf0 vnet_pfsync_uninit() at vnet_pfsync_uninit+0xda/frame 0xfffffe015d4f8d10 vnet_deregister_sysuninit() at vnet_deregister_sysuninit+0x85/frame 0xfffffe015d4f8d40 linker_file_sysuninit() at linker_file_sysuninit+0x147/frame 0xfffffe015d4f8d70 linker_file_unload() at linker_file_unload+0x269/frame 0xfffffe015d4f8db0 kern_kldunload() at kern_kldunload+0x18d/frame 0xfffffe015d4f8e00 amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015d4f8f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d4f8f30 --- syscall (444, FreeBSD ELF64, sys_kldunloadf), rip = 0x1601eab28cba, rsp = 0x1601e9c363f8, rbp = 0x1601e9c36c50 --- This happens because ifp->if_afdata[AF_INET6] is NULL. Check for this, just as we already do in a few other places. See also `c139b3c19b` ("arp/nd: Cope with late calls to iflladdr_event"). Reviewed by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D35374	2022-06-01 09:26:15 +02:00
Arseny Smalyuk	d18b4bec98	netinet6: Fix mbuf leak in NDP Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d. It happens because lltable_drop_entry_queue() rely on `la_numheld` counter when dropping NDP entries (lles). It turned out NDP code never increased `la_numheld`, so the actual free never happened. Fix the issue by introducing unified lltable_append_entry_queue(), common for both ARP and NDP code, properly addressing packet queue maintenance. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35365 MFC after: 2 weeks	2022-05-31 21:06:14 +00:00
KUROSAWA Takahiro	d6cd20cc5c	netinet6: fix ndp proxying We could insert proxy NDP entries by the ndp command, but the host with proxy ndp entries had not responded to Neighbor Solicitations. Change the following points for proxy NDP to work as expected: * join solicited-node multicast addresses for proxy NDP entries in order to receive Neighbor Solicitations. * look up proxy NDP entries not on the routing table but on the link-level address table when receiving Neighbor Solicitations. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35307 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro	77001f9b6d	lltable: introduce the llt_post_resolved callback In order to decrease ifdef INET/INET6s in the lltable implementation, introduce the llt_post_resolved callback and implement protocol-dependent code in the protocol-dependent part. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35322 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
Dmitry Chagin	31d1b816fe	sysent: Get rid of bogus sys/sysent.h include. Where appropriate hide sysent.h under proper condition. MFC after: 2 weeks	2022-05-28 20:52:17 +03:00
Gleb Smirnoff	6890b58814	sockbuf: improve sbcreatecontrol() o Constify memory pointer. Make length unsigned. o Make it never fail with M_WAITOK and assert that length is sane.	2022-05-17 10:10:42 -07:00
Gleb Smirnoff	b46667c63e	sockbuf: merge two versions of sbcreatecontrol() into one No functional change.	2022-05-17 10:10:42 -07:00
Gleb Smirnoff	808b7d80e0	mbuf: remove PH_vt alias for mbuf packet header persistent shared data Mechanical sed change s/PH_vt\.vt_nrecs/vt_nrecs/g	2022-05-13 13:32:43 -07:00
Kristof Provost	797b94504f	udp6: allow udp_tun_func_t() to indicate it did not eat the packet Implement the same filter feature we implemented for UDP over IPv6 in `742e7210d`. This was missed in that commit. Pointed out by: markj Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-04-22 16:55:23 +02:00
Mark Johnston	5d691ab4f0	mld6: Ensure that mld_domifattach() always succeeds mld_domifattach() does a memory allocation under the global MLD mutex and so can fail, but no error handling prevents a null pointer dereference in this case. The mutex is only needed when updating the global softc list; the allocation and static initialization of the softc does not require this mutex. So, reduce the scope of the mutex and use M_WAITOK for the allocation. PR: 261457 MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34943	2022-04-21 13:23:59 -04:00
John Baldwin	a98bb75f80	netinet6: Use __diagused for variables only used in KASSERT().	2022-04-13 16:08:19 -07:00
Kristof Provost	742e7210d0	udp: allow udp_tun_func_t() to indicate it did not eat the packet Allow udp tunnel functions to indicate they have not taken ownership of the packet, and that normal UDP processing should continue. This is especially useful for scenarios where the kernel has taken ownership of a socket that was originally created by userspace. It allows the tunnel function to pass through certain packets for userspace processing. The primary user of this is if_ovpn, when it receives messages from unknown peers (which might be a new client). Reviewed by: tuexen Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34883	2022-04-12 10:04:59 +02:00
Andrey V. Elsukov	7d98cc096b	Fix ipfw fwd that doesn't work in some cases For IPv4 use dst pointer as destination address in fib4_lookup(). It keeps destination address from IPv4 header and can be changed when PACKET_TAG_IPFORWARD tag was set by packet filter. For IPv6 override destination address with address from dst_sa.sin6_addr, that was set from PACKET_TAG_IPFORWARD tag. Reviewed by: eugen MFC after: 1 week PR: 256828, 261697, 255705 Differential Revision: https://reviews.freebsd.org/D34732	2022-04-11 14:16:43 +03:00
Mark Johnston	990a6d18b0	net: Fix memory leaks in lltable_calc_llheader() error paths Also convert raw epoch_call() calls to lltable_free_entry() calls, no functional change intended. There's no need to asynchronously free the LLEs in that case to begin with, but we might as well use the lltable interfaces consistently. Noticed by code inspection; I believe lltable_calc_llheader() failures do not generally happen in practice. Reviewed by: bz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34832	2022-04-08 11:47:25 -04:00
Mark Johnston	dd91d84486	net: Fix LLE lock leaks Historically, lltable_try_set_entry_addr() would release the LLE lock upon failure. After some refactoring, it no longer does so, but consumers were not adjusted accordingly. Also fix a leak that can occur if lltable_calc_llheader() fails in the ARP code, but I suspect that such a failure can only occur due to a code bug. Reviewed by: bz, melifaro Reported by: pho Fixes: `0b79b007eb` ("[lltable] Restructure nd6 code.") MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34831	2022-04-08 11:46:19 -04:00
John Baldwin	bf73b06771	ip6_mroute: Mark a variable only used in a debug trace as unused.	2022-04-06 16:45:29 -07:00
John Baldwin	8b6ccfb6c7	multicast code: Quiet unused warnings for variables used for KTR traces. For nallow and nblock, move the variables under #ifdef KTR. For return values from functions logged in KTR traces, mark the variables as __unused rather than having to #ifdef the assignment of the function return value.	2022-04-06 16:45:28 -07:00
Warner Losh	c7761ca93e	pim6_input: eliminate write only variable rc Sponsored by: Netflix	2022-04-04 22:30:52 -06:00
Gordon Bergling	c55ecce1c1	netinet6: Fix a typo in a source code comment - s/maping/mapping/ MFC after: 3 days	2022-03-28 19:32:10 +02:00
Andrew Gallatin	9ba117960e	Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues When ip_output_send() returns EAGAIN due to issues with send tags (route change, lagg failover, etc), it must free the mbuf. This is because ip_output_send() was written as a wrapper/replacement for a direct call to if_output(), and the contract with if_output() has historically been that it owns the mbufs once called. When ip_output_send() failed to free mbufs, it violated this assumption and lead to leaked mbufs. This was noticed when using NIC TLS in combination with hardware rate-limited connections. When seeing lots of NIC output drops triggered ratelimit send tag changes, we noticed we were leaking ktls_sessions, send tags and mbufs. This was due ip_output_send() leaking mbufs which held references to ktls_sessions, which in turn held references to send tags. Many thanks to jbh, rrs, hselasky and markj for their help in debugging this. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34054 Reviewed by: hselasky, jhb, rrs MFC after: 2 weeks	2022-01-27 10:34:34 -05:00
Thomas Steen Rasmussen	bc6abdd97e	nd6: use CARP link level address in SLLAO for NS sent out When sending an NS, check if we are using a IPv6 CARP address and if we do, then put proper CARP link level address into ND_OPT_SOURCE_LINKADDR option and also put PACKET_TAG_CARP tag on the packet. The latter will enforce CARP link level address at the data link layer too, which might be necessary for broken implementations. The code really follows what NA sending code has been doing since introduction of carp(4). While here, bring to style(9) the whole block of code. PR: 193280 Differential revision: https://reviews.freebsd.org/D33858	2022-01-24 21:02:47 -08:00
Gleb Smirnoff	644ca0846d	domains: make domain_init() initialize only global state Now that each module handles its global and VNET initialization itself, there is no VNET related stuff left to do in domain_init(). Differential revision: https://reviews.freebsd.org/D33541	2022-01-03 10:15:22 -08:00
Gleb Smirnoff	89128ff3e4	protocols: init with standard SYSINIT(9) or VNET_SYSINIT The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers. Getting rid of pr_init allows to achieve several things: o Get rid of ifdef's that protect against double foo_init() when both INET and INET6 are compiled in. o Isolate initializers statically to the module they init. o Makes code easier to understand and maintain. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33537	2022-01-03 10:15:21 -08:00
Michael Tuexen	657fcf5807	udp6: remove assignments not being used MFC after: 3 days Sponsored by: Netflix, Inc.	2022-01-01 19:25:47 +01:00
Michael Tuexen	2de2ae331b	sctp: improve sctp_pathmtu_adjustment() Allow the resending of DATA chunks to be controlled by the caller, which allows retiring sctp_mtu_size_reset() in a separate commit. Also improve the computaion of the overhead and use 32-bit integers consistently. Thanks to Timo Voelker for pointing me to the code. MFC after: 3 days	2021-12-30 15:16:05 +01:00
Alexander V. Chernikov	ff3a85d324	[lltable] Add per-family lltable getters. Introduce a new function, lltable_get(), to retrieve lltable pointer for the specified interface and family. Use it to avoid all-iftable list traversal when adding or deleting ARP/ND records. Differential Revision: https://reviews.freebsd.org/D33660 MFC after: 2 weeks	2021-12-29 20:57:15 +00:00
Gleb Smirnoff	a057769205	in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address The intent is to provide more entropy than can be provided by just the 32-bits of the IPv6 address which overlaps with 6to4 tunnels. This is needed to mitigate potential algorithmic complexity attacks from attackers who can control large numbers of IPv6 addresses. Together with: gallatin Reviewed by: dwmalone, rscheff Differential revision: https://reviews.freebsd.org/D33254	2021-12-26 10:47:28 -08:00
Gleb Smirnoff	eb8dcdeac2	jail: network epoch protection for IP address lists Now struct prison has two pointers (IPv4 and IPv6) of struct prison_ip type. Each points into epoch context, address count and variable size array of addresses. These structures are freed with network epoch deferred free and are not edited in place, instead a new structure is allocated and set. While here, the change also generalizes a lot (but not enough) of IPv4 and IPv6 processing. E.g. address family agnostic helpers for kern_jail_set() are provided, that reduce v4-v6 copy-paste. The fast-path prison_check_ip[46]_locked() is also generalized into prison_ip_check() that can be executed with network epoch protection only. Reviewed by: jamie Differential revision: https://reviews.freebsd.org/D33339	2021-12-26 10:45:50 -08:00
Mateusz Guzik	71a1539e37	inet6: fix a LOR between rip and rawinp Running sys/netpfil/pf/fragmentation v6 results in: lock order reversal: 1st 0xfffffe00050429a8 rip (rip, sleep mutex) @ /usr/src/sys/netinet6/raw_ip6.c:803 2nd 0xfffff8009491e1d0 rawinp (rawinp, rw) @ /usr/src/sys/netinet6/raw_ip6.c:804 lock order rawinp -> rip established at: 0xffffffff8068e26a at witness_lock_order_add+0x28a 0xffffffff8068d087 at witness_checkorder+0x627 0xffffffff805a9f05 at __mtx_lock_flags+0x205 0xffffffff808102e4 at in_pcballoc+0x204 0xffffffff808d53c6 at rip6_attach+0x116 0xffffffff806dc4e8 at socreate+0x368 0xffffffff806eaedc at kern_socket+0xfc 0xffffffff806eadcd at sys_socket+0x2d 0xffffffff80abc774 at syscallenter+0x5c4 0xffffffff80abbeeb at amd64_syscall+0x1b 0xffffffff80a8044b at fast_syscall_common+0xf8 lock order rip -> rawinp attempted at: 0xffffffff8068dc2a at witness_checkorder+0x11ca 0xffffffff805d1b7f at _rw_wlock_cookie+0x18f 0xffffffff808d596c at rip6_connect+0x19c 0xffffffff806e0842 at soconnectat+0x142 0xffffffff806ebe36 at kern_connectat+0x136 0xffffffff806ebcdf at sys_connect+0x4f 0xffffffff80abc774 at syscallenter+0x5c4 0xffffffff80abbeeb at amd64_syscall+0x1b 0xffffffff80a8044b at fast_syscall_common+0xf8 Reviewed by: glebius Fixes: `de2d47842e` ("SMR protection for inpcbs") Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33508	2021-12-19 14:43:04 +00:00
Kristof Provost	9f5432d5e5	netinet6: ip6_setpktopt() requires NET_EPOCH ip6_setpktopt() can call ifnet_byindex() which requires epoch. Mark the function as requiring NET_EPOCH, and ensure we enter it priot to calling it. Reported-by: syzbot+92526116441688fea8a3@syzkaller.appspotmail.com Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33462	2021-12-17 17:30:36 +01:00
Gleb Smirnoff	185e659c40	inpcb: use locked variant of prison_check_ip*() The pcb lookup always happens in the network epoch and in SMR section. We can't block on a mutex due to the latter. Right now this patch opens up a race. But soon that will be addressed by D33339. Reviewed by: markj, jamie Differential revision: https://reviews.freebsd.org/D33340 Fixes: `de2d47842e`	2021-12-14 09:38:52 -08:00
Gleb Smirnoff	e3044071de	in6p_set_multicast_if(): fix malloc(M_WAITOK) with epoch Fixes: `d74b7baeb0`	2021-12-06 14:33:23 -08:00
Gleb Smirnoff	d74b7baeb0	ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263	2021-12-06 09:32:31 -08:00
Cy Schubert	db0ac6ded6	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit `266f97b5e9`, reversing changes made to `a10253cffe`. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.	2021-12-02 14:45:04 -08:00
Cy Schubert	266f97b5e9	wpa: Import wpa_supplicant/hostapd commit 14ab4a816 This is the November update to vendor/wpa committed upstream 2021-11-26. MFC after: 1 month	2021-12-02 13:35:14 -08:00
Gleb Smirnoff	de2d47842e	SMR protection for inpcbs With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector. Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details: - The database is already build on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some compexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *). Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	565655f4e3	inpcb: reduce some aliased functions after removal of PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33021	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	93c67567e0	Remove "options PCBGROUP" With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable. This experimental feature was sponsored by Juniper but ended never to be used in Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@]. I'm up to resurrecting it back if there is any interest from anybody. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33020	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	1cec1c5831	Allow to compile RSS without PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33019	2021-12-02 10:48:48 -08:00
Gordon Bergling	3cf59750eb	netinet6: Fix a typo in a sysctl description - remove a double 'a' MFC after: 3 days	2021-11-30 07:24:44 +01:00
Mark Johnston	44775b163b	netinet: Remove unneeded mb_unmapped_to_ext() calls in_cksum_skip() now handles unmapped mbufs on platforms where they're permitted. Reviewed by: glebius, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33097	2021-11-24 13:31:16 -05:00
Kristof Provost	19dc644511	if_stf: add 6rd support Implement IPv6 Rapid Deployment (RFC5969) on top of the existing 6to4 (RFC3056) if_stf code. PR: 253328 Reviewed by: hrs Obtained from: pfSense Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33037	2021-11-20 19:29:01 +01:00
Gleb Smirnoff	3850d1837b	in6_rmx: remove unnecessary TCP includes	2021-11-18 00:54:29 -08:00
Gleb Smirnoff	1817be481b	Add net.inet6.ip6.source_address_validation Drop packets arriving from the network that have our source IPv6 address. If maliciously crafted they can create evil effects like an RST exchange between two of our listening TCP ports. Such packets just can't be legitimate. Enable the tunable by default. Long time due for a modern Internet host. Reviewed by: melifaro, donner, kp Differential revision: https://reviews.freebsd.org/D32915	2021-11-12 09:01:40 -08:00
Gleb Smirnoff	9c89392f12	Add in_localip_fib(), in6_localip_fib(). Check if given address/FIB exists locally. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D32913	2021-11-12 08:59:42 -08:00
Gleb Smirnoff	3ea9a7cf7b	blackhole(4): disable for locally originated TCP/UDP packets In most cases blackholing for locally originated packets is undesired, leads to different kind of lags and delays. Provide sysctls to enforce it, e.g. for debugging purposes. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32718	2021-11-03 13:02:44 -07:00
Roy Marples	5c5340108e	net: Allow binding of unspecified address without address existance Previously in_pcbbind_setup returned EADDRNOTAVAIL for empty V_in_ifaddrhead (i.e., no IPv4 addresses configured) and in6_pcbbind did the same for empty V_in6_ifaddrhead (no IPv6 addresses). An equivalent test has existed since 4.4-Lite. It was presumably done to avoid extra work (assuming the address isn't going to be found later). In normal system operation *_ifaddrhead will not be empty: they will at least have the loopback address(es). In practice no work will be avoided. Further, this case caused net/dhcpd to fail when run early in boot before assignment of any addresses. It should be possible to bind the unspecified address even if no addresses have been configured yet, so just remove the tests. The now-removed "XXX broken" comments were added in `59562606b9`, which converted the ifaddr lists to TAILQs. As far as I (emaste) can tell the brokenness is the issue described above, not some aspect of the TAILQ conversion. PR: 253166 Reviewed by: ae, bz, donner, emaste, glebius MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D32563	2021-10-20 19:25:51 -04:00
Gleb Smirnoff	0f617ae48a	Add in_pcb_var.h for KPIs that are private to in_pcb.c and in6_pcb.c.	2021-10-18 10:19:57 -07:00
Gleb Smirnoff	147f018a72	Move in6_pcbsetport() to in6_pcb.c This function was originally carved out of in6_pcbbind(), which is in in6_pcb.c. This function also uses KPI private to the PCB database - in_pcb_lport().	2021-10-18 10:19:03 -07:00
Mark Johnston	2d5c48eccd	sctp: Tighten up locking around sctp_aloc_assoc() All callers of sctp_aloc_assoc() mark the PCB as connected after a successful call (for one-to-one-style sockets). In all cases this is done without the PCB lock, so the PCB's flags can be corrupted. We also do not atomically check whether a one-to-one-style socket is a listening socket, which violates various assumptions in solisten_proto(). We need to hold the PCB lock across all of sctp_aloc_assoc() to fix this. In order to do that without introducing lock order reversals, we have to hold the global info lock as well. So: - Convert sctp_aloc_assoc() so that the inp and info locks are consistently held. It returns with the association lock held, as before. - Fix an apparent bug where we failed to remove an association from a global hash if sctp_add_remote_addr() fails. - sctp_select_a_tag() is called when initializing an association, and it acquires the global info lock. To avoid lock recursion, push locking into its callers. - Introduce sctp_aloc_assoc_connected(), which atomically checks for a listening socket and sets SCTP_PCB_FLAGS_CONNECTED. There is still one edge case in sctp_process_cookie_new() where we do not update PCB/socket state correctly. Reviewed by: tuexen MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31908	2021-09-11 10:15:21 -04:00
Mark Johnston	353783964c	ip6mrouter: Make the expiration callout MPSAFE - Protect the `expire_upcalls` callout with the MFC6 mutex. The callout handler needs this mutex anyway. - Convert the MROUTER6 mutex to a sleepable sx lock. It is only used when configuring the global v6 multicast routing socket, so is only used in system call paths where sleeping is safe. This lets us drain the callout without having to drop the lock. - For all locking macros in the file, convert to using a _LOCKPTR macro. Reported by: mav MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31836	2021-09-07 11:19:29 -04:00
Mark Johnston	9a94097cd0	nd6: Make the DAD callout MPSAFE Interface addresses with pending duplicate address detection (DAD) live in a global queue. In this case, a callout is associated with each entry. The callout transmits neighbour solicitations until the system decides the address is no longer tentative, or until a duplicate address is discovered. At this point the entry is dequeued and freed. DAD may be manually stopped as well. The callout currently runs (and potentially transmits packets) with Giant held. Reorganize DAD queue locking to interlock properly with the callout: - Configure the callout to acquire the DAD queue lock before running. The lock is dropped before transmitting any packets. Stop protecting the callout with Giant. - When looking up DAD queue entries for an incoming NS or NA, don't bother fiddling with the DAD queue entry reference count. - Split nd6_dad_starttimer() so that the caller is responsible to transmitting a NS if it so desires. - Remove the DAD entry from the queue before stopping the timer. Use a temporary reference to make sure that the entry doesn't get freed by the callout while we're draining. Reported by: mav Reviewed by: bz, hrs Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D31826	2021-09-07 11:19:29 -04:00
Gordon Bergling	10e0082fff	inet6(4): Fix a few common typos in source code comments - s/reshedule/reschedule/ MFC after: 3 days	2021-08-28 18:53:59 +02:00
Hiroki Sato	9823a0c0ac	inet6(4): add a missing IPPROTO_ETHERIP entry bridge(4) + gif(4) did not work when the outer protocol was IPv6. Submitted by: Masahiro Kozuka PR: 256820 MFC after: 3 days	2021-08-27 17:14:35 +09:00
Alexander V. Chernikov	f8c1b1a929	lltable: fix crash introduced in `c541bd368f`. Reported by: cy MFC after: 2 weeks	2021-08-22 08:49:18 +00:00
Alexander V. Chernikov	c541bd368f	lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries. Currently we use pre-calculated headers inside LLE entries as prepend data for `if_output` functions. Using these headers allows saving some CPU cycles/memory accesses on the fast path. However, this approach makes adding L2 header for IPv4 traffic with IPv6 nexthops more complex, as it is not possible to store multiple pre-calculated headers inside lle. Additionally, the solution space is limited by the fact that PCB caching saves LLEs in addition to the nexthop. Thus, add support for creating special "child" LLEs for the purpose of holding custom family encaps and store mbufs pending resolution. To simplify handling of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE. Such LLEs are not visible when iterating LLE table. Their lifecycle is bound to the "parent" LLE - it is not possible to delete "child" when parent is alive. Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state machine used by the standard LLEs. nd6_lookup() and nd6_resolve() now accepts an additional argument, family, allowing to return such child LLEs. This change uses `LLE_SF()` macro which packs family and flags in a single int field. This is done to simplify merging back to stable/. Once this code lands, most of the cases will be converted to use a dedicated `family` parameter. Differential Revision: https://reviews.freebsd.org/D31379 MFC after: 2 weeks	2021-08-21 17:34:35 +00:00
Mateusz Guzik	8afe9481cf	frag6: do less work in frag6_slowtimo if possible frag6_slowtimo avoidably uses CPU on otherwise idle boxes Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31528	2021-08-14 18:51:00 +02:00
Mateusz Guzik	c17ae18080	frag6: drop the volatile keyword from frag6_nfrags and mark with __exclusive_cache_line The keyword adds nothing as all operations on the var are performed through atomic_* Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31528	2021-08-14 18:50:29 +02:00
Mark Johnston	663428ea17	nd6: Mark several callouts as MPSAFE The use of Giant here is vestigal and does not provide any useful synchronization. Furthermore, non-MPSAFE callouts can cause the softclock threads to block waiting for long-running newbus operations to complete. Reported by: mav Reviewed by: bz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31470	2021-08-09 13:27:52 -04:00
Mark Johnston	8ee0826f75	in6: Enter the net epoch in in6_tmpaddrtimer() We need to do so to safely traverse the ifnet list. Reviewed by: bz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31470	2021-08-09 13:27:52 -04:00
Michael Tuexen	a8d54fc903	ipv6: Fix getsockopt() for some IPPROTO_IPV6 level socket options Fix getsockopt() for the IPPROTO_IPV6 level socket options with the following names: IPV6_HOPOPTS, IPV6_RTHDR, IPV6_RTHDRDSTOPTS, IPV6_DSTOPTS, and IPV6_NEXTHOP. Reviewed by: markj MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D31458	2021-08-09 09:29:13 +02:00
Gordon Bergling	04389c855e	Fix some common typos in comments - s/configuraiton/configuration/ - s/specifed/specified/ - s/compatiblity/compatibility/ MFC after: 5 days	2021-08-08 10:16:06 +02:00
Michael Tuexen	784692c740	sctp: improve handling of IPv4 addresses on IPV6 sockets Reported by: syzbot+08fe66e4bfc2777cba95@syzkaller.appspotmail.com MFC after: 3 days	2021-08-07 17:27:56 +02:00
Alexander V. Chernikov	0b79b007eb	[lltable] Restructure nd6 code. Factor out lltable locking logic from lltable_try_set_entry_addr() into a separate lltable_acquire_wlock(), so the latter can be used in other parts of the code w/o duplication. Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c and nd6_nbr.c. Move lle creation logic from nd6_resolve_slow() into a separate nd6_get_llentry() to simplify the former. These changes serve as a pre-requisite for implementing RFC8950 (IPv4 prefixes with IPv6 nexthops). Differential Revision: https://reviews.freebsd.org/D31432 MFC after: 2 weeks	2021-08-07 09:59:11 +00:00
Andrey V. Elsukov	d477a7feed	Fix panic in IPv6 multicast code. Add check that ifp supports IPv6 multicasts in in6_getmulti. This fixes panic when user application tries to join into multicast group on an interface that doesn't support IPv6 multicasts, like IFT_PFLOG interfaces. PR: 257302 Reviewed by: melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D31420	2021-08-06 12:57:59 +03:00
Alexander V. Chernikov	8482aa7748	Use lltable calculated header when sending lle holdchain after successful lle resolution. Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D31391	2021-08-05 20:44:36 +00:00
Alexander V. Chernikov	f3a3b06121	[lltable] Unify datapath feedback mechamism. Use newly-create llentry_request_feedback(), llentry_mark_used() and llentry_get_hittime() to request datapatch usage check and fetch the results in the same fashion both in IPv4 and IPv6. While here, simplify llentry_provide_feedback() wrapper by eliminating 1 condition check. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D31390	2021-08-04 22:52:43 +00:00
Roy Marples	7045b1603b	socket: Implement SO_RERROR SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports. Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652	2021-07-28 09:35:09 -07:00
Michael Tuexen	105b68b42d	sctp: Fix errno in case of association setup failures Do not report always ETIMEDOUT, but only when appropriate. In other cases report ECONNABORTED. MFC after: 3 days	2021-07-09 23:19:25 +02:00
Michael Tuexen	c7f048ab35	sctp: initialize sequence numbers for ECN correctly MFC after: 3 days Reported by: Junseok Yang (for the userland stack)	2021-06-27 20:14:48 +02:00
Ryan Stone	2290dfb40f	Enter the net epoch before calling ip6_setpktopts ip6_setpktopts() can look up ifnets via ifnet_by_index(), which is only safe in the net epoch. Ensure that callers are in the net epoch before calling this function. Sponsored by: Dell EMC Isilon MFC after: 4 weeks Reviewed by: donner, kp Differential Revision: https://reviews.freebsd.org/D30630	2021-06-04 13:18:11 -04:00
Mark Johnston	d8acd2681b	Fix mbuf leaks in various pru_send implementations The various protocol implementations are not very consistent about freeing mbufs in error paths. In general, all protocols must free both "m" and "control" upon an error, except if PRUS_NOTREADY is specified (this is only implemented by TCP and unix(4) and requires further work not handled in this diff), in which case "control" still must be freed. This diff plugs various leaks in the pru_send implementations. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30151	2021-05-12 13:00:09 -04:00
Mark Johnston	c1dd4d642f	nd6: Avoid using an uninitialized sockaddr in nd6_prefix_offlink() Commit `81728a538` ("Split rtinit() into multiple functions.") removed the initialization of sa6, but not one of its uses. This meant that we were passing an uninitialized sockaddr as the address to lltable_prefix_free(). Remove the variable outright to fix the problem. The caller is expected to hold a reference on pr. Fixes: `81728a538` ("Split rtinit() into multiple functions.") Reported by: KMSAN Reviewed by: donner MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30166	2021-05-12 12:52:06 -04:00
Kristof Provost	2ef5d803e3	in6_mcast: Return EADDRINUSE when we've already joined the group Distinguish between truly invalid requests and those that fail because we've already joined the group. Both cases fail, but differentiating them allows userspace to make more informed decisions about what the error means. For example. radvd tries to join the all-routers group on every SIGHUP. This fails, because it's already joined it, but this failure should be ignored (rather than treated as a sign that the interface's multicast is broken). This puts us in line with OpenBSD, NetBSD and Linux. Reviewed by: donner MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D30111	2021-05-10 09:48:51 +02:00
Mark Johnston	f161d294b9	Add missing sockaddr length and family validation to various protocols Several protocol methods take a sockaddr as input. In some cases the sockaddr lengths were not being validated, or were validated after some out-of-bounds accesses could occur. Add requisite checking to various protocol entry points, and convert some existing checks to assertions where appropriate. Reported by: syzkaller+KASAN Reviewed by: tuexen, melifaro MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29519	2021-05-03 13:35:19 -04:00
Mark Johnston	8e8f1cc9bb	Re-enable network ioctls in capability mode This reverts a portion of `274579831b` ("capsicum: Limit socket operations in capability mode") as at least rtsol and dhcpcd rely on being able to configure network interfaces while in capability mode. Reported by: bapt, Greg V Sponsored by: The FreeBSD Foundation	2021-04-23 09:22:49 -04:00
Gleb Smirnoff	1db08fbe3f	tcp_input: always request read-locking of PCB for any pure SYN segment. This is further rework of `08d9c92027`. Now we carry the knowledge of lock type all the way through tcp_input() and also into tcp_twcheck(). Ideally the rlocking for pure SYNs should propagate all the way into the alternative TCP stacks, but not yet today. This should close a race when socket is bind(2)-ed but not yet listen(2)-ed and a SYN-packet arrives racing with listen(2), discovered recently by pho@.	2021-04-20 10:02:20 -07:00
Michael Tuexen	9e644c2300	tcp: add support for TCP over UDP Adding support for TCP over UDP allows communication with TCP stacks which can be implemented in userspace without requiring special priviledges or specific support by the OS. This is joint work with rrs. Reviewed by: rrs Sponsored by: Netflix, Inc. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29469	2021-04-18 16:16:42 +02:00
Gleb Smirnoff	08d9c92027	tcp_input/syncache: acquire only read lock on PCB for SYN,!ACK packets When packet is a SYN packet, we don't need to modify any existing PCB. Normally SYN arrives on a listening socket, we either create a syncache entry or generate syncookie, but we don't modify anything with the listening socket or associated PCB. Thus create a new PCB lookup mode - rlock if listening. This removes the primary contention point under SYN flood - the listening socket PCB. Sidenote: when SYN arrives on a synchronized connection, we still don't need write access to PCB to send a challenge ACK or just to drop. There is only one exclusion - tcptw recycling. However, existing entanglement of tcp_input + stacks doesn't allow to make this change small. Consider this patch as first approach to the problem. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D29576	2021-04-12 08:25:31 -07:00
Mark Johnston	274579831b	capsicum: Limit socket operations in capability mode Capsicum did not prevent certain privileged networking operations, specifically creation of raw sockets and network configuration ioctls. However, these facilities can be used to circumvent some of the restrictions that capability mode is supposed to enforce. Add capability mode checks to disallow network configuration ioctls and creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET internet sockets. Reviewed by: oshogbo Discussed with: emaste Reported by: manu Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29423	2021-04-07 14:32:56 -04:00
Kyle Evans	f187d6dfbf	base: remove if_wg(4) and associated utilities, manpage After length decisions, we've decided that the if_wg(4) driver and related work is not yet ready to live in the tree. This driver has larger security implications than many, and thus will be held to more scrutiny than other drivers. Please also see the related message sent to the freebsd-hackers@ and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on 2021/03/16, with the subject line "Removing WireGuard Support From Base" for additional context.	2021-03-17 09:14:48 -05:00
Kyle Evans	74ae3f3e33	if_wg: import latest fixup work from the wireguard-freebsd project This is the culmination of about a week of work from three developers to fix a number of functional and security issues. This patch consists of work done by the following folks: - Jason A. Donenfeld <Jason@zx2c4.com> - Matt Dunwoodie <ncon@noconroy.net> - Kyle Evans <kevans@FreeBSD.org> Notable changes include: - Packets are now correctly staged for processing once the handshake has completed, resulting in less packet loss in the interim. - Various race conditions have been resolved, particularly w.r.t. socket and packet lifetime (panics) - Various tests have been added to assure correct functionality and tooling conformance - Many security issues have been addressed - if_wg now maintains jail-friendly semantics: sockets are created in the interface's home vnet so that it can act as the sole network connection for a jail - if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0 - if_wg now exports via ioctl a format that is future proof and complete. It is additionally supported by the upstream wireguard-tools (which we plan to merge in to base soon) - if_wg now conforms to the WireGuard protocol and is more closely aligned with security auditing guidelines Note that the driver has been rebased away from using iflib. iflib poses a number of challenges for a cloned device trying to operate in a vnet that are non-trivial to solve and adds complexity to the implementation for little gain. The crypto implementation that was previously added to the tree was a super complex integration of what previously appeared in an old out of tree Linux module, which has been reduced to crypto.c containing simple boring reference implementations. This is part of a near-to-mid term goal to work with FreeBSD kernel crypto folks and take advantage of or improve accelerated crypto already offered elsewhere. There's additional test suite effort underway out-of-tree taking advantage of the aforementioned jail-friendly semantics to test a number of real-world topologies, based on netns.sh. Also note that this is still a work in progress; work going further will be much smaller in nature. MFC after: 1 month (maybe)	2021-03-14 23:52:04 -05:00
Alexander V. Chernikov	b1d63265ac	Flush remaining routes from the routing table during VNET shutdown. Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET. PR: 253998 Reported by: rashey at superbox.pl MFC after: 3 days Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish. Test Plan: ``` set_skip:set_skip_group_lo -> passed [0.053s] tail -n 200 /var/log/messages \| grep rtentry ``` Reviewers: #network, kp, bz Reviewed By: kp Subscribers: imp, ae Differential Revision: https://reviews.freebsd.org/D29116	2021-03-10 21:10:14 +00:00
Alexander V. Chernikov	7634919e15	Fix 'in6_purgeaddr: err=65, destination address delete failed' message. P2P ifa may require 2 routes: one is the loopback route, another is the "prefix" route towards its destination. Current code marks loopback routes existence with IFA_RTSELF and "prefix" p2p routes with IFA_ROUTE. For historic reasons, we fill in ifa_dstaddr for loopback interfaces. To avoid installing the same route twice, we preemptively set IFA_RTSELF when adding "prefix" route for loopback. However, the teardown part doesn't have this hack, so we try to remove the same route twice. Fix this by checking if ifa_dstaddr is different from the ifa_addr and moving this logic into a separate function. Reviewed By: kp Differential Revision: https://reviews.freebsd.org/D29121 MFC after: 3 days	2021-03-08 21:28:35 +00:00
Alexander V. Chernikov	d5be41beb7	Fix dpdk/ldradix fib lookup algorithm preference calculation. The current preference number were copied from IPv4 code, assuming 500k routes to be the full-view. Adjust with the current reality (100k full-view). Reported by: Marek Zarychta <zarychtam at plan-b.pwste.edu.pl> MFC after: 3 days	2021-03-07 22:17:53 +00:00
Kristof Provost	bb4a7d94b9	net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros Introduce convenience macros to retrieve the DSCP, ECN or traffic class bits from an IPv6 header. Use them where appropriate. Reviewed by: ae (previous version), rscheff, tuexen, rgrimes MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29056	2021-03-04 20:56:48 +01:00
Alexander V. Chernikov	cc3fa1e29f	Fix crash with rtadv-originated multipath IPv6 routes. PR: 253800 Reported by: Frederic Denis <freebsdml at hecian.net> MFC after: immediately	2021-02-24 16:44:10 +00:00
Alexander V. Chernikov	9c4a8d24f0	Fix nd6 rib_action() handling. rib_action() guarantees valid rc filling IFF it returns without error. Check rib_action() return code instead of checking rc fields. PR: 253800 Reported by: Frederic Denis <freebsdml@hecian.net> MFC after: immediately	2021-02-23 22:40:01 +00:00
Kristof Provost	c139b3c19b	arp/nd: Cope with late calls to iflladdr_event When tearing down vnet jails we can move an if_bridge out (as part of the normal vnet_if_return()). This can, when it's clearing out its list of member interfaces, change its link layer address. That sends an iflladdr_event, but at that point we've already freed the AF_INET/AF_INET6 if_afdata pointers. In other words: when the iflladdr_event callbacks fire we can't assume that ifp->if_afdata[AF_INET] will be set. Reviewed by: donner@, melifaro@ MFC after: 1 week Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D28860	2021-02-23 13:54:07 +01:00
Alexander V. Chernikov	8268d82cff	Remove per-packet ifa refcounting from IPv6 fast path. Currently ip6_input() calls in6ifa_ifwithaddr() for every local packet, in order to check if the target ip belongs to the local ifa in proper state and increase its counters. in6ifa_ifwithaddr() references found ifa. With epoch changes, both `ip6_input()` and all other current callers of `in6ifa_ifwithaddr()` do not need this reference anymore, as epoch provides stability guarantee. Given that, update `in6ifa_ifwithaddr()` to allow it to return ifa without referencing it, while preserving option for getting referenced ifa if so desired. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28648	2021-02-15 22:33:12 +00:00
Alexander V. Chernikov	605284b894	Enforce net epoch in in6_selectsrc(). in6_selectsrc() may call fib6_lookup() in some cases, which requires epoch. Wrap in6_selectsrc* calls into epoch inside its users. Mark it as requiring epoch by adding NET_EPOCH_ASSERT(). MFC after: 1 weeek Differential Revision: https://reviews.freebsd.org/D28647	2021-02-15 22:33:12 +00:00
Alexander V. Chernikov	1bd44b11e5	Do not reference returned ifa in in6_ifawithifp(). The only place where in6_ifawithifp() is used is ip6_output(), which uses the returned ifa to bump traffic counters. Given ifa stability guarantees is provided by epoch, do not refcount ifa. This eliminates 2 atomic ops from IPv6 fast path. Reviewed By: rstone MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28649	2021-02-14 10:11:18 +00:00
Andrey V. Elsukov	3c782d9c91	[udp6] fix possible panic due to lack of locking. The lookup for a IPv6 multicast addresses corresponding to the destination address in the datagram is protected by the NET_EPOCH section. Access to each PCB is protected by INP_RLOCK during comparing. But access to socket's so_options field is not protected. And in some cases it is possible, that PCB pointer is still valid, but inp_socket is not. The patch wides lock holding to protect access to inp_socket. It copies locking strategy from IPv4 UDP handling. PR: 232192 Obtained from: Yandex LLC MFC after: 3 days Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D28232	2021-02-11 12:00:25 +03:00
Alexander V. Chernikov	924d1c9a05	Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." Wrong version of the change was pushed inadvertenly. This reverts commit `4a01b854ca`.	2021-02-08 22:32:32 +00:00
Alexander V. Chernikov	4a01b854ca	SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.	2021-02-08 21:42:20 +00:00
Alexander V. Chernikov	91f2c69ec2	Fix unused-function waring when compiling with FIB_ALGO. MFC after: 3 days	2021-01-30 23:25:56 +00:00
Gleb Smirnoff	3f43ada98c	Catch up with `6edfd179c8`: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG. Originally IFCAP_NOMAP meant that the mbuf has external storage pointer that points to unmapped address. Then, this was extended to array of such pointers. Then, such mbufs were augmented with header/trailer. Basically, extended mbufs are extended, and set of features is subject to change. The new name should be generic enough to avoid further renaming.	2021-01-29 11:46:24 -08:00
Randall Stewart	24a8f6d369	When we are about to send down to the driver layer we need to make sure that the m_nextpkt field is NULL else the lower layers may do unwanted things. Reviewed By: gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D28377	2021-01-27 13:52:44 -05:00
Alexander V. Chernikov	f9e0752e35	Create new in6_purgeifaddr() which purges bound ifa prefix if it gets unused. Currently if_purgeifaddrs() uses in6_purgeaddr() to remove IPv6 ifaddrs. in6_purgeaddr() does not trrigger prefix removal if number of linked ifas goes to 0, as this is a low-level function. As a result, if_purgeifaddrs() purges all IPv4/IPv6 addresses but keeps corresponding IPv6 prefixes. Fix this by creating higher-level wrapper which handles unused prefix usecase and use it in if_purgeifaddrs(). Differential revision: https://reviews.freebsd.org/D28128	2021-01-17 20:32:25 +00:00
Alexander V. Chernikov	81728a538d	Split rtinit() into multiple functions. rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction were moved to the per-address-family functions, dealing with (2), (3) and (4). - funnction providing net.add_addr_allfibs handling and route rtsock notificaions is the new routing table inteface. - rtsock ifa notificaion has been moved out as well. resulting set of funcion are only responsible for the actual route notifications. Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186	2021-01-16 22:42:41 +00:00
Alexander V. Chernikov	e58c8da068	Map IPv6 link-local prefix to the link-local ifa. Currently we create link-local route by creating an always-on IPv6 prefix in the prefix list. This prefix is not tied to the link-local ifa. This leads to the following problems: First, when flushing interface addresses we skip on-link route, leaving fe80::/64 prefix on the interface without any IPv6 addresses. Second, when creating and removing link-local alias we lose fe80::/64 prefix from the routing table. Fix this by attaching link-local prefix to the ifa at the initial creation. Reviewed by: hrs MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D28129	2021-01-13 10:03:15 +00:00
Alexander V. Chernikov	2defbe9f0e	Use rn_match instead of doing indirect calls in fib_algo. Relevant inet/inet6 code has the control over deciding what the RIB lookup function currently is. With that in mind, explicitly set it to the current value (rn_match) in the datapath lookups. This avoids cost on indirect call. Differential Revision: https://reviews.freebsd.org/D28066	2021-01-11 23:30:35 +00:00
Alexander V. Chernikov	0da3f8c98d	Bump amount of queued packets in for unresolved ARP/NDP entries to 16. Currently default behaviour is to keep only 1 packet per unresolved entry. Ability to queue more than one packet was added 10 years ago, in r215207, though the default value was kep intact. Things have changed since that time. Systems tend to initiate multiple connections at once for a variety of reasons. For example, recent kern/252278 bug report describe happy-eyeball DNS behaviour sending multiple requests to the DNS server. The primary driver for upper value for the queue length determination is memory consumption. Remote actors should not be able to easily exhaust local memory by sending packets to unresolved arp/ND entries. For now, bump value to 16 packets, to match Darwin implementation. The proper approach would be to switch the limit to calculate memory consumption instead of packet count and limit based on memory. We should MFC this with a variation of D22447. Reviewers: #manpages, #network, bz, emaste Reviewed By: emaste, gbe(doc), jilles(doc) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D28068	2021-01-11 19:51:11 +00:00
Alexander V. Chernikov	d68cf57b7f	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011	2021-01-07 19:38:19 +00:00
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framwork adds the following runtime sysctls: List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Andrew Gallatin	a034518ac8	Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, he will likely be reading memory allocated on the NUMA domain associated with the TCP connection. This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain). This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases. This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted https media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross domain traffic on the memory controller. Reviewed by: jhb, bz (earlier version), bcr (man page) Tested by: gonzo Sponsored by: Netfix Differential Revision: https://reviews.freebsd.org/D21636	2020-12-19 22:04:46 +00:00
Hans Petter Selasky	ac4dd4cd95	Expose nonstandard IPv6 kernel definitions to standalone builds. No functional change. Reviewed by: bz@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-12-04 21:51:47 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Alexander V. Chernikov	b712e3e343	Refactor fib4/fib6 functions. No functional changes. * Make lookup path of fib<4\|6>_lookup_debugnet() separate functions (fib<46>_lookup_rt()). These will be used in the control plane code requiring unlocked radix operations and actual prefix pointer. * Make lookup part of fib<4\|6>_check_urpf() separate functions. This change simplifies the switch to alternative lookup implementations, which helps algorithmic lookups introduction. * While here, use static initializers for IPv4/IPv6 keys Differential Revision: https://reviews.freebsd.org/D27405	2020-11-29 13:41:49 +00:00
Bjoern A. Zeeb	dd4d5a5ffb	IPv6: set ifdisabled in the kernel rather than in rc Enable ND6_IFF_IFDISABLED when the interface is created in the kernel before return to user space. This avoids a race when an interface is create by a program which also calls ifconfig IF inet6 -ifdisabled and races with the devd -> /etc/pccard_ether -> .. netif start IF -> ifdisabled calls (the devd/rc framework disabling IPv6 again after the program had enabled it already). In case the global net.inet6.ip6.accept_rtadv was turned on, we also default to enabling IPv6 on the interfaces, rather than disabling them. PR: 248172 Reported by: Gert Doering (gert greenie.muc.de) Reviewed by: glebius (, phk) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D27324	2020-11-25 20:58:01 +00:00
Alexander V. Chernikov	7511a63825	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_forach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219	2020-11-22 20:21:10 +00:00
Jonathan T. Looney	440598dd9e	Fix implicit automatic local port selection for IPv6 during connect calls. When a user creates a TCP socket and tries to connect to the socket without explicitly binding the socket to a local address, the connect call implicitly chooses an appropriate local port. When evaluating candidate local ports, the algorithm checks for conflicts with existing ports by doing a lookup in the connection hash table. In this circumstance, both the IPv4 and IPv6 code look for exact matches in the hash table. However, the IPv4 code goes a step further and checks whether the proposed 4-tuple will match wildcard (e.g. TCP "listen") entries. The IPv6 code has no such check. The missing wildcard check can cause problems when connecting to a local server. It is possible that the algorithm will choose the same value for the local port as the foreign port uses. This results in a connection with identical source and destination addresses and ports. Changing the IPv6 code to align with the IPv4 code's behavior fixes this problem. Reviewed by: tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27164	2020-11-14 14:50:34 +00:00
Alexander V. Chernikov	d9999ae9ca	Fix use-after-free in icmp6_notify_error(). Reported by: Maxime Villard <max at m00nbsd.net> Reviewed by: markj MFC after: 3 days	2020-10-28 20:22:20 +00:00
Mark Johnston	4caea9b169	icmp6: Count packets dropped due to an invalid hop limit Pad the icmp6stat structure so that we can add more counters in the future without breaking compatibility again, last done in r358620. Annotate the rarely executed error paths with __predict_false while here. Reviewed by: bz, melifaro Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26578	2020-10-19 17:07:19 +00:00
Alexander V. Chernikov	0c325f53f1	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523	2020-10-18 17:15:47 +00:00
Richard Scheffenegger	868aabb470	Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 801.1Q vlan header with vlan ID = 0 to carry the priority setting Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409	2020-10-09 12:06:43 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00
Alexander V. Chernikov	2259a03020	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497	2020-09-21 20:02:26 +00:00
Alexander V. Chernikov	1440f62266	Remove unused nhop_ref_any() function. Remove "opt_mpath.h" header where not needed. No functional changes.	2020-09-20 21:32:52 +00:00
Navdeep Parhar	b092fd6c97	if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS. This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic. A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities. On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers. An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port. On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already. Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead. Rx: LRO. depends on what happens with the previous item. There will have to to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state. Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:37:57 +00:00
Navdeep Parhar	72cc43df17	Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port. This will be used by some upcoming changes to if_vxlan(4). RFC 7348 (VXLAN) says that the UDP checksum "SHOULD be transmitted as zero. When a packet is received with a UDP checksum of zero, it MUST be accepted for decapsulation." But the original IPv6 RFCs did not allow zero UDP checksum. RFC 6935 attempts to resolve this. Reviewed by: kib@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:21:15 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Kyle Evans	1e9b8db9b2	ipv6: quit dropping packets looping back on p2p interfaces To paraphrase the below-referenced PR: This logic originated in the KAME project, and was even controversial when it was enabled there by default in 2001. No such equivalent logic exists in the IPv4 stack, and it turns out that this leads to us dropping valid traffic when the "point to point" interface is actually a 1:many tun interface, e.g. with the wireguard userland stack. Even in the case of true point-to-point links, this logic only avoids transient looping of packets sent by misconfigured applications or attackers, which can be subverted by proper route configuration rather than hardcoded logic in the kernel to drop packets. In the review, melifaro goes on to note that the kernel can't fix it, so it perhaps shouldn't try to be 'smart' about it. Additionally, that TTL will still kick in even with incorrect route configuration. PR: 247718 Reviewed by: melifaro, rgrimes MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25567	2020-08-31 01:45:48 +00:00
Alexander V. Chernikov	a624ca3dff	Move net/route/shared.h definitions to net/route/route_var.h. No functional changes. net/route/shared.h was created in the inital phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largerly defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former.	2020-08-28 22:50:20 +00:00
Alexander V. Chernikov	bec053ffe0	Make net.inet6.ip6.deembed_scopeid behaviour default & remove sysctl. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25637	2020-08-15 11:37:44 +00:00
Alexander V. Chernikov	2f23f45b20	Simplify dom_<rtattach\|rtdetach>. Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053	2020-08-14 21:29:56 +00:00
Hans Petter Selasky	b453d3d239	Use a static initializer for the multicast free tasks. This makes the SYSINIT() function updated in r364072 superfluous. Suggested by: glebius@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-11 08:31:40 +00:00
Alexander V. Chernikov	9a00f6d067	Fix rib_subscribe() waitok flag by performing allocation outside epoch. Make in6_inithead() use rib_subscribe with waitok to achieve reliable subscription allocation. Reviewed by: glebius	2020-08-11 07:05:30 +00:00
Bjoern A. Zeeb	f9461246a2	MC: add a note with reference to the discussion and history as-to why we are where we are now. The main thing is to try to get rid of the delayed freeing to avoid blocking on the taskq when shutting down vnets. X-Timeout: if you still see this before 14-RELEASE remove it.	2020-08-10 10:58:43 +00:00
Hans Petter Selasky	3689652c65	Make sure the multicast release tasks are properly drained when destroying a VNET or a network interface. Else the inm release tasks, both IPv4 and IPv6 may cause a panic accessing a freed VNET or network interface. Reviewed by: jmg@ Discussed with: bz@ Differential Revision: https://reviews.freebsd.org/D24914 MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-10 10:46:08 +00:00
Hans Petter Selasky	a95ef9d38d	Use proper prototype for SYSINIT() functions. Mark the unused argument using the __unused macro. Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-10 10:40:19 +00:00
Bjoern A. Zeeb	a9839c4aee	IPV6_PKTINFO support for v4-mapped IPv6 sockets When using v4-mapped IPv6 sockets with IPV6_PKTINFO we do not respect the given v4-mapped src address on the IPv4 socket. Implement the needed functionality. This allows single-socket UDP applications (such as OpenVPN) to work better on FreeBSD. Requested by: Gert Doering (gert greenie.net), pfsense Tested by: Gert Doering (gert greenie.net) Reviewed by: melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24135	2020-08-07 15:13:53 +00:00
Andrey V. Elsukov	ce8875b6c4	Fix typo. Submitted by: Evgeniy Khramtsov <evgeniy at khramtsov org> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25932	2020-08-05 10:27:11 +00:00
Mark Johnston	cfae6a92ac	Remove an incorrect assertion from in6p_lookup_mcast_ifp(). The socket may be bound to an IPv4-mapped IPv6 address. However, the inp address is not relevant to the JOIN_GROUP or LEAVE_GROUP operations. While here remove an unnecessary check for inp == NULL. Reported by: syzbot+d01ab3d5e6c1516a393c@syzkaller.appspotmail.com Reviewed by: hselasky MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25888	2020-08-04 15:00:02 +00:00
Mark Johnston	19afc65a7e	ip6_output(): Check the return value of in6_getlinkifnet(). If the destination address has an embedded scope ID, make sure that it corresponds to a valid ifnet before proceeding. Otherwise a sendto() with a bogus link-local address can trigger a NULL pointer dereference. Reported by: syzkaller Reviewed by: ae Fixes: r358572 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25887	2020-07-30 17:43:23 +00:00
Alexander V. Chernikov	e1c05fd290	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-21 19:56:13 +00:00
Alexander V. Chernikov	725871230d	Temporarly revert r363319 to unbreak the build. Reported by: CI Pointy hat to: melifaro	2020-07-19 10:53:15 +00:00
Alexander V. Chernikov	8cee15d9e4	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-19 09:29:27 +00:00
Alexander V. Chernikov	4c7ba83f9d	Switch inet6 default route subscription to the new rib subscription api. Old subscription model allowed only single customer. Switch inet6 to the new subscription api and eliminate the old model. Differential Revision: https://reviews.freebsd.org/D25615	2020-07-12 11:24:23 +00:00
Alexander V. Chernikov	eddfb2e86f	Fix IPv6 regression introduced by r362900. PR: kern/247729	2020-07-03 08:06:26 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4\|6>_lookup_nh_<basic\|ext> to fib<4\|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Mark Johnston	95033af923	Add the SCTP_SUPPORT kernel option. This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation. Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-06-18 19:32:34 +00:00
Michael Tuexen	70486b27ae	Retire SCTP_SO_LOCK_TESTING. This was intended to test the locking used in the MacOS X kernel on a FreeBSD system, to make use of WITNESS and other debugging infrastructure. This hasn't been used for ages, to take it out to reduce the #ifdef complexity. MFC after: 1 week	2020-06-07 14:39:20 +00:00
Ryan Moeller	78a3645fd2	Fix typo in previous commit Applied the wrong patch Reported by: Michael Butler <imb@protected-networks.net> Approved by: mav (mentor) Sponsored by: iXsystems.com	2020-06-03 17:26:00 +00:00
Ryan Moeller	f057d56c6c	scope6: Check for NULL afdata before dereferencing Narrows the race window with if_detach. Approved by: mav (mentor) MFC after: 3 days Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25017	2020-06-03 16:57:30 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Alexander V. Chernikov	a37a5246ca	Use fib[46]_lookup() in mtu calculations. fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Conversion is straight-forwarded, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Differential Revision: https://reviews.freebsd.org/D24974	2020-05-28 08:00:08 +00:00
Alexander V. Chernikov	1483c1c508	Replace ip6_ouput fib6_lookup_nh_<ext\|basic> calls with fib6_lookup(). fib6_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Conversion is straight-forwarded, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24973	2020-05-28 07:29:44 +00:00

1 2 3 4 5 ...

2291 Commits