freebsd-dev

Author	SHA1	Message	Date
Kristof Provost	e111d79806	Add FEATURE sysctls for ALTQ disciplines This will allow userspace to more easily figure out if ALTQ is built into the kernel and what disciplines are supported. Reviewed by: donner@ Differential Revision: https://reviews.freebsd.org/D28302	2021-01-25 19:58:22 +01:00
Vincenzo Maffione	f80efe5016	iflib: netmap: move per-packet operation out of fragments loop MFC after: 1 week	2021-01-24 21:38:59 +00:00
Vincenzo Maffione	aceaccab65	iflib: netmap: add support for NS_MOREFRAG The NS_MOREFRAG flag can be set in a netmap slot to represent a multi-fragment packet. Only the last fragment of a packet does not have the flag set. On TX rings, the flag may be set by the userspace application. The kernel will look at the flag and use it to properly set up the NIC TX descriptors. On RX rings, the kernel may set the flag if the packet received was split across multiple netmap buffers. The userspace application should look at the flag to know when the packet is complete. Submitted by: rajesh1.kumar_amd.com Reviewed by: vmaffione MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27799	2021-01-24 21:20:59 +00:00
Andrew Gallatin	0c864213ef	iflib: Fix a NULL pointer deref rxd_frag_to_sd() have pf_rv parameter as NULL with the current code. This patch fixes the NULL pointer dereference in that case thus avoiding a possible panic. Submitted by: rajesh1.kumar at amd.com Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D28115	2021-01-21 09:47:06 -05:00
Alexander V. Chernikov	9d6567bc30	Fix panic on vnet creation if fib algo has been set to fixed value. Make fixed algo property per-VNET instead of global.	2021-01-17 20:32:25 +00:00
Alexander V. Chernikov	f9e0752e35	Create new in6_purgeifaddr() which purges bound ifa prefix if it gets unused. Currently if_purgeifaddrs() uses in6_purgeaddr() to remove IPv6 ifaddrs. in6_purgeaddr() does not trrigger prefix removal if number of linked ifas goes to 0, as this is a low-level function. As a result, if_purgeifaddrs() purges all IPv4/IPv6 addresses but keeps corresponding IPv6 prefixes. Fix this by creating higher-level wrapper which handles unused prefix usecase and use it in if_purgeifaddrs(). Differential revision: https://reviews.freebsd.org/D28128	2021-01-17 20:32:25 +00:00
Alexander V. Chernikov	81728a538d	Split rtinit() into multiple functions. rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction were moved to the per-address-family functions, dealing with (2), (3) and (4). - funnction providing net.add_addr_allfibs handling and route rtsock notificaions is the new routing table inteface. - rtsock ifa notificaion has been moved out as well. resulting set of funcion are only responsible for the actual route notifications. Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186	2021-01-16 22:42:41 +00:00
Alexander V. Chernikov	a6b7689718	Remove redundant rtinit() calls from tuntap. Removed code iterates over if_addrhead and tries to remove routes for each ifa. This is exactly the thing that if_purgeaddrs() do, and if_purgeaddr() is already called in the end. Reviewed by: glebius MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D28106	2021-01-13 10:03:15 +00:00
Ryan Libby	c86fa3b8d7	pf: quiet -Wredundant-decls for pf_get_ruleset_number In `e86bddea9f` sys/netpfil/pf/pf.h grew a declaration of pf_get_ruleset_number. Now delete the old declaration from sys/net/pfvar.h. Reviewed by: kp Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D28081	2021-01-10 21:53:15 -08:00
Alexander V. Chernikov	685de460bc	Use static initializers for fib algo to shift initialization to ealier stage. This allows to register modules loaded at boot time. Reported by: olivier	2021-01-11 00:16:54 +00:00
Vincenzo Maffione	55f0ad5fde	netmap: restore hwofs and support it in iflib Restore the hwofs functionality temporarily disabled by `7ba6ecf216` to prevent issues with iflib. This patch brings the necessary changes to iflib to enable howfs to allow interface restarts without disrupting netmap applications actively using its rings. After this change, it becomes possible for multiple non-cooperating netmap applications to use non-overlapping subsets of the available netmap rings without clashing with each other. PR: 252453 MFC after: 1 week	2021-01-10 22:51:15 +00:00
Vincenzo Maffione	8aa8484cbf	iflib: fix build failure in case DEV_NETMAP is not defined This addresses the build failure introduced by `3d65fd97e8`. MFC with: `3d65fd97e8`	2021-01-10 14:43:58 +00:00
Vincenzo Maffione	4ba9ad0dc3	iflib: add assert to prevent out-of-bounds array access The iflib_queues_alloc() allocates isc_nrxqs iflib_dma_info structs for each rxqset, and links each struct to a different free list. As a result, it must be isc_nrxqs >= isc_nfl (plus the completion queue, if present). Add an assertion to make this constraint explicit. MFC after: 2 weeks	2021-01-10 13:59:20 +00:00
Vincenzo Maffione	3d65fd97e8	netmap: iflib: enable/disable krings on any interface reinit Since `1d238b07d5`, krings are disabled before a reinit cycle triggered by iflib_netmap_register. However, this operation is actually necessary also for any interface reinit triggered by other causes (i.e., ifconfig commands). We achieve this goal by moving the krings enable/disable operation inside iflib_stop() and iflib_init_locked(). Once here, this change also removes some redundant operations from iflib_netmap_register(), that are already performed by iflib_stop(). PR: 252453 MFC after: 1 week	2021-01-10 12:04:08 +00:00
Vincenzo Maffione	3189ba6167	netmap: iflib: fix asserts in netmap_fl_refill() When netmap_fl_refill() is called at initialization time (e.g., during netmap_iflib_register()), nic_i must be 0, since the free list is reinitialized. At the end of the refill cycle, nic_i must still be zero, because exactly N descriptors (N is the ring size) are refilled. This patch therefore fixes the assertions to check on nic_i rather than on nm_i. The current netmap_reset() may in fact cause nm_i to be != 0 while the device is resetting: this may happen when multiple non-cooperating processes open different subsets of the available netmap rings. PR: 252518 MFC after: 1 week	2021-01-09 21:35:07 +00:00
Vincenzo Maffione	1d238b07d5	netmap: iflib: stop krings during interface reset When different processes open separate subsets of the available rings of a same netmap interface, a device reset may be performed while one of the processes is actively using some rings (e.g., caused by another process executing a nmport_open()). With this patch, such situation will cause the active process to get a POLLERR, so that it can have a chance to detect the situation. We also guarantee that no process is running a txsync or rxsync (ioctl or poll) while an iflib device reset is in progress. PR: 252453 MFC after: 1 week	2021-01-09 21:01:46 +00:00
Matt Macy	81be655266	iflib: ensure that tx interrupts enabled and cleanups Doing a 'dd' over iscsi will reliably cause stalls. Tx cleaning _should_ reliably happen as data is sent. However, currently if the transmit queue fills it will wait until the iflib timer (hz/2) runs. This change causes the the tx taskq thread to be run if there are completed descriptors. While here: - make timer interrupt delay a sysctl - simplify txd_db_check handling - comment on INTR types Background on the change: Initially doorbell updates were minimized by only writing to the register on every fourth packet. If txq_drain would return without writing to the doorbell it scheduled a callout on the next tick to do the doorbell write to ensure that the write otherwise happened "soon". At that time a sysctl was added for users to avoid the potential added latency by simply writing to the doorbell register on every packet. This worked perfectly well for e1000 and ixgbe ... and appeared to work well on ixl. However, as it turned out there was a race to this approach that would lockup the ixl MAC. It was possible for a lower producer index to be written after a higher one. On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial response was to add a lock around doorbell writes - fixing the problem but adding an unacceptable amount of lock contention. The next iteration was to use transmit interrupts to drive delayed doorbell writes. If there were no packets in the queue all doorbell writes would be immediate as the queue started to fill up we could delay doorbell writes further and further. At the start of drain if we've cleaned any packets we know we've moved the state machine along and we write the doorbell (an obvious missing optimization was to skip that doorbell write if db_pending is zero). This change required that tx interrupts be scheduled periodically as opposed to just when the hardware txq was full. However, that just leads to our next problem. Initially dedicated msix vectors were used for both tx and rx. However, it was often possible to use up all available vectors before we set up all the queues we wanted. By having rx and tx share a vector for a given queue we could halve the number of vectors used by a given configuration. The problem here is that with this change only e1000 passed the necessary value to have the fast interrupt drive tx when appropriate. Reported by: mav@ Tested by: mav@ Reviewed by: gallatin@ MFC after: 1 month Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D27683	2021-01-07 14:07:35 -08:00
Alexander V. Chernikov	d68cf57b7f	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011	2021-01-07 19:38:19 +00:00
Kristof Provost	5a3b9507d7	pf: Convert pfi_kkif to use counter_u64 Improve caching behaviour by using counter_u64 rather than variables shared between cores. The result of converting all counters to counter(9) (i.e. this full patch series) is a significant improvement in throughput. As tested by olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see: x FreeBSD 20201223: inet packets-per-second + FreeBSD 20201223 with pf patches: inet packets-per-second +--------------------------------------------------------------------------+ \| + \| \| xx + \| \|xxx +++\| \|\|A\| \| \| \|A\|\| +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 9216962 9526356 9343902 9371057.6 116720.36 + 5 19427190 19698400 19502922 19546509 109084.92 Difference at 95.0% confidence 1.01755e+07 +/- 164756 108.584% +/- 2.9359% (Student's t, pooled s = 112967) Reviewed by: philip MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27763	2021-01-05 23:35:37 +01:00
Kristof Provost	26c841e2a4	pf: Allocate and free pfi_kkif in separate functions Factor out allocating and freeing pfi_kkif structures. This will be useful when we change the counters to be counter_u64, so we don't have to deal with that complexity in the multiple locations where we allocate pfi_kkif structures. No functional change. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27762	2021-01-05 23:35:37 +01:00
Kristof Provost	320c11165b	pf: Split pfi_kif into a user and kernel space structure No functional change. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27761	2021-01-05 23:35:37 +01:00
Kristof Provost	c3adacdad4	pf: Change pf_krule counters to use counter_u64 This improves the cache behaviour of pf and results in improved throughput. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27760	2021-01-05 23:35:37 +01:00
Kristof Provost	c7bdafe2f1	pf: Remove unused fields from pf_krule The u_* counters are used only to communicate with userspace, as userspace cannot use counter_u64. As pf_krule is not passed to userspace these fields are now obsolete. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27759	2021-01-05 23:35:36 +01:00
Kristof Provost	e86bddea9f	pf: Split pf_rule into kernel and user space versions No functional change intended. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27758	2021-01-05 23:35:36 +01:00
Kristof Provost	dc865dae89	pf: Migrate pf_rule and related structs to pf.h As part of the split between user and kernel mode structures we're moving all user space usable definitions into pf.h. No functional change intended. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27757	2021-01-05 23:35:36 +01:00
Kristof Provost	fbbf270eef	pf: Use counter_u64 in pf_src_node Reviewd by: philip MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27756	2021-01-05 23:35:36 +01:00
Kristof Provost	17ad7334ca	pf: Split pf_src_node into a kernel and userspace struct Introduce a kernel version of struct pf_src_node (pf_ksrc_node). This will allow us to improve the in-kernel data structure without breaking userspace compatibility. Reviewed by: philip MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27707	2021-01-05 23:35:36 +01:00
Alexander V. Chernikov	9c0ff6a8bb	Remove now-unused RT_GATEWAY* definitions. They were used to simplify nexthop transition, hence not needed anymore.	2021-01-04 21:45:46 +00:00
Hans Petter Selasky	747feea146	Streamline the infiniband code according to the ethernet code. Fix LINT-NOIP kernel build. Submitted by: rlibby @ Differential Revision: https://reviews.freebsd.org/D27861 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-12-31 10:07:02 +01:00
Hans Petter Selasky	ec52ff6d14	Streamline the infiniband code according to the ethernet code. Specifically implement the if_requestencap callback function for infiniband. Most of the changes are simply a cut and paste of the equivalent ethernet part. Reviewed by: melifaro @ Differential Revision: https://reviews.freebsd.org/D27631 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-12-29 18:01:57 +01:00
Hans Petter Selasky	19ecb5e8da	Fix for IPoIB over lagg(4). Need to update both link layer address and broadcast address when active link changes for IP over infiniband. This is because the broadcast address contains the so-called P-key, which is interface dependent. Reviewed by: kib @ Differential Revision: https://reviews.freebsd.org/D27658 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-12-29 17:35:06 +01:00
Ryan Libby	833dbf1e22	route: quiet -Wredundant-decls Remove declaration duplicated in `f5baf8bb12` Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27790	2020-12-27 16:32:27 -08:00
Alexander V. Chernikov	f733d9701b	Fix default route handling in radix4_lockless algo. Improve nexthop debugging. Reported by: Florian Smeets <flo at smeets.xyz>	2020-12-26 22:51:02 +00:00
Alexander V. Chernikov	4e19e0d92a	Use light-weight versions of routing lookup functions in ng_netflow. Use recently-added combination of `fib[46]_lookup_rt()` which returns rtentry & raw nexthop with `rt_get_inet[6]_plen()` which returns address/prefix length of prefix inside `rt`. Add `nhop_select_func()` wrapper around inlined `nhop_select()` to allow callers external to the routing subsystem select the proper nexthop from the multipath group without including internal headers. New calls does not require reference counting objects and reduce the amount of copied/processed rtentry data. Differential Revision: https://reviews.freebsd.org/D27675	2020-12-26 11:27:38 +00:00
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framwork adds the following runtime sysctls: List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Ryan Libby	2fb4a03d55	rtsock: quiet -Wunused-variable in LINT-NOIP kernels Fixup after r368769 / `d68fb8d978`. Reported by: mjg Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27730	2020-12-24 12:34:18 -08:00
Mark Johnston	92be2847e8	rtsock: Avoid copying uninitialized padding bytes When copying sockaddrs out to userspace, we pad them to a multiple of the platform alignment (sizeof(long)). However, some sockaddr sizes, such as struct sockaddr_dl, are not an integer multiple of the alignment, so we may end up copying out uninitialized bytes. Fix this by always bouncing through a pre-zeroed sockaddr_storage. Reported by: KASAN Reviewed by: melifaro MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27729	2020-12-23 11:16:40 -05:00
Kristof Provost	1c00efe98e	pf: Use counter(9) for pf_state byte/packet tracking This improves cache behaviour by not writing to the same variable from multiple cores simultaneously. pf_state is only used in the kernel, so can be safely modified. Reviewed by: Lutz Donnerhacke, philip MFC after: 1 week Sponsed by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27661	2020-12-23 12:03:21 +01:00
Kristof Provost	c3f69af03a	pf: Fix unaligned checksum updates The algorithm we use to update checksums only works correctly if the updated data is aligned on 16-bit boundaries (relative to the start of the packet). Import the OpenBSD fix for this issue. PR: 240416 Obtained from: OpenBSD MFC after: 1 week Reviewed by: tuexen (previous version) Differential Revision: https://reviews.freebsd.org/D27696	2020-12-23 12:03:20 +01:00
Hans Petter Selasky	ddce63fcb6	Remove not needed variable initialization. And switch from int to bool while at it. Reviewed by: melifaro@ Differential Revision: https://reviews.freebsd.org/D27725 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-12-23 12:04:46 +01:00
Konstantin Belousov	994e47023a	vxlan: stop checking CSUM_ENCAP_VXLAN when converting inner CSUM flags into normal, for decapsulation. The packet, if processed at this point, was already parsed to be UDP directed to a vxlan port. Connect-X 4+ does not provide easy method to infer which parser processed the packet, so driver cannot set the flag without a lot of efforts which are only to satisfy the formal requirements. Reviewed by: bryanv, np Sponsored by: Mellanox Technologies/NVidia Networking Differential revision: https://reviews.freebsd.org/D27449 MFC after: 1 week	2020-12-23 10:54:06 +02:00
Alexander V. Chernikov	d68fb8d978	Switch direct rt fields access in rtsock.c to newly-create field acessors. rtsock code was build around the assumption that each rtentry record in the system radix tree is a ready-to-use sockaddr. This assumptions turned out to be not quite true: * masks have their length tweaked, so we have rtsock_fix_netmask() hack * IPv6 addresses have their scope embedded, so we have another explicit deembedding hack. Change the code to decouple rtentry internals from rtsock code using newly-created rtentry accessors. This will allow to eventually eliminate both of the hacks and change rtentry dst/mask format. Differential Revision: https://reviews.freebsd.org/D27451	2020-12-18 22:00:57 +00:00
Brooks Davis	f3f2ee76ad	style(9): Correct whitespace in struct definitions struct ifconf and struct ifreq use the odd style "struct<tab>foo". struct ifdrv seems to have tried to follow this but was committed with spaces in place of most tabs resulting in "struct<space><space>ifdrv". MFC after: 3 days	2020-12-11 01:00:07 +00:00
Gleb Smirnoff	5ee33a9076	Fixup r368446 with KERN_TLS.	2020-12-08 23:54:09 +00:00
Gleb Smirnoff	e1074ed6a0	The list of ports in configuration path shall be protected by locks, epoch shall be used only for fast path. Thus use LAGG_XLOCK() in lagg_[un]register_vlan. This fixes sleeping in epoch panic. PR: 240609	2020-12-08 16:46:00 +00:00
Gleb Smirnoff	87bf9b9cbe	Convert LAGG_RLOCK() to NET_EPOCH_ENTER(). No functional changes.	2020-12-08 16:36:46 +00:00
Mark Johnston	c065d4e5e9	iflib: Avoid leaking the freelist bitmaps upon driver detach Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27342	2020-12-07 14:53:14 +00:00
Mark Johnston	102540192c	iflib: Detach tasks upon device registration failure In some error paths we would fail to detach from the iflib taskqueue groups. Also move the detach code into its own subroutine instead of duplicating it. Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27342	2020-12-07 14:52:57 +00:00
Alexander V. Chernikov	df9053920f	Add IPv4/IPv6 rtentry prefix accessors. Multiple consumers like ipfw, netflow or new route lookup algorithms need to get the prefix data out of struct rtentry. Instead of providing direct access to the rtentry, create IPv4/IPv6 accessors to abstract struct rtentry internals and avoid including internal routing headers for external consumers. While here, move struct route_nhop_data to the public header, so external customers can actually use lookup functions returning rt&nhop data. Differential Revision: https://reviews.freebsd.org/D27416	2020-12-03 22:23:57 +00:00
Kristof Provost	7f883a9b5b	net: Revert vnet/epair cleanup race mitigation Revert the mitigation code for the vnet/epair cleanup race (done in r365457). r368237 introduced a more reliable fix. MFC after: 2 weeks Sponsored by: Modirum MDPay	2020-12-01 16:34:43 +00:00
Kristof Provost	e133271fc1	if: Fix panic when destroying vnet and epair simultaneously When destroying a vnet and an epair (with one end in the vnet) we often panicked. This was the result of the destruction of the epair, which destroys both ends simultaneously, happening while vnet_if_return() was moving the struct ifnet to its home vnet. This can result in a freed ifnet being re-added to the home vnet V_ifnet list. That in turn panics the next time the ifnet is used. Prevent this race by ensuring that vnet_if_return() cannot run at the same time as if_detach() or epair_clone_destroy(). PR: 238870, 234985, 244703, 250870 MFC after: 2 weeks Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D27378	2020-12-01 16:23:59 +00:00
Alexander V. Chernikov	77df2c21cb	Renumber NHR_* flags after NHR_IFAIF removal in r368127. Suggested by: rpokala	2020-11-30 21:42:55 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Matt Macy	2338da0373	Import kernel WireGuard support Data path largely shared with the OpenBSD implementation by Matt Dunwoodie <ncon@nconroy.net> Reviewed by: grehan@freebsd.org MFC after: 1 month Sponsored by: Rubicon LLC, (Netgate) Differential Revision: https://reviews.freebsd.org/D26137	2020-11-29 19:38:03 +00:00
Alexander V. Chernikov	3b1654cb14	Introduce rib_walk_ext_internal() to allow iteration with rnh pointer. This solves the case when rib is not yet attached/detached to/from the system rib array. Differential Revision: https://reviews.freebsd.org/D27406	2020-11-29 13:54:49 +00:00
Alexander V. Chernikov	f47fa26065	Add nhop_ref_any() to unify referencing nhop or nexthop group. It allows code within routing subsystem to transparently reference nexthops and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH details. Differential Revision: https://reviews.freebsd.org/D27410	2020-11-29 13:52:06 +00:00
Alexander V. Chernikov	b712e3e343	Refactor fib4/fib6 functions. No functional changes. * Make lookup path of fib<4\|6>_lookup_debugnet() separate functions (fib<46>_lookup_rt()). These will be used in the control plane code requiring unlocked radix operations and actual prefix pointer. * Make lookup part of fib<4\|6>_check_urpf() separate functions. This change simplifies the switch to alternative lookup implementations, which helps algorithmic lookups introduction. * While here, use static initializers for IPv4/IPv6 keys Differential Revision: https://reviews.freebsd.org/D27405	2020-11-29 13:41:49 +00:00
Alexander V. Chernikov	98d5c4e5c8	Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors. The resulting KPI can be used by routing table consumers to estimate the required scale for route table export. * Add tracking for rib routes * Add accessors for number of nexthops/nexthop objects * Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up again on destruction. This helps in the cases when rnh is not linked yet/already unlinked. Differential Revision: https://reviews.freebsd.org/D27404	2020-11-29 13:27:24 +00:00
Alexander V. Chernikov	ef6ef7e5da	Add nhgrp_get_idx() as a counterpart for nhop_get_idx(). It allows the routing-related code to reference nexthop groups by index instead of storing a pointer.	2020-11-28 15:46:40 +00:00
Alexander V. Chernikov	7a6dc73c98	Cleanup nexthops request flags: * remove NHR_IFAIF as it was used by previous version of nexthop KPI * update NHR_REF description	2020-11-28 15:11:59 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Kristof Provost	bca0e1d2ac	if: Fix non-VIMAGE build if_link_ifnet() and if_unlink_ifnet() are needed even when VIMAGE is not enabled. MFC after: 2 weeks Sponsored by: Modirum MDPay	2020-11-25 17:15:24 +00:00
Kristof Provost	a779388f8b	if: Protect V_ifnet in vnet_if_return() When we terminate a vnet (i.e. jail) we move interfaces back to their home vnet. We need to protect our access to the V_ifnet CK_LIST. We could enter NET_EPOCH, but if_detach_internal() (called from if_vmove()) waits for net epoch callback completion. That's not possible from NET_EPOCH. Instead, we take the IFNET_WLOCK, build a list of the interfaces that need to move and, once we've released the lock, move them back to their home vnet. We cannot hold the IFNET_WLOCK() during if_vmove(), because that results in a LOR between ifnet_sx, in_multi_sx and iflib ctx lock. Separate out moving the ifp into or out of V_ifnet, so we can hold the lock as we do the list manipulation, but do not hold it as we if_vmove(). Reviewed by: melifaro MFC after: 2 weeks Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D27279	2020-11-25 15:07:22 +00:00
Kristof Provost	a60100fdfc	if: Remove ifnet_rwlock It no longer serves any purpose, as evidenced by the fact that we never take it without ifnet_sxlock. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D27278	2020-11-25 10:56:38 +00:00
Alexander V. Chernikov	7511a63825	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_forach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219	2020-11-22 20:21:10 +00:00
Mitchell Horne	70af7ce99a	Make net/ifq.h C++ friendly Don't use "new" as an identifier, and add explicit casts from void *. As a general policy, FreeBSD doesn't make any C++ compatibility guarantees for kernel headers like it does for userland, but it is a small effort to do so in this case, to the benefit of a downstream consumer (NetApp). Reviewed by: rscheff Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27286	2020-11-20 14:45:45 +00:00
Andrew Gallatin	8732245d29	LACP: When suppressing distributing, return ENOBUFS When links come and go, lacp goes into a "suppress distributing" mode where it drops traffic for 3 seconds. When in this mode, lagg/lacp historiclally drops traffic with ENETDOWN. That return value causes TCP to close any connection where it gets that value back from the lower parts of the stack. This means that any TCP connection with active traffic during a 3-second windown when an LACP link comes or goes would get closed. TCP treats return values of ENOBUFS as transient errors, and re-schedules transmission later. So rather than returning ENETDOWN, lets return ENOBUFS instead. This allows TCP connections to be preserved. I've tested this by repeatedly bouncing links on a Netlfix CDN server under a moderate (20Gb/s) load and overved ENOBUFS reported back to the TCP stack (as reported by a RACK TCP sysctl). Reviewed by: jhb, jtl, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27188	2020-11-18 14:55:49 +00:00
Mark Johnston	54bf96fb4f	iflib: Free full mbuf chains when draining transmit queues Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> Reviewed by: gallatin, hselasky MFC after: 1 week Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27179	2020-11-11 18:00:06 +00:00
Andrey V. Elsukov	2f4ffa9f72	Fix possible NULL pointer dereference. lagg(4) replaces if_output method of its child interfaces and expects that this method can be called only by child interfaces. But it is possible that lagg_port_output() could be called by children of child interfaces. In this case ifnet's if_lagg field is NULL. Add check that lp is not NULL. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2020-11-11 15:53:36 +00:00
Mitchell Horne	4a3fc6e22e	Fix definition of rn_addmask() Add the missing static keyword present in the declaration. Reviewed by: melifaro Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27024	2020-11-08 19:02:22 +00:00
Alexander V. Chernikov	2d39824195	Switch net.add_addr_allfibs default to 0. The goal of the fib support is to provide multiple independent routing tables, isolated from each other. net.add_addr_allfibs default tries to shift gears in the opposite direction, unconditionally inserting all addresses to all of the fibs. There are use cases when this is necessary, however this is not a default expected behaviour, especially compared to other implementations. Provide WARNING message for the setups with multiple fibs to notify potential users of the feature. Differential Revision: https://reviews.freebsd.org/D26076	2020-11-08 18:27:49 +00:00
Alexander V. Chernikov	76e6b37f6b	Temporarily revert setting net.add_addr_allfibs to 0. It accidentally sweeped in r367486. Revert to allow for proper commit message & warning.	2020-11-08 18:11:12 +00:00
Alexander V. Chernikov	770495f4c0	Fix build broken by r367484: add route_ifaddrs.c. Pointy hat to: melifaro Reported by: jenkins	2020-11-08 13:30:44 +00:00
Alexander V. Chernikov	bad6b23606	Move all ifaddr route creation business logic to net/route/route_ifaddr.c Differential Revision: https://reviews.freebsd.org/D26318	2020-11-08 11:12:00 +00:00
Konstantin Belousov	80ba361b2f	if_media.c SIOCGMEDIAX handler: improve loop Stop advancing counter past the current iteration number at the start of iteration. This removes the need of subtracting one when calculating index for copyout, and arguably fixes off-by-one reporting of copied out elements when copyout failed. Reviewed by: hselasky Sponsored by: Mellanox Technologies / NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27073	2020-11-03 14:33:04 +00:00
Konstantin Belousov	1fbbe9dbf5	net/if_media.c: improve IFMEDIA_DEBUG output. Use consistent output format for hex. Print both media and mask where relevant. Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:38:30 +00:00
Konstantin Belousov	e399f19dba	Cleanup of net/if_media.c: simplify cleanup loop in ifmedia_removeall(). Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:36:21 +00:00
Konstantin Belousov	899322fdfa	Cleanup of net/if_media.c: some style. Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:30:17 +00:00
Konstantin Belousov	2193fb16b5	Cleanup of net/if_media.c: switch to ANSI C function definitions. Reviewed by: hselasky Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27034	2020-11-01 16:25:35 +00:00
Mitchell Horne	ced0f52457	net: add ETHER_IS_IPV6_MULTICAST This can be used to detect if an ethernet address is specifically an IPv6 multicast address, defined in accordance to RFC 2464. ETHER_IS_MULTICAST is still preferred in the general case. Reviewed by: ae Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26611	2020-10-30 13:32:58 +00:00
John Baldwin	36e0a362ac	Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc(). This gives a more uniform API for send tag life cycle management. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27000	2020-10-29 23:28:39 +00:00
John Baldwin	521eac97f3	Support hardware rate limiting (pacing) with TLS offload. - Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type). - When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag. - When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable. - When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone. - Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock. - Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46]. Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691	2020-10-29 00:23:16 +00:00
Vincenzo Maffione	be7a6b3d84	iflib: fix typo bug introduced by r367093 Code was supposed to call callout_reset_sbt_on() rather than callout_reset_sbt(). This resulted into passing a "cpu" value to a "flag" argument. A recipe for subtle errors. PR: 248652 Reported by: sg@efficientip.com MFC with: r367093	2020-10-28 21:06:17 +00:00
Vincenzo Maffione	17cec474c0	iflib: add per-tx-queue netmap timer The way netmap TX is handled in iflib when TX interrupts are not used (IFC_NETMAP_TX_IRQ not set) has some issues: - The netmap_tx_irq() function gets called by iflib_timer(), which gets scheduled with tick granularity (hz). This is not frequent enough for 10Gbps NICs and beyond (e.g., ixgbe or ixl). The end result is that the transmitting netmap application is not woken up fast enough to saturate the link with small packets. - The iflib_timer() functions also calls isc_txd_credits_update() to ask for more TX completion updates. However, this violates the netmap requirement that only txsync can access the TX queue for datapath operations. Only netmap_tx_irq() may be called out of the txsync context. This change introduces per-tx-queue netmap timers, using microsecond granularity to ensure that netmap_tx_irq() can be called often enough to allow for maximum packet rate. The timer routine simply calls netmap_tx_irq() to wake up the netmap application. The latter will wake up and call txsync to collect TX completion updates. This change brings back line rate speed with small packets for ixgbe. For the time being, timer expiration is hardcoded to 90 microseconds, in order to avoid introducing a new sysctl. We may eventually implement an adaptive expiration period or use another deferred work mechanism in place of timers. Also, fix the timers usage to make sure that each queue is serviced by a different CPU. PR: 248652 Reported by: sg@efficientip.com MFC after: 2 weeks	2020-10-27 21:53:33 +00:00
Hans Petter Selasky	1355e2dc4f	More style fixes (partial revert of r366994). Suggested by: danfe@ Differential Revision: https://reviews.freebsd.org/D26254 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-24 13:07:50 +00:00
Hans Petter Selasky	1d3a22e765	Fix order of header files: sys/systm.h should come right after sys/param.h Suggested by: kib@ Differential Revision: https://reviews.freebsd.org/D26254 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-24 10:52:09 +00:00
Hans Petter Selasky	01630a496b	Run code through "clang-format -style=file" with some additional fixes. No functional change. Suggested by: kib@ and emaste@ Differential Revision: https://reviews.freebsd.org/D26254 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-24 10:23:21 +00:00
Navdeep Parhar	610d345953	if_vxlan(4): csum_flags_to_inner_flags takes the tunnel protocol as a parameter. No functional change.	2020-10-22 17:05:55 +00:00
Hans Petter Selasky	ce329aa256	Compile fix for MIPS, MIPS64, POWERPC and POWERPC64. Add missing include files. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-22 12:22:08 +00:00
Hans Petter Selasky	a92c4bb62a	Add support for IP over infiniband, IPoIB, to lagg(4). Currently only the failover protocol is supported due to limitations in the IPoIB architecture. Refer to the lagg(4) manual page for how to configure and use this new feature. A new network interface type, IFT_INFINIBANDLAG, has been added, similar to the existing IFT_IEEE8023ADLAG . ifconfig(8) has been updated to accept a new laggtype argument when creating lagg(4) network interfaces. This new argument is used to distinguish between ethernet and infiniband type of lagg(4) network interface. The laggtype argument is optional and defaults to ethernet. The lagg(4) command line syntax is backwards compatible. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-22 09:47:12 +00:00
Hans Petter Selasky	9d40cf60d6	Factor out generic IP over infiniband, IPoIB, definitions and code into net/if_infiniband.c and net/infiniband.h . No functional change intended. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-22 09:09:53 +00:00
Alexander V. Chernikov	c7cffd65c5	Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q). 802.1ad interfaces are created with ifconfig using the "vlanproto" parameter. Eg., the following creates a 802.1Q VLAN (id #42) over a 802.1ad S-VLAN (id #5) over a physical Ethernet interface (em0). ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24 VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly supported. VLAN_HWTAGGING is only partially supported, as there is currently no IFCAP_VLAN_* denoting the possibility to set the VLAN EtherType to anything else than 0x8100 (802.1ad uses 0x88A8). Submitted by: Olivier Piras Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D26436	2020-10-21 21:28:20 +00:00
Alexander V. Chernikov	0c325f53f1	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523	2020-10-18 17:15:47 +00:00
Marcin Wojtas	1148702e43	Add SADB_SAFLAGS_ESN flag This flag is going to be used by IKE daemon to signal if Extended Sequence Number feature is going to be used. Value for this flag was taken from OpenBSD source code `6b4cbaf181` Submitted by: Patryk Duda <pdk@semihalf.com> Reviewed by: ae Differential revision: https://reviews.freebsd.org/D22366 Obtained from: Semihalf Sponsored by: Stormshield	2020-10-16 11:22:29 +00:00
Richard Scheffenegger	868aabb470	Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 801.1Q vlan header with vlan ID = 0 to carry the priority setting Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409	2020-10-09 12:06:43 +00:00
Konstantin Belousov	cefdb89514	Fix typo. Sponsored by: Mellanox Technologies/NVIDIA Networking MFC after: 3 days	2020-10-07 10:58:56 +00:00
Kristof Provost	4af1bd8157	bridge: call member interface ioctl() without NET_EPOCH We're not allowed to hold NET_EPOCH while sleeping, so when we call ioctl() handlers for member interfaces we cannot be in NET_EPOCH. We still need some protection of our CK_LISTs, so hold BRIDGE_LOCK instead. That requires changing BRIDGE_LOCK into a sleepable lock, and separating the BRIDGE_RT_LOCK, to protect bridge_rtnode lists. That lock is taken in the data path (while in NET_EPOCH), so it cannot be a sleepable lock. While here document the locking strategy. MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D26418	2020-10-06 19:19:56 +00:00
John Baldwin	56fb710f1b	Store the send tag type in the common send tag header. Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag). In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query. Reviewed by: gallatin, hselasky, np, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26689	2020-10-06 17:58:56 +00:00
Alexander V. Chernikov	1b95005e95	Fix route flags update during RTM_CHANGE. Nexthop lookup was not consireding rt_flags when doing structure comparison, which lead to an original nexthop selection when changing flags. Fix the case by adding rt_flags field into comparison and rearranging nhop_priv fields to allow for efficient matching. Fix `route change X/Y flags` case - recent changes disallowed specifying RTF_GATEWAY flag without actual gateway. It turns out, route(8) fills in RTF_GATEWAY by default, unless -interface flag is specified. Fix regression by clearing RTF_GATEWAY flag instead of failing. Fix route flag reporting in RTM_CHANGE messages by explicitly updating rtm_flags after operation competion. Add IPv4/IPv6 tests for flag-only route changes.	2020-10-04 13:24:58 +00:00
Alexander V. Chernikov	9c584fa4bc	Remove ROUTE_MPATH-related warnings introduced in r366390. Reported by: mjg	2020-10-03 14:37:54 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00
Vincenzo Maffione	adf41f0788	netmap: fix constness warnings generated by "-Wcast-qual" Submitted by: milosz.kaniewski@gmail.com MFC after: 3 days	2020-10-03 09:33:29 +00:00
Ed Maste	c1aedfcbd9	add SIOCGIFDATA ioctl For interfaces that do not support SIOCGIFMEDIA (for which there are quite a few) the only fallback is to query the interface for if_data->ifi_link_state. While it's possible to get at if_data for an interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms. SIOCGIFDATA is a simple ioctl to retrieve this fast with very little resource use in comparison. This implementation mirrors that of other similar ioctls in FreeBSD. Submitted by: Roy Marples <roy@marples.name> Reviewed by: markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D26538	2020-09-28 16:54:39 +00:00
Alexander V. Chernikov	2259a03020	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497	2020-09-21 20:02:26 +00:00
Alexander V. Chernikov	1440f62266	Remove unused nhop_ref_any() function. Remove "opt_mpath.h" header where not needed. No functional changes.	2020-09-20 21:32:52 +00:00
Alexander V. Chernikov	c4bcfe98e2	Fix gw updates / flag updates during route changes. * Zero gw_sdl if switching to interface route - the assumption that underlying storage is zeroed is incorrect with route changes. * Apply proper flag mask to rte. Reported by: vangyzen	2020-09-20 12:31:48 +00:00
Navdeep Parhar	b092fd6c97	if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS. This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic. A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities. On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers. An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port. On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already. Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead. Rx: LRO. depends on what happens with the previous item. There will have to to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state. Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:37:57 +00:00
Navdeep Parhar	830edb4561	Add two new ifnet capabilities for hw checksumming and TSO for VXLAN traffic. These are similar to the existing VLAN capabilities. Reviewed by: kib@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2020-09-18 02:10:28 +00:00
Mitchell Horne	ceff9b9d25	if_media: definitions for 40GE LM4 ethernet media type Reviewed by: erj Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26276	2020-09-16 14:45:16 +00:00
Alexander V. Chernikov	2b32d93e55	Fix RADIX_MPATH build broken by r365521. Reported by: jenkins, Hartmann, O. <ohartmann at walstatt.org>	2020-09-10 07:05:31 +00:00
Alexander V. Chernikov	aa8f9f90ff	Update nexthop handling for route addition/deletion in preparation for mpath. Currently kernel requests deletion for the certain routes with specified gateway, but this gateway is not actually checked. With multipath routes, internal gateway checking becomes mandatory. Add the logic performing this check. Generalise RTF_PINNED routes to the generic route priorities, simplifying the logic. Add lookup_prefix() function to perform exact match search based on data in @info. Differential Revision: https://reviews.freebsd.org/D26356	2020-09-09 22:07:54 +00:00
Alexander V. Chernikov	cd6298d5c5	Retain marking net.fibs sysctl as a tunable. Suggested by: avg	2020-09-09 21:45:18 +00:00
Alexander V. Chernikov	4a8201c13a	Fix panic with net.fibs tunable set in loader.conf. Fix by removing forgotten CTLFLAG_RWTUN flag from the sysctl, loader variable will be read later in vnet_rtables_init(). Reported by: mav	2020-09-08 21:39:34 +00:00
Kristof Provost	a969635b83	net: mitigate vnet / epair cleanup races There's a race where dying vnets move their interfaces back to their original vnet, and if_epair cleanup (where deleting one interface also deletes the other end of the epair). This is commonly triggered by the pf tests, but also by cleanup of vnet jails. As we've not yet been able to fix the root cause of the issue work around the panic by not dereferencing a NULL softc in epair_qflush() and by not re-attaching DYING interfaces. This isn't a full fix, but makes a very common panic far less likely. PR: 244703, 238870 Reviewed by: lutz_donnerhacke.de MFC after: 4 days Differential Revision: https://reviews.freebsd.org/D26324	2020-09-08 14:54:10 +00:00
Alexander V. Chernikov	05aca418f4	Consistently use the same gateway when adding/deleting interface routes. Use the same link-level gateway when adding or deleting interface routes. This helps nexthop checking in the upcoming multipath changes. Differential Revision: https://reviews.freebsd.org/D26317	2020-09-07 10:13:54 +00:00
Ed Maste	4fa9815a3d	rtsock.c: remove extraneous space Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D26249	2020-09-05 16:13:36 +00:00
Alexander V. Chernikov	8f07963360	Fix regression for IPv6 loopback routes. After nexthop introduction, loopback routes for the interface addresses were created without embedding actual interface index in the gateway. The latter is needed to pass the IPv6 scope during transmission via loopback.. Fix the regression by actually using passed gateway data with interface index. Differential Revision: https://reviews.freebsd.org/D26306	2020-09-03 22:24:52 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Vincenzo Maffione	35d8a463e8	iflib: leave only 1 receive descriptor unused The pidx argument of isc_rxd_flush() indicates which is the last valid receive descriptor to be used by the NIC. However, current code has multiple issues: - Intel drivers write pidx to their RDT register, which means that NICs will only use the descriptors up to pidx-1 (modulo ring size N), and won't actually use the one pointed by pidx. This does not break reception, but it is anyway confusing and suboptimal (the NIC will actually see only N-2 descriptors as available, rather than N-1). Other drivers (if_vmx, if_bnxt, if_mgb) adhere to this semantic). - The semantic used by Intel (RDT is one descriptor past the last valid one) is used by most (if not all) NICs, and it is also used on the TX side (also in iflib). Since iflib is not currently using this semantic for RX, it must decrement fl->ifl_pidx (modulo N) before calling isc_rxd_flush(), and then the per-driver callback implementation must increment the index again (to match the real semantic). This is confusing and suboptimal. - The iflib refill function is also called at initialization. However, in case the ring size is smaller than 128 (e.g. if_mgb), the refill function will actually prepare all the receive descriptors (N), without leaving one unused, as most of NICs assume (e.g. to avoid RDT to overrun RDH). I can speculate that the code looks like this right now because this issue showed up during testing (e.g. with if_mgb), and it was easy to workaround by decrementing pidx before isc_rxd_flush(). The goal of this change is to simplify the code (removing a bunch of instructions from the RX fast path), and to make the semantic of isc_rxd_flush() consistent across drivers. To achieve this, we: - change the semantics of the pidx argument to the usual one (that is the index one past the last valid one), so that both iflib and drivers avoid the decrement/increment dance. - fix the initialization code to prepare at most N-1 descriptors. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26191	2020-09-01 20:41:47 +00:00
Alexander V. Chernikov	b8d2d479cd	Revert uma zone alignemnt cache unadvertenly committed in r364950.	2020-08-29 12:04:13 +00:00
Alexander V. Chernikov	6498f66f7c	Fix build with RADIX_MPATH. Reported by: Hartmann, O <ohartmann@walstatt.org>	2020-08-29 11:04:24 +00:00
Alexander V. Chernikov	7c89a3b63f	Move fib_rte_to_nh_flags() from net/route_var.h to net/route/nhop_ctl.c. No functional changes. Initially this function was created to perform runtime flag conversions for the previous incarnation of fib lookup functions. As these functions got deprecated, move the function to the file with the only remaining caller. Lastly, rename it to convert_rt_to_nh_flags() to follow the naming notation.	2020-08-28 23:01:56 +00:00
Alexander V. Chernikov	a624ca3dff	Move net/route/shared.h definitions to net/route/route_var.h. No functional changes. net/route/shared.h was created in the inital phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largerly defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former.	2020-08-28 22:50:20 +00:00
Alexander V. Chernikov	b122304f6a	Further split nhop creation and rtable operations. As nexthops are immutable, some operations such as route attribute changes require nexthop fetching, forking, modification and route switching. These operations are not atomic, so they may need to be retried multiple times in presence of multiple speakers changing the same route. This change introduces "synchronisation" primitive: route_update_conditional(), simplifying logic for route changes and upcoming multipath operations. Differential Revision: https://reviews.freebsd.org/D26216	2020-08-28 21:59:10 +00:00
Vincenzo Maffione	ae750d5cdf	iflib: netmap: publish all the receive buffer At initialization time, the netmap RX refill function used to prepare the NIC RX ring with N-1 buffers rather than N (with N equal to the number of descriptors in the NIC RX ring). This is not how netmap is supposed to work, as it would keep kring->nr_hwcur not in sync with the NIC "next index to refill" (i.e., fl->ifl_pidx). Instead we prepare N buffers, although we still publish (with isc_rxd_flush()) only the first N-1 buffers, to avoid the NIC producer pointer to overrun the NIC consumer pointer (for NICs where this is a real issue, e.g. Intel ones). MFC after: 2 weeks	2020-08-25 15:19:45 +00:00
Alexander V. Chernikov	592d300e34	Remove RT_LOCK mutex from rte. rtentry lock traditionally served 2 purposed: first was protecting refcounts, the second was assuring consistent field access/changes. Since route nexthop introduction, the need for the former disappeared and the need for the latter reduced. To be more precise, the following rte field are mutable: rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info) rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal) rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info) rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK) rt_chain (deletion chain pointer, updated with RIB_WLOCK) All of them are updated under RIB_WLOCK, so the only remaining concern is the reading. rt_nhop and rt_weight (addressed in this review) are read under rib lock and stored in the rib_cmd_info, so the caller has no problem with consitency. rte_flags is currently read unlocked in rtsock reporting (however the scope is only RTF_UP flag, which is pretty static). rt_expire is currently read unlocked in rtsock reporting. rt_chain accesses are safe, as this is only used at route deletion. rt_expire and rte_flags reads will be dealt in a separate reviews soon. Differential Revision: https://reviews.freebsd.org/D26162	2020-08-24 20:23:34 +00:00
Vincenzo Maffione	de5b46107c	iflib: fix isc_rxd_flush call in netmap_fl_refill() The semantic of the pidx argument of isc_rxd_flush() is the last valid index of in the free list, rather than the next index to be published. However, netmap was still using the old convention. While there, also refactor the netmap_fl_refill() to simplify a little bit and add an assertion. MFC after: 2 weeks	2020-08-24 11:44:20 +00:00
Alexander V. Chernikov	eb1c7adb70	Finish r364492 by renaming rt_flags to rte_flags for multipath code.	2020-08-22 20:02:40 +00:00
Alexander V. Chernikov	93bfd365d2	Rename rt_flags to rte_flags && reduce number of rt_nhop accesses. No functional changes. Most of the routing flags are stored in the netxtop instead of rtentry. Rename rt->rt_flags to rt->rte_flags to simplify reading/modifying code checking routing flags. In the new multipath code, rt->rt_nhop may actually point to nexthop group instead of nhop. To ease transition, reduce the amount of rt->rt_nhop->... accesses. Differential Revision: https://reviews.freebsd.org/D26156	2020-08-22 19:30:56 +00:00
Mateusz Guzik	c93d310f87	Fix tinderbox build after r364465	2020-08-22 07:43:38 +00:00
Alexander V. Chernikov	f5247a232a	Make net.fibs growable. Allow to dynamically grow the amount of fibs in each vnet. This change alters current behavior. Currently, if one defines ROUTETABLES > 1 in the kernel config, each vnet will be created with the number of fibs defined in the kernel config. After this commit vnets will be created with fibs=1. Dynamic net.fibs is not compatible with net.add_addr_allfibs. The plan is to deprecate the latter and make net.add_addr_allfibs=0 default behaviour. Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26062	2020-08-21 21:34:52 +00:00
Warner Losh	773e541e8d	Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@	2020-08-21 00:03:24 +00:00
Bjoern A. Zeeb	2ae32ca2c8	For consistency and to avoid any problems getting past the 31bit boundry change the last two IF_Mbps(2500) and additionally one IF_Mbps(5000) to ULL as well. MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-08-17 13:51:25 +00:00
Qing Li	a5154bb2e5	Correct the mask byte order when checking for reserved bits. Reviewed by: gnn Approved by: gnn MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26071	2020-08-15 16:48:58 +00:00
Alexander V. Chernikov	bec053ffe0	Make net.inet6.ip6.deembed_scopeid behaviour default & remove sysctl. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25637	2020-08-15 11:37:44 +00:00
Alexander V. Chernikov	2f23f45b20	Simplify dom_<rtattach\|rtdetach>. Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053	2020-08-14 21:29:56 +00:00
Bryan Drewery	3869d41465	lagg: Avoid adding a port to a lagg device being destroyed. The lagg_clone_destroy() handles detach and waiting for ifconfig callers to drain already. This narrows the race for 2 panics that the tests triggered. Both were a consequence of adding a port to the lagg device after it had already detached from all of its ports. The link state task would run after lagg_clone_destroy() free'd the lagg softc. kernel:trap_fatal+0xa4 kernel:trap_pfault+0x61 kernel:trap+0x316 kernel:witness_checkorder+0x6d kernel:_sx_xlock+0x72 if_lagg.ko:lagg_port_state+0x3b kernel:if_down+0x144 kernel:if_detach+0x659 if_tap.ko:tap_destroy+0x46 kernel:if_clone_destroyif+0x1b7 kernel:if_clone_destroy+0x8d kernel:ifioctl+0x29c kernel:kern_ioctl+0x2bd kernel:sys_ioctl+0x16d kernel:amd64_syscall+0x337 kernel:trap_fatal+0xa4 kernel:trap_pfault+0x61 kernel:trap+0x316 kernel:witness_checkorder+0x6d kernel:_sx_xlock+0x72 if_lagg.ko:lagg_port_state+0x3b kernel:do_link_state_change+0x9b kernel:taskqueue_run_locked+0x10b kernel:taskqueue_run+0x49 kernel:ithread_loop+0x19c kernel:fork_exit+0x83 PR: 244168 Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D25284	2020-08-13 22:06:27 +00:00
Alexander V. Chernikov	6cbadc4234	Move rtzone handling code to net/route_ctl.c After moving the route control plane code from net/route.c, all rtzone users ended up being in net/route_ctl.c. Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well to have everything in a single place. While here, remove custom initializers from the zone. It was added originally to avoid setup/teardown of costy per-cpu couters. With these counters removed, the only remaining job was avoiding rte mutex setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex will soon be removed. With that in mind, there is no sense in keeping custom zone callbacks. Differential Revision: https://reviews.freebsd.org/D26051	2020-08-13 18:35:29 +00:00
Mitchell Horne	f7d79f6c6d	Correctly set error in rt_mpath_unlink It is possible for rn_delete() to return NULL. If this happens, then set *perror to ESRCH, as is done in the rest of the function. Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D25871	2020-08-12 16:43:20 +00:00
Vincenzo Maffione	6d84e76a25	iflib: netmap: improve rxsync to support IFLIB_HAS_RXCQ For drivers with IFLIB_HAS_RXCQ set, there is a separate completion queue. In this case, the netmap rxsync routine needs to update rxq->ifr_cq_cidx in the same way it is updated by iflib_rxeof(). This improves the situation for vmx(4) and bnxt(4) drivers, which use iflib and have the IFLIB_HAS_RXCQ bit set. PR: 248494 MFC after: 3 weeks	2020-08-12 14:45:31 +00:00
Vincenzo Maffione	530960be8d	iflib: refactor netmap_fl_refill and fix off-by-one issue First, fix the initialization of the fl->ifl_rxd_idxs array, which was affected by an off-by-one bug. Once there, refactor the function to use better names for local variables, optimize the variable assignments, and merge the bus_dmamap_sync() inner loop with the outer one. PR: 248494 MFC after: 3 weeks	2020-08-12 14:17:38 +00:00
Alexander V. Chernikov	8a0917c35b	Do not enter epoch in add_route(), as it is already called in epoch. Reviewed by: glebius	2020-08-11 07:23:07 +00:00
Alexander V. Chernikov	136a1f8da8	Make <add\|del\|change>_route() static to finish the transition to the new kpi. Discussed with: glebius	2020-08-11 07:21:32 +00:00
Alexander V. Chernikov	9a00f6d067	Fix rib_subscribe() waitok flag by performing allocation outside epoch. Make in6_inithead() use rib_subscribe with waitok to achieve reliable subscription allocation. Reviewed by: glebius	2020-08-11 07:05:30 +00:00
Vincenzo Maffione	c9d886cd7f	iflib: netmap: drop redundant check The validity of head is already checked by nm_rxsync_prologue(). MFC after: 2 weeks	2020-08-06 21:37:38 +00:00
Vincenzo Maffione	ee07345d20	iflib: netmap: don't increment ifl_cidx on the wrong free list Netmap only uses free list 0 to keep it consistent with its one-to-one mapping between each netmap ring and a device RX (or TX) queue. However, the current iflib_netmap_rxsync() routine was mistakenly updating the ifl_cidx field of both free lists. PR: 248494 MFC after: 2 weeks	2020-08-06 21:32:25 +00:00
Mark Johnston	96ad26eefb	Remove free_domain() and uma_zfree_domain(). These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplyify the kernel memory allocator interfaces a bit. Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937	2020-08-04 13:58:36 +00:00
Matt Macy	0ae0e8d2bd	iflib: fix LOR with bpf detach Reported by: grehan@ Approved by: grehan@ MFC after: 1 week Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D25530	2020-07-27 01:17:59 +00:00
Alexander V. Chernikov	e1c05fd290	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-21 19:56:13 +00:00
Vincenzo Maffione	ac11d85740	iflib: initialize netmap with the correct number of descriptors In case the network device has a RX or TX control queue, the correct number of TX/RX descriptors is contained in the second entry of the isc_ntxd (or isc_nrxd) array, rather than in the first entry. This case is correctly handled by iflib_device_register() and iflib_pseudo_register(), but not by iflib_netmap_attach(). If the first entry is larger than the second, this can result in a panic. This change fixes the bug by introducing two helper functions that also lead to some code simplification. PR: 247647 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D25541	2020-07-20 21:08:56 +00:00
Alexander V. Chernikov	725871230d	Temporarly revert r363319 to unbreak the build. Reported by: CI Pointy hat to: melifaro	2020-07-19 10:53:15 +00:00
Alexander V. Chernikov	8cee15d9e4	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-19 09:29:27 +00:00
Kristof Provost	93ed6ade6e	bridge: Don't sleep during epoch While it doesn't trigger INVARIANTS or WITNESS on head it does in stable/12. There's also no reason for it, as we can easily report the out of memory error to the caller (i.e. userspace). All of these can already fail. PR: 248046 MFC after: 3 days	2020-07-18 12:43:11 +00:00
Kyle Evans	cef5fc74c2	tuntap: drop redundant if_mtu assignment in tuncreate ether_ifattach will immediately clobber if_mtu with ETHERMTU anyways, just let it happen. MFC after: 1 week	2020-07-16 15:02:11 +00:00
Kyle Evans	fa2ab81e32	ether_ifattach: set mtu before calling if_attach() if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time an ethernet interface is setup after SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu before ifp->if_mtu has been properly set in some scenarios, e.g., USB ethernet adapter plugged in later on. For interfaces that are created in early boot, we don't have this issue as domains aren't constructed enough for them to attach and thus it gets deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND after the mtu has been set earlier in ether_ifattach(). PR: 248005 Submitted by: Mathew <mjanelle blackberry com> MFC after: 1 week	2020-07-16 13:37:32 +00:00
Alexander V. Chernikov	4c7ba83f9d	Switch inet6 default route subscription to the new rib subscription api. Old subscription model allowed only single customer. Switch inet6 to the new subscription api and eliminate the old model. Differential Revision: https://reviews.freebsd.org/D25615	2020-07-12 11:24:23 +00:00
Alexander V. Chernikov	edc37a66e3	Add destructor for the rib subscription system to simplify users code. Subscriptions are planned to be used by modules such as route lookup engines. In that case that's the module task to properly unsibscribe before detach. However, the in-kernel customer - inet6 wants to track default route changes. To avoid having inet6 store per-fib subscriptions, handle automatic destruction internally. Differential Revision: https://reviews.freebsd.org/D25614	2020-07-12 11:18:09 +00:00
Mark Johnston	26dd427800	Split nhop_ref_object(). Now nhop_ref_object() unconditionally acquires a reference, and the new nhop_try_ref_object() uses refcount_acquire_if_not_zero() to conditionally acquire a reference. Since the former is cheaper, use it when we know that the initial counter value is non-zero. No functional change intended. Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D25535	2020-07-06 21:20:57 +00:00
Mark Johnston	b256d25c50	iflib: Fix some nits in the rx refill code. - Get rid of the ifl_vm_addrs array. It is not used by any existing consumer, so we are just dirtying a couple of cache lines for no reason. - Use uma_zalloc(fl->ifl_zone) instead of m_cljget(). Otherwise m_cljget() is doing unnecessary work to look up the correct zone, when iflib already knows what that zone is. - ifl_gen is only used when INVARIANTS is on, so make that more clear. - Fix some style nits and inconsistencies. Reviewed by: gallatin Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25490	2020-07-06 14:52:21 +00:00
Mark Johnston	a363e1d4d0	iflib: Fix handling of mbuf cluster allocation failures. When refilling an rx freelist, make sure we only update the hardware producer index if at least one cluster was allocated. Otherwise the NIC is programmed to write a previously used cluster, typically resulting in a use-after-free when packet data is written by the hardware. Also make sure that we don't update the fragment index cursor if the last allocation attempt didn't succeed. For at least Intel drivers, iflib assumes that the consumer index and fragment index cursor stay in lockstep, but this assumption was violated in the face of cluster allocation failures. Reported and tested by: pho Reviewed by: gallatin, hselasky MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25489	2020-07-06 14:52:09 +00:00
Alexander V. Chernikov	6ad7446c6f	Complete conversions from fib<4\|6>_lookup_nh_<basic\|ext> to fib<4\|6>_lookup(). fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. With no callers remaining, remove fib[46]_lookup_nh_ functions. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25445	2020-07-02 21:04:08 +00:00
Ryan Moeller	e5539fb618	libifconfig: Add function to get bridge status The new function operates similarly to ifconfig_lagg_get_lagg_status and likewise is accompanied by a function to free the bridge status data structure. I have included in this patch the relocation of some strings describing STP parameters and the PV2ID macro from ifconfig into net/if_bridgevar.h as they are useful for consumers of libifconfig. Reviewed by: kp, melifaro, mmacy Approved by: mmacy (mentor) MFC after: 1 week Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25460	2020-07-01 02:32:41 +00:00
Vincenzo Maffione	9503233f87	iflib: fix compilation issue introduced in r362621 The ifp local variable is useful even without netmap and altq, as it is used to check for IFF_DRV_RUNNING. MFC after: 2 weeks	2020-06-25 20:43:21 +00:00
Vincenzo Maffione	d8b2d26b15	iflib: netmap: add support for partial ring openings Reviewed by: gallatin MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25254	2020-06-25 19:44:24 +00:00
Vincenzo Maffione	88a688663a	iflib: netmap: add per-tx-queue netmap support Reviewed by: gallatin MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25253	2020-06-25 19:35:43 +00:00
Vincenzo Maffione	0ff2126795	iflib: netmap: fix rsync index overrun In the current iflib_netmap_rxsync, there is nothing that prevents kring->nr_hwtail to overrun kring->nr_hwcur during the descriptor import phase. This may cause errors in netmap applications, such as: em1 RX0: fail 'head < kring->nr_hwcur \|\| head > kring->nr_hwtail' h 795 c 795 t 282 rh 795 rc 795 rt 282 hc 282 ht 282 Reviewed by: gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25252	2020-06-23 20:23:56 +00:00
Tycho Nightingale	c774294c57	To avoid a startup script race change net.bpf.optimize_writers from CTLFLAG_RW to CTLFLAG_RWTUN to allow it to be modified by a loader tunable. Sponsored by: Dell EMC Isilon	2020-06-23 13:57:53 +00:00
Matt Macy	9aeca21324	iflib: fix cloneattach fail and generalize pseudo device handling - a cloneattach failure will not currently be handled correctly, jump to the right target - pseudo devices are all treat as if they're ethernet devices - this often doesn't make sense MFC after: 1 week Sponsored by: Netgate, Inc. Differential Revision: https://reviews.freebsd.org/D25083	2020-06-21 22:02:49 +00:00
Pawel Biernacki	9daf71541c	net.link.generic.ifdata.<ifindex>.linkspecific: rework handler This OID was added in r17352 but the write path of IFDATA_LINKSPECIFIC seems unused as there are no in-base writers, and as far as I can tell we had issues with this code before, see PR 219472. Drop the write path to make the handler read-only as described in comments and man-pages. It can be marked as MPSAFE now. Reviewed by: bdragon, kib, melifaro, wollman Approved by: kib (mentor) Sponsored by: Mysterious Code Ltd. Differential Revision: https://reviews.freebsd.org/D25348	2020-06-21 18:40:17 +00:00
Vincenzo Maffione	0a182b4c63	iflib: netmap: enter/exit netmap mode after device stops Avoid possible race conditions by calling nm_set_native_flags() and nm_clear_native_flags() only after the device has been stopped. MFC after: 1 week	2020-06-14 21:07:12 +00:00
Ravi Pokala	2a73c8f5e1	Decode the "LACP Fast Timeout" LAGG option flag r286700 added the "lacp_fast_timeout" option to `ifconfig', but we forgot to include the new option in the string used to decode the option bits. Add "LACP_FAST_TIMO" to LAGG_OPT_BITS. Also, s/LAGG_OPT_LACP_TIMEOUT/LAGG_OPT_LACP_FAST_TIMO/g , to be clearer that the flag indicates "Fast Timeout" mode. Reported by: Greg Foster <gfoster at panasas dot com> Reviewed by: jpaetzel MFC after: 1 week Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D25239	2020-06-11 22:46:08 +00:00
Alexander V. Chernikov	a287a973e3	Switch rtsock code to using newly-create rib_action() KPI call. This simplifies the code and allows to further split rtentry and nexthop, removing one of the blockers for multipath code introduction, described in D24141. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25192	2020-06-10 07:46:22 +00:00
Vincenzo Maffione	e136e9c88f	iflib: netmap: honor netmap_irx_irq return values In the receive interrupt routine, always call netmap_rx_irq(). The latter function will return != NM_IRQ_PASS if netmap is not active on that specific receive queue, so that the driver can go on with iflib_rxeof(). Note that netmap supports partial opening, where only a subset of the RX or TX rings can be open in netmap mode. Checking the IFCAP_NETMAP flag is not enough to make sure that the queue is indeed in netmap mode. Moreover, in case netmap_rx_irq() returns NM_IRQ_RESCHED, it means that netmap expects the driver to call netmap_rx_irq() again as soon as possible. Currently, this may happen when the device is attached to a VALE switch. Reviewed by: gallatin MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25167	2020-06-09 19:15:43 +00:00
John Baldwin	00a4311adc	Refer to AES-CBC as "aes-cbc" rather than "rijndael-cbc" for IPsec. At this point, AES is the more common name for Rijndael128. setkey(8) will still accept the old name, and old constants remain for compatiblity. Reviewed by: cem, bcr (manpages) MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24964	2020-06-04 22:58:37 +00:00
Andrey V. Elsukov	dd4490fdab	Add if_reassing method to all tunneling interfaces. After r339550 tunneling interfaces have started handle appearing and disappearing of ingress IP address on the host system. When such interfaces are moving into VNET jail, they lose ability to properly handle ifaddr_event_ext event. And this leads to need to reconfigure tunnel to make it working again. Since moving an interface into VNET jail leads to removing of all IP addresses, it looks consistent, that tunnel configuration should also be cleared. This is what will do if_reassing method. Reported by: John W. O'Brien <john saltant com> MFC after: 1 week	2020-06-03 13:02:31 +00:00
Alexander V. Chernikov	41e66f4eca	Add rib subscription API. Currently there is no easy way of subscribing for the routing table changes. The only existing way is to set ifa_rtrequest callback in the each protocol ifaddr, which is not convenient or extandable. This change provides generic notification subscription mechanism, that will replace current ifa_rtrequest one and allow other applications such as accelerated routing lookup modules subscribe for the changes. In particular, this change provides 2 hooks: 1) synchronous one (RIB_NOTIFY_IMMEDIATE), called under RIB_WLOCK, which ensures exact ordering of the changes and 2) async one, (RIB_NOTIFY_DELAYED) that is called after the change w/o holding locks. The latter one does not provide any notification ordering guarantee. Differential Revision: https://reviews.freebsd.org/D25070	2020-06-01 21:52:24 +00:00
Alexander V. Chernikov	46cc6153d4	Finish r361706: add sys/net/route/route_ctl.h, missed in previous commit.	2020-06-01 21:51:20 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Matt Macy	1f93e931d9	Fix panics when using iflib pseudo device support Reviewed by: gallatin@, hselasky@ MFC after: 1 week Sponsored by: Netgate, Inc. Differential Revision: https://reviews.freebsd.org/D23710	2020-05-31 18:42:00 +00:00
John Baldwin	28d2a72bbf	Consistently include opt_ipsec.h for consumers of <netipsec/ipsec.h>. This fixes ipsec.ko to include all of IPSEC_DEBUG. Reviewed by: imp MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25046	2020-05-29 19:22:40 +00:00
Alexander V. Chernikov	cb86ca48bf	Unlock rtentry before calling for epoch(9) destruction as the destruction may happen immediately, leading to panic. Reported by: bdragon	2020-05-28 07:23:27 +00:00
Alexander V. Chernikov	4d2c2509f2	Move <add\|del\|change>_route() functions to route_ctl.c in preparation of multipath control plane changed described in D24141. Currently route.c contains core routing init/teardown functions, route table manipulation functions and various helper functions, resulting in >2KLOC file in total. This change moves most of the route table manipulation parts to a dedicated file, simplifying planned multipath changes and making route.c more manageable. Differential Revision: https://reviews.freebsd.org/D24870	2020-05-23 19:06:57 +00:00
Alexander V. Chernikov	a82f62ec2d	Remove refcounting from rtentry. After making rtentry reclamation backed by epoch(9) in r361409, there is no reason in keeping reference counting code. Differential Revision: https://reviews.freebsd.org/D24867	2020-05-23 12:15:47 +00:00
Alexander V. Chernikov	2bbab0af6d	Use epoch(9) for rtentries to simplify control plane operations. Currently the only reason of refcounting rtentries is the need to report the rtable operation details immediately after the execution. Delaying rtentry reclamation allows to stop refcounting and simplify the code. Additionally, this change allows to reimplement rib_lookup_info(), which is used by some of the customers to get the matching prefix along with nexthops, in more efficient way. The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to nhop_priv to be able to reliably set curvnet even during vnet teardown. Rest of the reference counting code will be removed in the D24867 . Differential Revision: https://reviews.freebsd.org/D24866	2020-05-23 10:21:02 +00:00
Pawel Biernacki	9033ad5f15	sysctl: fix setting net.isr.dispatch during early boot Fix another collateral damage of r357614: netisr is initialised way before malloc() is available hence it can't use sysctl_handle_string() that allocates temporary buffer. Handle that internally in sysctl_netisr_dispatch_policy(). PR: 246114 Reported by: delphij Reviewed by: kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D24858	2020-05-16 17:05:44 +00:00
Mark Johnston	c1be839971	pf: Add a new zone for per-table entry counters. Right now we optionally allocate 8 counters per table entry, so in addition to memory consumed by counters, we require 8 pointers worth of space in each entry even when counters are not allocated (the default). Instead, define a UMA zone that returns contiguous per-CPU counter arrays for use in table entries. On amd64 this reduces sizeof(struct pfr_kentry) from 216 to 160. The smaller size also results in better slab efficiency, so memory usage for large tables is reduced by about 28%. Reviewed by: kp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24843	2020-05-16 00:28:12 +00:00
Kyle Evans	c79cee7136	kernel: provide panicky version of __unreachable __builtin_unreachable doesn't raise any compile-time warnings/errors on its own, so problems with its usage can't be easily detected. While it would be nice for this situation to change and compilers to at least add a warning for trivial cases where local state means the instruction can't be reached, this isn't the case at the moment and likely will not happen. This commit adds an __assert_unreachable, whose intent is incredibly clear: it asserts that this instruction is unreachable. On INVARIANTS builds, it's a panic(), and on non-INVARIANTS it expands to __unreachable(). Existing users of __unreachable() are converted to __assert_unreachable, to improve debuggability if this assumption is violated. Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D23793	2020-05-13 18:07:37 +00:00
Warner Losh	83b4342743	Kill trailing newline while I'm here...	2020-05-12 23:46:52 +00:00
Alexander V. Chernikov	4a6ee281d9	Remove unused rnh_close callback from rtable & cleanup depends. rnh_close callbackes was used by the in[6]_clsroute() handlers, doing cleanup in the route cloning code. Route cloning was eliminated somewhere around r186119. Last callback user was eliminated in r186215, 11 years ago. Differential Revision: https://reviews.freebsd.org/D24793	2020-05-11 06:09:18 +00:00
Alexander V. Chernikov	d223372545	Remove rtalloc1(_fib) KPI. Last user of rtalloc1() KPI has been eliminated in rS360631. As kernel is now fully switched to use new routing KPI defined in rS359823, remove old lookup functions. Differential Revision: https://reviews.freebsd.org/D24776	2020-05-10 09:34:48 +00:00
Alexander V. Chernikov	656442a718	Embed dst sockaddr into rtentry and remove rte packet counter Currently each rtentry has dst&gateway allocated separately from another zone, bloating cache accesses. Current 'struct rtentry' has 12 "mandatory" radix pointers in the beginning, leaving 4 usable pointers/32 bytes in the first 2 cache lines (amd64). Fields needed for the datapath are destination sockaddr and rt_nhop. So far it doesn't look like there is other routable addressing protocol other than IPv4/IPv6/MPLS, which uses keys longer than 20 bytes. With that in mind, embed dst into struct rtentry, making the first 24 bytes of rtentry within 128 bytes. That is enough to make IPv6 address within first 128 bytes. It is still pretty easy to add code for supporting separately-allocated dst, however it doesn't make a lot of sense in having such code without a use case. As rS359823 moved the gateway to the nexthop structure, the dst embedding change removes the need for any additional allocations done by rt_setgate(). Lastly, as a part of cleanup, remove counter(9) allocation code, as this field is not used in packet processing anymore. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24669	2020-05-08 21:06:10 +00:00
Alexander V. Chernikov	682b902daf	Add rib_lookup() sockaddr lookup wrapper and make ifa_ifwithroute use it. Create rib_lookup() wrapper around per-af dataplane lookup functions. This will help in the cases of having control plane af-agnostic code. Switch ifa_ifwithroute() to use this function instead of rtalloc1(). Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24731	2020-05-07 08:11:36 +00:00
Alexander V. Chernikov	d94be4ccaa	Switch DDB show route to direct rnh_matchaddr() call instead of rtalloc1(). Eliminate the last rtalloc1() call to finish transition to the new routing KPI defined in r359823. Differential Revision: https://reviews.freebsd.org/D24663	2020-05-04 15:07:57 +00:00
Alexander V. Chernikov	4f08f052ad	Simplify address parsing in DDB show route command. Use db_get_line() to overcome parser limitation. Differential Revision: https://reviews.freebsd.org/D24662	2020-05-04 15:00:19 +00:00
Alexander V. Chernikov	9e02229580	Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields. After converting routing subsystem customers to use nexthop objects defined in r359823, some fields in struct rtentry became unused. This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry along with the code initializing and updating these fields. Cleanup of the remaining fields will be addressed by D24669. This commit also changes the implementation of the RTM_CHANGE handling. Old implementation tried to perform the whole operation under radix WLOCK, resulting in slow performance and hacks like using RTF_RNH_LOCKED flag. New implementation looks up the route nexthop under radix RLOCK, creates new nexthop and tries to update rte nhop pointer. Only last part is done under WLOCK. In the hypothetical scenarious where multiple rtsock clients repeatedly issue RTM_CHANGE requests for the same route, route may get updated between read and update operation. This is addressed by retrying the operation multiple (3) times before returning failure back to the caller. Differential Revision: https://reviews.freebsd.org/D24666	2020-05-04 14:31:45 +00:00
Mark Johnston	814fa34dfb	Increase the iflib txq callout mutex name length to 32 bytes. With a length of 16, the name ("<if name>:TX(<qid>):callout") typically gets truncated. PR: 245712 Reported by: ghuckriede@blackberry.com MFC after: 1 week	2020-04-30 15:39:04 +00:00
Alexander V. Chernikov	8c61eb2107	Convert more rtentry field accesses into nhop fields accesses. Continue routing subsystem conversion to nhop objects defined in r359823. Use fields from nhop structure instead of "struct rtentry" fields. This is one of the last changes prior to removing rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry. Differential Revision: https://reviews.freebsd.org/D24609	2020-04-29 21:54:09 +00:00
Alexander V. Chernikov	74787ef47b	Add nhop to the ifa_rtrequest() callback. With the upcoming multipath changes described in D24141, rt->rt_nhop can potentially point to a nexthop group instead of an individual nhop. To simplify caller handling of such cases, change ifa_rtrequest() callback to pass changed nhop directly. Differential Revision: https://reviews.freebsd.org/D24604	2020-04-29 19:28:56 +00:00

... 2 3 4 5 6 ...

4694 Commits