freebsd-dev

Author	SHA1	Message	Date
Alexander V. Chernikov	730bfa2805	routing: add rib_match_gw() helper Finish `02e05b8fae`: * add gateway matcher function that can be used in rib_del_route_px() or any rib_walk-family functions. It will be used in the upcoming migration to the new KPI * rename gw_fulter_func to match_gw_one() to better signal the function purpose / semantic. MFC after: 1 month	2022-08-12 09:31:21 +00:00
Mateusz Guzik	f73e4f6c58	routing: unbreak the build of a bunch of kernels Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-08-11 21:50:37 +00:00
Alexander V. Chernikov	d8b42ddcac	rtsock: subscribe to ifnet eventhandlers instead of direct calls. Stop treating rtsock as a "special" consumer and use already-provided ifaddr arrival/departure notifications. MFC after: 2 weeks Test Plan: ``` 21:05 [0] m@devel0 route -n monitor -> ifconfig vtnet0.2 create got message of size 24 on Tue Aug 9 21:05:44 2022 RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: arrival got message of size 168 on Tue Aug 9 21:05:54 2022 RTM_IFINFO: iface status change: len 168, if# 3, link: up, flags:<BROADCAST,RUNNING,SIMPLEX,MULTICAST> -> ifconfig vtnet0.2 destroy got message of size 24 on Tue Aug 9 21:05:54 2022 RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: departure ``` Reviewed By: glebius Differential Revision: https://reviews.freebsd.org/D36095 MFC after: 2 weeks	2022-08-11 20:36:59 +00:00
Gleb Smirnoff	f63cb32c19	Retire 4.4BSD raw sockets Until today the remnants of the original code had provided some aid in implementation of routing socket and IPSEC key socket. This aid was more obfuscation than generalisation. A historical reference on the original idea of the raw sockets can be found in chapter 11 of 4.4BSD System Manager Manual: https://raw.githubusercontent.com/sergev/4.4BSD-Lite2/master/usr/share/doc/smm/18.net.pdf Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36124	2022-08-11 09:19:36 -07:00
Gleb Smirnoff	36b10ac2cd	rtsock: do not use raw socket code This makes routing socket implementation self contained and removes one of the last dependencies on the raw socket code and pr_output method. There are very subtle API visible changes: - now routing socket would return EOPNOTSUPP instead of EINVAL on syscalls that are not supposed to be called on a routing socket. - routing socket buffer sizes are now controlled by net.rtsock sysctls instead of net.raw. The latter were not documented anywhere, and even Internet search doesn't find any references or discussions related to these sysctls. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36122	2022-08-11 09:19:36 -07:00
Gleb Smirnoff	d94ec7490d	rtsock: do not allocate mbuf_tags(9) just to store an 8-bit value Use local storage of the mbuf packet header instead. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36121	2022-08-11 09:19:36 -07:00
Gleb Smirnoff	b8103ca76d	netinet: get interface event notifications directly via EVENTHANDLER(9) The old mechanism of getting them via domains/protocols control input is a relic from the previous century, when nothing like EVENTHANDLER(9) existed yet. Retire PRC_IFDOWN/PRC_IFUP as netinet was the only one to use them. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36116	2022-08-11 09:19:36 -07:00
Mateusz Guzik	69077c81e5	routing: fix non-debug build Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-08-11 14:12:59 +00:00
Alexander V. Chernikov	40503b792f	routing: populate fibs with interface routes after growing net.fibs. Currently it is possible to extend the number of fibs at runtime, but this functionality is of limited use when net.add_addr_allfibs is non-zero, as the routing tables are created empty. This change automatically populates newly-created fibs with the kernel-originated interface routes (filtered by RTF_PINNED flag) if net.add_addr_allfibs is set. ``` -> sysctl net.add_addr_allfibs=1 net.add_addr_allfibs: 0 -> 1 -> sysctl net.fibs net.fibs: 2 -> sysctl net.fibs=3 net.fibs: 2 -> 3 BEFORE: -> setfib 2 netstat -rn Routing tables (fib: 2) AFTER: -> setfib 2 netstat -rn Routing tables (fib: 2) Internet: Destination Gateway Flags Netif Expire 10.0.0.0/24 link#1 U vtnet0 10.0.0.5 link#1 UHS lo0 127.0.0.1 link#2 UH lo0 Internet6: Destination Gateway Flags Netif Expire ::1 link#2 UHS lo0 2a01:4f9:3a:fa00::/64 link#1 U vtnet0 2a01:4f9:3a:fa00:5054:ff:fe15:4a3b link#1 UHS lo0 fe80::%vtnet0/64 link#1 U vtnet0 fe80::5054:ff:fe15:4a3b%vtnet0 link#1 UHS lo0 fe80::%lo0/64 link#2 U lo0 fe80::1%lo0 link#2 UHS lo0 ``` Differential Revision: https://reviews.freebsd.org/D36075 MFC after: 1 month	2022-08-11 12:48:08 +00:00
Alexander V. Chernikov	02e05b8fae	routing: fixup empty mask prefix handling after `2ce553854c`. MFC after: 1 month	2022-08-11 12:48:04 +00:00
Alexander V. Chernikov	258828d03b	routing: fix build warning without ROUTE_MPATH Reported by: Gary Jennejohn <garyj@gmx.de> MFC after: 1 month	2022-08-11 09:47:26 +00:00
Kristof Provost	fd6b3bede5	if_ovpn: reject non-UDP sockets We must ensure that the fd provided by userspace is really for a UDP socket. If it's not we'll panic in udp_set_kernel_tunneling(). Reported by: Gert Doering <gert@greenie.muc.de> Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-08-11 10:40:03 +02:00
Alexander V. Chernikov	685866bbe1	routing: fix build without ROUTE_MPATH MFC after: 1 month	2022-08-10 20:45:22 +00:00
Alexander V. Chernikov	5c4d2252d7	routing: move rtentry and subscription code out of route_ctl.c route_ctl.c size has grown considerably since initial introduction. Factor out non-relevant parts: * all rtentry logic, such as creation/destruction and accessors goes to net/route/route_rtentry.c * all rtable subscription logic goes to net/route/route_subscription.c Differential Revision: https://reviews.freebsd.org/D36074 MFC after: 1 month	2022-08-10 18:56:01 +00:00
Alexander V. Chernikov	2ce553854c	routing: add rib_<add|del>_route_px() functions operating with nexthops. This change adds public KPI to work with routes using pre-created nexthops, instead of using data from addrinfo structures. These functions will be later used for adding/deleting kernel-originated routes and upcoming netlink protocol. As a part of providing this KPI, low-level route addition code has been reworked to provide more control over route creation or change. Specifically, a number of operation flags (RTM_F_<CREATE|EXCL|REPLACE|APPEND>) have been added, defining the desired behaviour if the route already exists (or does not exist). This change required some changes in the multipath addition code, resulting in moving this code to route_ctl.c, rendering mpath_ctl.c empty. Differential Revision: https://reviews.freebsd.org/D36073 MFC after: 1 month	2022-08-10 18:56:01 +00:00
Alexander V. Chernikov	66230639ce	routing: split nexthop creation and rtentry creation. This change is required for the upcoming introduction of the new nexthop-based operations KPI, as it will create rtentry and nexthops at different stages of route table modification. Differential Revision: https://reviews.freebsd.org/D36072 MFC after: 2 weeks	2022-08-10 18:27:13 +00:00
Alexander V. Chernikov	dedeec1143	routing: refactor #2 * Use same filter func (rib_filter_f_t) for nexthop groups to simplify callbacks. * simplify conditional route deletion & remove the need to pass rt_addrinfo to the low-level deletion functions * speedup rib_walk_del() by removing an additional per-prefix lookup Differential Revision: https://reviews.freebsd.org/D36071 MFC after: 1 month	2022-08-10 18:20:21 +00:00
Alexander V. Chernikov	0d60e88b41	routing: refactor control cmds #1 This and the follow-up routing-related changes aim to remove or reduce `struct rt_addrinfo` usage and use recently-landed nhop(9) KPI instead. Traditionally `rt_addrinfo` structure has been used to propagate all necessary information between the protocol/rtsock and a routing layer. Many functions inside the routing subsystem use it internally. However, using this structure became somewhat complicated, as there are too many ways of specifying a single state and verifying data consistency is hard. For example, are routing flags consistent with mask/gateway sockaddr pointers? Is mask really a host mask? Are sockaddr "valid" (e.g. properly zeroed, masked, have proper length)? Are they mutable? Is the suggested interface specified by the interface index embedded into the sockaddr_dl gateway, or passed as RTAX_IFP parameter, or directly provided by rti_ifp or does it need to be derived from the ifa? These (and other similar) questions have to be considered every time a function has `rt_addrinfo` pointer as an argument. The new approach is to bring more control back to the protocols and construct the desired routing objects themselves - in the end, it's the protocol/subsystem who knows the desired outcome. This specific diff changes the following: * add explicit basic low-level radix operations: add_route() (renamed from add_route_nhop()) delete_route() (factored from change_route_nhop()) change_route() (renamed from change_route_nhop) * remove "info" parameter from change_route_conditional() as a part of reducing rt_addrinfo usage in the internal KPIs * add lookup_prefix_rt() wrapper for doing re-lookups after RIB lock/unlock Differential Revision: https://reviews.freebsd.org/D36070 MFC after: 2 weeks	2022-08-10 18:20:20 +00:00
Gordon Bergling	b2b1bb0410	debugnet: Fix a typo in a source code comment - s/paramaters/parameters/ MFC after: 3 days	2022-08-07 16:07:01 +02:00
Alexander V. Chernikov	93dd3adac7	fib_algo: set vnet when destroying algo instance Reported by: Konrad Kręciwilk <konrad.kreciwilk@korbank.pl> MFC after: 2 weeks	2022-08-06 12:51:22 +00:00
Mark Johnston	220818ac03	bpf: Fix BIOCPROMISC locking BPF might put an interface in promiscuous mode when handling the BIOCSDLT ioctl. When this happens, a flag is set in the BPF descriptor so that the old interface can be restored when the BPF descriptor is destroyed. The BIOCPROMISC ioctl can also be used to put a BPF descriptor's interface into promiscuous mode, but there was nothing synchronizing the flag. Fix this by modifying the ioctl handler to acquire the global BPF mutex, which is used to synchronize ifpromisc() calls elsewhere in BPF. Reviewed by: kp, melifaro MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D36045	2022-08-05 16:26:34 -04:00
Kristof Provost	8449762738	if_ovpn: fix unused functions with NOINET / NOINET6 ovpn_find_peer_by_ip() is not used if INET is not defined. Do not define the function in that case. Same for ovpn_find_peer_by_ip6(). Fix these warnings: /usr/src/sys/net/if_ovpn.c:1580:1: warning: unused function 'ovpn_find_peer_by_ip' [-Wunused-function] ovpn_find_peer_by_ip(struct ovpn_softc *sc, const struct in_addr addr) ^ /usr/src/sys/net/if_ovpn.c:1599:1: warning: unused function 'ovpn_find_peer_by_ip6' [-Wunused-function] ovpn_find_peer_by_ip6(struct ovpn_softc *sc, const struct in6_addr *addr) ^ Reported by: mjg Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-08-04 14:00:32 +02:00
Alexander V. Chernikov	d46b000ecc	routing: remove duplicate error message after `5c23343b8c`. MFC after: 2 weeks	2022-08-04 09:53:58 +00:00
Mateusz Guzik	412bdb5a46	route: fix NOIP builds Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-08-03 21:23:32 +00:00
Alexander V. Chernikov	ae6bfd12c8	routing: refactor private KPI * Make nhgrp_get_nhops() return const struct weightened_nhop to indicate that the list is immutable * Make nhgrp_get_group() return the actual group, instead of group+weight. MFC after: 2 weeks	2022-08-01 10:02:12 +00:00
Alexander V. Chernikov	5c23343b8c	routing: convert remnants of DPRINTF to FIB_CTL_LOG(). Convert the last remaining pieces of old-style debug messages to the new debugging framework. Differential Revision: https://reviews.freebsd.org/D35994 MFC after: 2 weeks	2022-08-01 08:55:07 +00:00
Alexander V. Chernikov	800c68469b	routing: add nhop(9) kpi. Differential Revision: https://reviews.freebsd.org/D35985 MFC after: 1 month	2022-08-01 08:52:26 +00:00
Alexander V. Chernikov	29029b06a6	routing: remove info argument from add/change_route_nhop(). Currently, rt_addrinfo(info) serves as a main "transport" moving state between various functions inside the routing subsystem. As all of the fields are filled in directly by the customers, it is problematic to maintain consistency, resulting in repeated checks inside many functions. Additionally, there are multiple ways of specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on. With the upcoming nhop(9) kpi it is possible to store all of the required state in the nexthops in the consistent fashion, reducing the need to use "info" in the KPI calls. Finally, rt_addrinfo structure format was derived from the rtsock wire format, which is different from other kernel routing users or netlink. This cleanup simplifies upcoming nhop(9) kpi and netlink introduction. Reviewed by: zlei.huang@gmail.com Differential Revision: https://reviews.freebsd.org/D35972 MFC after: 2 weeks	2022-08-01 07:41:07 +00:00
Alexander V. Chernikov	97ffaff859	net: constantify radix.c functions Mark dst/mask public API functions fields as const to clearly indicate that these parameters are not modified or stored in the datastructure. Differential Revision: https://reviews.freebsd.org/D35971 MFC after: 2 weeks	2022-08-01 07:32:40 +00:00
Alexander V. Chernikov	2717e958df	routing: move route expiration time to its nexthop Expiration time is actually a path property, not a route property. Move its storage to nexthop to simplify upcoming nhop(9) KPI changes and netlink introduction. Differential Revision: https://reviews.freebsd.org/D35970 MFC after: 2 weeks	2022-08-01 07:26:53 +00:00
Alexander V. Chernikov	27f107e1b4	routing: add debug printing helpers for rtentry and RTM* cmds. MFC after: 2 weeks	2022-07-31 09:01:42 +00:00
Zhenlei Huang	150486f6a9	Introduce and use the NET_EPOCH_DRAIN_CALLBACKS() macro Reviewed by: melifaro, kp Differential Revision: https://reviews.freebsd.org/D35968	2022-07-29 21:21:10 +02:00
James Skon	13890d30f8	altq: improve pfctl config time for large numbers of queues In the current implementation of altq_hfsc.c, when new queues are being added (by pfctl), each queue is added to the tail of the siblings linked list under the parent queue. On a system with many queues (50,000+) this leads to very long load times, as the insertion process must scan the entire list for every new queue. Since this list is unordered, this change merely adds the new queue to the head of the list rather than the tail. Reviewed by: kp MFC after: 3 weeks Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D35964	2022-07-28 22:00:07 +02:00
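The cost difference described in `13890d30f8` can be sketched in plain C. This is only an illustration: `struct cls`, `insert_tail()` and `insert_head()` are hypothetical names, not the actual altq_hfsc.c code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queue class with a sibling link. */
struct cls {
	int id;
	struct cls *sibling;
};

/* Tail insert: walks the whole sibling list, O(n) per insertion. */
static void
insert_tail(struct cls **head, struct cls *c)
{
	struct cls **pp = head;

	while (*pp != NULL)
		pp = &(*pp)->sibling;
	c->sibling = NULL;
	*pp = c;
}

/* Head insert: constant time, acceptable because the list is unordered. */
static void
insert_head(struct cls **head, struct cls *c)
{
	c->sibling = *head;
	*head = c;
}
```

With n queues under one parent, n tail inserts cost on the order of n*n link traversals, while n head inserts cost n pointer updates.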
Andrew Gallatin	713ceb99b6	lagg: fix lagg ifioctl after SIOCSIFCAPNV Lagg was broken by SIOCSIFCAPNV when all underlying devices support SIOCSIFCAPNV. This change updates lagg to work with SIOCSIFCAPNV and if_capabilities2. Reviewed by: kib, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35865	2022-07-28 10:39:00 -04:00
Dimitry Andric	5e1097f83c	Adjust function definitions in route_ctl.c to avoid clang 15 warnings With clang 15, the following -Werror warnings are produced: sys/net/route/route_ctl.c:130:17: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] vnet_rtzone_init() ^ void sys/net/route/route_ctl.c:139:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] vnet_rtzone_destroy() ^ void This is because vnet_rtzone_init() and vnet_rtzone_destroy() are declared with (void) argument lists, but defined with empty argument lists. Make the definitions match the declarations. MFC after: 3 days	2022-07-26 21:25:09 +02:00
Dimitry Andric	a8adf13a63	Adjust function definition in nhop_ctl.c to avoid clang 15 warnings With clang 15, the following -Werror warning is produced: sys/net/route/nhop_ctl.c:508:21: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] alloc_nhop_structure() ^ void This is because alloc_nhop_structure() is declared with a (void) argument list, but defined with an empty argument list. Make the definition match the declaration. MFC after: 3 days	2022-07-26 21:25:09 +02:00
Kristof Provost	151abc80cd	if_vlan: avoid hash table thrashing when adding and removing entries vlan_remhash() uses incorrect value for b. When using the default value for VLAN_DEF_HWIDTH (4), the VLAN hash-list table expands from 16 chains to 32 chains as the 129th entry is added. trunk->hwidth becomes 5. Say a few more entries are added and there are now 135 entries. trunk->hwidth will still be 5. If an entry is removed, vlan_remhash() will calculate a value of 32 for b. refcnt will be decremented to 134. The if comparison at line 473 will return true and vlan_growhash() will be called. The VLAN hash-list table will be compressed from 32 chains wide to 16 chains wide. hwidth will become 4. This is an error, and it can be seen when a new VLAN is added. The table will again be expanded. If an entry is then removed, again the table is contracted. If the number of VLANs stays in the range of 128-512, each time an insert follows a remove, the table will expand. Each time a remove follows an insert, the table will be contracted. The fix is simple. The line 473 should test that the number of entries has decreased such that the table should be contracted using what would be the new value of hwidth. Line 467 should be: b = 1 << (trunk->hwidth - 1); PR: 265382 Reviewed by: kp MFC after: 2 weeks Sponsored by: NetApp, Inc.	2022-07-22 19:18:41 +02:00
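The thrashing described in `151abc80cd` can be reproduced with a small standalone model. This is a sketch under assumptions: the `(b * b) / 2` threshold is chosen to match the 128 and 512 boundaries quoted in the message, and `struct trunk`, `inshash()`, `remhash()` and `churn_resizes()` are simplified, hypothetical stand-ins for the if_vlan.c code.

```c
#define DEF_HWIDTH 4	/* mirrors VLAN_DEF_HWIDTH from the commit message */

struct trunk {
	int hwidth;	/* log2 of the number of hash chains */
	int refcnt;	/* number of entries */
	int resizes;	/* how many times the table was rebuilt */
};

static void
growhash(struct trunk *t, int howmuch)
{
	if (t->hwidth + howmuch < DEF_HWIDTH)
		return;		/* never shrink below the default width */
	t->hwidth += howmuch;
	t->resizes++;
}

static void
inshash(struct trunk *t)
{
	int b = 1 << t->hwidth;

	if (t->refcnt > (b * b) / 2)	/* assumed growth threshold */
		growhash(t, 1);
	t->refcnt++;
}

static void
remhash(struct trunk *t, int fixed)
{
	/* fixed != 0 applies the corrected line: b = 1 << (hwidth - 1) */
	int b = 1 << (fixed ? t->hwidth - 1 : t->hwidth);

	t->refcnt--;
	if (t->refcnt < (b * b) / 2)
		growhash(t, -1);
}

/* Fill to 135 entries, do 10 remove/insert pairs, return resizes seen. */
static int
churn_resizes(int fixed)
{
	struct trunk t = { DEF_HWIDTH, 0, 0 };
	int i, before;

	for (i = 0; i < 135; i++)
		inshash(&t);
	before = t.resizes;
	for (i = 0; i < 10; i++) {
		remhash(&t, fixed);
		inshash(&t);
	}
	return (t.resizes - before);
}
```

In this model `churn_resizes(0)` reports 20 rebuilds for 10 remove/insert pairs, while `churn_resizes(1)` reports none, because the shrink test then uses the threshold of the smaller table.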
Dimitry Andric	0294e95da4	Fix unused variable warning in iflib.c With clang 15, the following -Werror warning is produced: sys/net/iflib.c:993:8: error: variable 'n' set but not used [-Werror,-Wunused-but-set-variable] u_int n; ^ The 'n' variable appears to have been a debugging aid that has never been used for anything, so remove it. MFC after: 3 days	2022-07-21 21:19:39 +02:00
Dimitry Andric	fa267a329f	Fix unused variable warning in if_lagg.c With clang 15, the following -Werror warning is produced: sys/net/if_lagg.c:2413:6: error: variable 'active_ports' set but not used [-Werror,-Wunused-but-set-variable] int active_ports = 0; ^ The 'active_ports' variable appears to have been a debugging aid that has never been used for anything (ref https://reviews.freebsd.org/D549), so remove it. MFC after: 3 days	2022-07-21 21:05:51 +02:00
Kristof Provost	663f556b03	if_vlan: allow vlan and vlanproto to be changed It's currently not possible to change the vlan ID or vlan protocol (i.e. 802.1q vs. 802.1ad) without de-configuring the interface (i.e. ifconfig vlanX -vlandev). Add a specific flow for this, allowing both the protocol and id (but not parent interface) to be changed without going through the '-vlandev' step. Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D35846	2022-07-21 18:36:01 +02:00
Mitchell Horne	c84c5e00ac	ddb: annotate some commands with DB_CMD_MEMSAFE This is not completely exhaustive, but covers a large majority of commands in the tree. Reviewed by: markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35583	2022-07-18 22:06:09 +00:00
Mike Karels	efe58855f3	IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8 Combined changes to allow experimentation with net 0/8 (network 0), 240/4 (Experimental/"Class E"), and part of the loopback net 127/8 (all but 127.0/16). All changes are disabled by default, and can be enabled by the following sysctls: net.inet.ip.allow_net0=1 net.inet.ip.allow_net240=1 net.inet.ip.loopback_prefixlen=16 When enabled, the corresponding addresses can be used as normal unicast IP addresses, both as endpoints and when forwarding. Add descriptions of the new sysctls to inet.4. Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in various C files when in.h includes vnet.h. The proposals motivating this experimentation can be found in https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0 https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240 https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127 Reviewed by: rgrimes, pauamma_gundo.com; previous versions melifaro, glebius Differential Revision: https://reviews.freebsd.org/D35741	2022-07-13 09:46:05 -05:00
Kristof Provost	59219dde9a	if_ovpn: fix mbuf leak If the link is down or we can't find a peer we do not transmit the packet, but also don't free it. Remember to m_freem() mbufs we can't transmit. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-07-12 14:19:25 +02:00
Zhenlei Huang	7f7a804ae0	vxlan: Add support for socket ioctls SIOC[SG]TUNFIB Submitted by: Luiz Amaral <email@luiz.eng.br> PR: 244004 Differential Revision: https://reviews.freebsd.org/D32820 MFC after: 2 weeks	2022-07-08 18:14:19 +00:00
Kristof Provost	37f604b49d	vnet: make VNET_FOREACH() always be a loop VNET_FOREACH() is a LIST_FOREACH if VIMAGE is set, but empty if it's not. This means that users of the macro couldn't use 'continue' or 'break' as one would expect of a loop. Change VNET_FOREACH() to be a loop in all cases (although one that is fixed to one iteration if VIMAGE is not set). Reviewed by: karels, melifaro, glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D35739	2022-07-07 09:52:21 +02:00
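The idea behind `37f604b49d` can be illustrated with a one-iteration loop macro. `ONCE_FOREACH()` and `visit()` are hypothetical names used for this sketch; it is not presented as the exact definition in vnet.h.

```c
/*
 * Hypothetical single-iteration loop macro for the non-VIMAGE case.
 * Because it expands to a real for loop, `continue` and `break`
 * inside the body behave as they would in any other loop.
 */
#define ONCE_FOREACH()	for (int _once = 0; _once == 0; _once++)

static int
visit(int skip)
{
	int visited = 0;

	ONCE_FOREACH() {
		if (skip)
			continue;	/* legal: we are inside a loop */
		visited++;
	}
	return (visited);
}
```

If the macro expanded to nothing, the `continue` above would either fail to compile or silently bind to whatever loop happened to enclose the block.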
Kristof Provost	6ba6c05cb2	if_ovpn: deal with short packets If we receive a UDP packet (directed towards an active OpenVPN socket) which is too short to contain an OpenVPN header ('struct ovpn_wire_header') we wound up making m_copydata() read outside the mbuf, and panicking the machine. Explicitly check that the packet is long enough to copy the data we're interested in. If it's not we will pass the packet to userspace, just like we'd do for an unknown peer. Extend a test case to provoke this situation. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-07-05 19:27:00 +02:00
Mitchell Horne	258958b3c7	ddb: use _FLAGS command macros where appropriate Some command definitions were forced to use DB_FUNC in order to specify their required flags, CS_OWN or CS_MORE. Use the new macros to simplify these. Reviewed by: markj, jhb MFC after: 3 days Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35582	2022-07-05 11:56:55 -03:00
Mateusz Guzik	db4b40213a	routing: hide notify_add and notify_del behind ROUTE_MPATH Fixes a warning about unused routines without the option. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-07-04 08:38:13 +00:00
Gordon Bergling	e8b7972cfe	if_clone: Fix a typo in a source code comment - s/fucntions/functions/ MFC after: 3 days	2022-07-03 15:13:32 +02:00
Kristof Provost	6c77f8f0e0	if_ovpn: handle m_pullup() failure Ensure we correctly handle m_pullup() failing in ovpn_finish_rx(). Reported by: Coverity (CID 1490340) Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-07-01 10:02:32 +02:00
Kristof Provost	9f7c81eb33	if_ovpn: deal with v4 mapped IPv6 addresses Openvpn defaults to binding to IPv6 sockets (with setsockopt(IPV6_V6ONLY=0)), which we didn't deal with. That resulted in us trying to in6_selectsrc_addr() on a v4 mapped v6 address, which does not work. Instead we translate the mapped address to v4 and treat it as an IPv4 address. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-07-01 10:02:32 +02:00
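The translation step mentioned in `9f7c81eb33` can be sketched in userspace C with the standard `IN6_IS_ADDR_V4MAPPED()` macro. `unmap_v4mapped()` is a hypothetical helper written for this illustration and is not taken from if_ovpn.c.

```c
#include <string.h>
#include <netinet/in.h>

/*
 * If a6 is an IPv4-mapped IPv6 address (::ffff:a.b.c.d), copy the
 * embedded IPv4 address into a4 and return 1; otherwise return 0.
 */
static int
unmap_v4mapped(const struct in6_addr *a6, struct in_addr *a4)
{
	if (!IN6_IS_ADDR_V4MAPPED(a6))
		return (0);
	/* The IPv4 address occupies the last 4 of the 16 bytes. */
	memcpy(&a4->s_addr, &a6->s6_addr[12], sizeof(a4->s_addr));
	return (1);
}
```

A caller would treat the peer as IPv4 when the helper returns 1 and keep the IPv6 path otherwise.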
Kristof Provost	b33308db39	if_ovpn: static probe points Sprinkle a few SDTs around if_ovpn to ease debugging. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-06-28 13:50:54 +02:00
Kristof Provost	ab91feabcc	ovpn: Introduce OpenVPN DCO support OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing (i.e. tunneling and cryptography) into the kernel, rather than using tap devices. This avoids significant copying and context switching overhead between kernel and user space and improves OpenVPN throughput. In my test setup throughput improved from around 660Mbit/s to around 2Gbit/s. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34340	2022-06-28 11:33:10 +02:00
Alexander V. Chernikov	8010b7a78a	routing: simplify decompose_change_notification(). The function's goal is to compare old/new nhop/nexthop group for the route and decompose it into the series of RTM_ADD/RTM_DELETE single-nhop events, calling specified callback for each event. Simplify it by properly leveraging the fact that both old/new groups are sorted nhop-# ascending. Tested by: Claudio Jeker<claudio.jeker@klarasystems.com> Differential Revision: https://reviews.freebsd.org/D35598 MFC after: 2 weeks	2022-06-27 17:30:52 +00:00
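The simplification in `8010b7a78a` relies on both groups being sorted by nexthop index, which allows a single merge-style pass. The sketch below shows that technique on plain integer indices; `decompose()`, `count_cb()` and the event names are hypothetical, and the real function works on nexthop objects with weights rather than bare integers.

```c
#include <stddef.h>

enum ev { EV_ADD, EV_DEL };
typedef void (*ev_cb)(enum ev what, int idx, void *arg);

struct ev_count {
	int adds;
	int dels;
};

/* Example callback: just counts the events it receives. */
static void
count_cb(enum ev what, int idx, void *arg)
{
	struct ev_count *c = arg;

	(void)idx;
	if (what == EV_ADD)
		c->adds++;
	else
		c->dels++;
}

/*
 * Walk two arrays of nexthop indices, both sorted ascending. Report
 * indices found only in old_ as EV_DEL and indices found only in
 * new_ as EV_ADD. Runs in O(n_old + n_new).
 */
static void
decompose(const int *old_, size_t n_old, const int *new_, size_t n_new,
    ev_cb cb, void *arg)
{
	size_t i = 0, j = 0;

	while (i < n_old && j < n_new) {
		if (old_[i] == new_[j]) {	/* unchanged, no event */
			i++;
			j++;
		} else if (old_[i] < new_[j]) {
			cb(EV_DEL, old_[i++], arg);
		} else {
			cb(EV_ADD, new_[j++], arg);
		}
	}
	while (i < n_old)
		cb(EV_DEL, old_[i++], arg);
	while (j < n_new)
		cb(EV_ADD, new_[j++], arg);
}
```

For an old group {1, 2, 5} and a new group {2, 3, 5, 7}, this pass emits one delete (1) and two adds (3 and 7).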
Alexander V. Chernikov	76f1ab8eff	routing: actually sort nexthops in nhgs by their index Nexthops in the nexthop groups need to be deterministically sorted by some property of theirs to reduce the reporting cost when changing large nexthop groups. Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`). As calc_min_mpath_slots_fast() has an assumption that next hops are sorted using their relative weight in the nexthop groups, it needs to be addressed as well. The latter sorting is required to quickly determine the layout of the next hops in the actual forwarding group. For example, what's the best way to split the traffic between nhops with weights 19,31 and 47 if the maximum nexthop group width is 64? It is worth mentioning that such sorting is only required during nexthop group creation and is not used elsewhere. Lastly, normally all nexthops are of the same weight. With that in mind, (a) use spare 32 bytes inside `struct weightened_nexthop` to avoid another memory allocation and (b) use insertion sort to sort the nexthop weights. Reported by: thj Tested by: Claudio Jeker <claudio.jeker@klarasystems.com> Differential Revision: https://reviews.freebsd.org/D35599 MFC after: 2 weeks	2022-06-27 17:30:52 +00:00
Kristof Provost	1865ebfb12	if_bridge: change MTU for new members Rather than reject new bridge members because they have the wrong MTU, change it to match the bridge. If that fails, reject the new interface. PR: 264883 Differential Revision: https://reviews.freebsd.org/D35597	2022-06-27 08:27:27 +02:00
Alexander V. Chernikov	33a0803f00	routing: fix debug headers added in `6fa8ed43ee` #2 . Move debug declaration out of COMPAT_FREEBSD32 in rtsock.c MFC after: 2 weeks	2022-06-26 07:28:15 +00:00
Alexander V. Chernikov	0e87bab6b4	routing: fix debug headers added in `6fa8ed43ee`. - move debug headers out of COMPAT_FREEBSD32 in rtsock.c - remove accidentally-added LOG_ defines from syslog.h MFC after: 2 weeks	2022-06-25 23:05:25 +00:00
Alexander V. Chernikov	76179e400a	routing: fix syslog include for rtsock.c MFC after: 2 weeks	2022-06-25 22:08:10 +00:00
Alexander V. Chernikov	6fa8ed43ee	routing: improve debugging. Use unified guidelines for the severity across the routing subsystem. Update severity for some of the already-used messages to adhere to the guidelines. Convert rtsock logging to the new FIB_ reporting format. MFC after: 2 weeks	2022-06-25 19:53:31 +00:00
Alexander V. Chernikov	c260d5cd8e	routing: fix crash when RTM_CHANGE results in no-op for the multipath route. Reporting logic assumed there is always some nhop change for every successful modification operation. Explicitly check that the changed nexthop indeed exists when reporting back to userland. MFC after: 2 weeks Reported by: Claudio Jeker <claudio.jeker@klarasystems.com> Tested by: Claudio Jeker <claudio.jeker@klarasystems.com>	2022-06-25 19:35:09 +00:00
Alexander V. Chernikov	c38da70c28	routing: fix RTM_CHANGE nhgroup updates. RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop). Search of this nexthop is performed by iterating over each component from multipath (nexthop) group, using check_info_match_nhop. The problem with the current code is that it incorrectly assumes that `check_info_match_nhop()` returns true value on match, while in reality it returns an error code on failure. Fix this by properly comparing the result with 0. Additionally, the followup code modified original nexthop group instead of a new one. Fix this by targeting new nexthop group instead. Reported by: thj Tested by: Claudio Jeker <claudio.jeker@klarasystems.com> Differential Revision: https://reviews.freebsd.org/D35526 MFC after: 2 weeks	2022-06-25 18:54:57 +00:00
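The class of bug fixed in `c38da70c28` comes from mixing up "returns true on match" with "returns 0 on success, an error code otherwise". A minimal sketch, with `match_gw()` and `find_gw()` as hypothetical stand-ins rather than the real `check_info_match_nhop()` logic:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical matcher: 0 on match, an errno value on mismatch. */
static int
match_gw(int candidate_gw, int wanted_gw)
{
	return (candidate_gw == wanted_gw ? 0 : ESRCH);
}

/* Return the index of the first matching gateway, or -1 if none. */
static int
find_gw(const int *gws, size_t n, int wanted)
{
	for (size_t i = 0; i < n; i++) {
		/*
		 * Writing `if (match_gw(...))` here would select the
		 * first NON-matching entry instead.
		 */
		if (match_gw(gws[i], wanted) == 0)
			return ((int)i);
	}
	return (-1);
}
```

Comparing the result with 0 explicitly, as the commit message describes, makes the convention visible at the call site.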
Alexander V. Chernikov	5d6894bd66	routing: improve debug logging Use standard logging (FIB_XX_LOG) across nhg code instead of using old-style DPRINTFs. Add debug object printer for nhgs (`nhgrp_print_buf`). Example: ``` Jun 19 20:17:09 devel2 kernel: [nhgrp] inet.0 nhgrp_ctl_alloc_default: multipath init done Jun 19 20:17:09 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 2, compiled_nhop: 2 Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 3, compiled_nhop: 3 Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 destroy_nhgrp: destroying nhg#0/sz=2:[#6:1,#5:1] ``` Differential Revision: https://reviews.freebsd.org/D35525 MFC after: 2 weeks	2022-06-22 15:59:21 +00:00
Mark Johnston	60b4ad4b6b	bpf: Zero pad bytes preceding BPF headers BPF headers are word-aligned when copied into the store buffer. Ensure that pad bytes following the preceding packet are cleared. Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-06-20 12:48:13 -04:00
Mark Johnston	c88f6908b4	bpf: Correct a comment MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-06-20 12:48:13 -04:00
Kristof Provost	1f61367f8d	pf: support matching on tags for Ethernet rules Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D35362	2022-06-20 10:16:20 +02:00
Mark Johnston	c262d5e877	debugnet: Fix an error handling bug in the DDB command tokenizer MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-06-16 10:05:10 -04:00
Mark Johnston	8414331481	debugnet: Handle batches of packets from if_input Some drivers will collect multiple mbuf chains, linked by m_nextpkt, before passing them to upper layers. debugnet_pkt_in() didn't handle this and would process only the first packet, typically leading to retransmits. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-06-16 10:02:00 -04:00
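The batching behaviour described in `8414331481` amounts to walking a chain of packets linked through a next-packet pointer instead of handling only the head. A minimal sketch, assuming a hypothetical `struct pkt` whose `nextpkt` field plays the role of `m_nextpkt`; it is not the debugnet code itself.

```c
#include <stddef.h>

/* Hypothetical packet record; nextpkt plays the role of m_nextpkt. */
struct pkt {
	struct pkt *nextpkt;
	int handled;
};

/* Handle every packet in a batch, unlinking each first. Returns the count. */
static int
input_batch(struct pkt *p)
{
	struct pkt *next;
	int n = 0;

	for (; p != NULL; p = next) {
		next = p->nextpkt;	/* save the link before detaching */
		p->nextpkt = NULL;
		p->handled = 1;
		n++;
	}
	return (n);
}
```

Saving `nextpkt` before processing matters because the handler may free or requeue the current packet.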
Andrew Gallatin	43c72c45a1	lacp: Remove racy kassert In lacp_select_tx_port_by_hash(), we assert that the selected port is DISTRIBUTING. However, the port state is protected by the LACP_LOCK(), which is not held around lacp_select_tx_port_by_hash(). So this assertion is racy, and can result in a spurious panic when links are flapping. It is certainly possible to fix it by acquiring LACP_LOCK(), but this seems like an early development assert, and it seems best to just remove it, rather than add complexity inside an ifdef INVARIANTS. Sponsored by: Netflix Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D35396	2022-06-13 11:32:10 -04:00
Hans Petter Selasky	892eded5b8	vlan(4): Add support for allocating TLS receive tags. The TLS receive tags are allocated directly from the receiving interface, because mbufs are flowing in the opposite direction and then route change checks are not useful, because they only work for outgoing traffic. Differential revision: https://reviews.freebsd.org/D32356 Sponsored by: NVIDIA Networking	2022-06-07 12:54:42 +02:00
Hans Petter Selasky	1967e31379	lagg(4): Add support for allocating TLS receive tags. The TLS receive tags are allocated directly from the receiving interface, because mbufs are flowing in the opposite direction and then route change checks are not useful, because they only work for outgoing traffic. Differential revision: https://reviews.freebsd.org/D32356 Sponsored by: NVIDIA Networking	2022-06-07 12:54:42 +02:00
Gordon Bergling	4f493559b0	if_llatbl: Fix a typo in a debug statement - s/droped/dropped/ Obtained from: NetBSD MFC after: 3 days	2022-06-04 15:22:09 +02:00
Gordon Bergling	f7faa4ad48	if_bridge(4): Fix a typo in a source code comment - s/accross/across/ MFC after: 3 days	2022-06-04 11:26:01 +02:00
Arseny Smalyuk	d18b4bec98	netinet6: Fix mbuf leak in NDP Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d. It happens because lltable_drop_entry_queue() relies on `la_numheld` counter when dropping NDP entries (lles). It turned out NDP code never increased `la_numheld`, so the actual free never happened. Fix the issue by introducing unified lltable_append_entry_queue(), common for both ARP and NDP code, properly addressing packet queue maintenance. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35365 MFC after: 2 weeks	2022-05-31 21:06:14 +00:00
KUROSAWA Takahiro	d6cd20cc5c	netinet6: fix ndp proxying We could insert proxy NDP entries by the ndp command, but the host with proxy ndp entries had not responded to Neighbor Solicitations. Change the following points for proxy NDP to work as expected: * join solicited-node multicast addresses for proxy NDP entries in order to receive Neighbor Solicitations. * look up proxy NDP entries not on the routing table but on the link-level address table when receiving Neighbor Solicitations. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35307 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro	77001f9b6d	lltable: introduce the llt_post_resolved callback In order to decrease ifdef INET/INET6s in the lltable implementation, introduce the llt_post_resolved callback and implement protocol-dependent code in the protocol-dependent part. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35322 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro	3719dedb91	lltable: use sa_family_t instead of int for lltable.llt_af Reviewed By: melifaro, #network Differential Revision: https://reviews.freebsd.org/D35323 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
Konrad Sewiłło-Jopek	c9a5c48ae8	arp: Implement sticky ARP mode for interfaces. Provide sticky ARP flag for network interface which marks it as the "sticky" one similarly to what we have for bridges. Once interface is marked sticky, any address resolved using the ARP will be saved as a static one in the ARP table. Such functionality may be used to prevent ARP spoofing or to decrease latencies in Ethernet networks. The drawbacks include potential limitations in usage of ARP-based load-balancers and high-availability solutions such as carp(4). The implemented option is disabled by default, therefore should not impact the default behaviour of the networking stack. Sponsored by: Conclusive Engineering sp. z o.o. Reviewed By: melifaro, pauamma_gundo.com Differential Revision: https://reviews.freebsd.org/D35314 MFC after: 2 weeks	2022-05-27 12:41:30 +00:00
Konstantin Belousov	6a311e6fa5	Add ifcap2 names for RXTLS4 and RXTLS6 interface capabilities and corresponding nvlist capabilities name strings. Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-05-24 23:59:32 +03:00
Konstantin Belousov	051e7d78b0	Kernel-side infrastructure to implement nvlist-based set/get ifcaps Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-05-24 23:59:32 +03:00
Konstantin Belousov	b96549f057	struct ifnet: add if_capabilities2 and if_capenable2 bitmasks We are running out of bits in if_capabilities. Suggested by: jhb Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-05-24 23:59:32 +03:00
Andrey V. Elsukov	f2ab916084	[vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event use it to update if_baudrate for vlan interfaces created on the LACP lagg. Differential revision: https://reviews.freebsd.org/D33405	2022-05-20 06:38:43 +02:00
Mitchell Horne	a84bf5eaa1	debugnet: fix an errant assertion We may call debugnet_free() before g_debugnet_pcb_inuse is true, specifically in the cases where the interface is down or does not support debugnet. pcb->dp_drv_input is used to hold the real driver if_input callback while debugnet is in use, so we can check the status of this field in the assertion. This can be triggered trivially by trying to configure netdump on an unsupported interface at the ddb prompt. Initializing the dp_drv_input field to NULL explicitly is not necessary but helps display the intent. PR: 263929 Reported by: Martin Filla <freebsd@sysctl.cz> Reviewed by: cem, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D35179	2022-05-14 10:27:53 -03:00
Kurosawa Takahiro	9573cc3555	rtsock: fix a stack overflow struct sockaddr is not sufficient for buffer that can hold any sockaddr_* structure. struct sockaddr_storage should be used. Test: ifconfig epair create ifconfig epair0a inet6 add 2001:db8::1 up ndp -s 2001:db8::2 02:86:98:2e:96:0b proxy # this triggers kernel stack overflow Reviewed by: markj, kp Differential Revision: https://reviews.freebsd.org/D35188	2022-05-13 20:05:36 +02:00
Kristof Provost	cbbce42345	epair: unbind prior to returning to userspace If 'options RSS' is set we bind the epair tasks to different CPUs. We must take care to not keep the current thread bound to the last CPU when we return to userspace. MFC after: 1 week Sponsored by: Orange Business Services	2022-05-07 18:17:33 +02:00
Kristof Provost	a6b0c8d04d	epair: fix set but not used warning If 'options RSS' is set. MFC after: 1 week Sponsored by: Orange Business Services	2022-05-07 18:17:32 +02:00
Kristof Provost	868bf82153	if: avoid interface destroy race When we destroy an interface while the jail containing it is being destroyed we risk seeing a race between if_vmove() and the destruction code, which results in us trying to move a destroyed interface. Protect against this by using the ifnet_detach_sxlock to also cover if_vmove() (and not just detach). PR: 262829 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D34704	2022-05-06 13:55:08 +02:00
Gleb Smirnoff	51f798e761	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268 (cherry picked from commit `6871de9363`)	2022-05-05 14:38:07 -04:00
Gleb Smirnoff	4d7a1361ef	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918` (cherry picked from commit `e1882428dc`)	2022-05-05 14:38:07 -04:00
Gleb Smirnoff	80e60e236d	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672 (cherry picked from commit `91f44749c6`)	2022-05-05 14:38:07 -04:00
Marko Zec	d461deeaa4	VNET: Revert "ifnet: make if_index global" This reverts commit `91f44749c6`. Devirtualization of V_if_index and V_ifindex_table was rushed into the tree lacking proper context, discussion, and declaration of intent, so I'm backing it out as harmful to VNET on the following grounds: 1) The change repurposed the decades-old and stable if_index KBI for new, unclear goals which were omitted from the commit note. 2) The change opened up a new resource exhaustion vector where any vnet could starve the system of ifnet indices, including vnet0. 3) To circumvent the newly introduced problem of separating ifnets belonging to different vnets from the globalized ifindex_table, the author introduced sysctl_ifcount() which does a linear traversal over the (potentially huge) global ifnet list just to return a simple upper bound on existing ifnet indices. 4) The change effectively led to nonuniform ifnet index allocation among vnets. 5) The commit note clearly stated that the patch changed the implicit if_index ABI contract where ifnet indices were assumed to be starting from one. The commit note also included a correct observation that holes in interface indices were always allowed, but failed to declare that the userland-observable ifindex tables could now include huge empty spans even under modest operating conditions. 6) The author had an earlier proposal in the works which did not affect per-vnet ifnet lists (D33265) but which he abandoned without providing the rationale behind his decision to do so, at the expense of sacrificing the vnet isolation contract and if_index ABI / KBI. Furthermore, the author agreed to back out his changes himself and to follow up with a proposal for a less intrusive alternative, but later silently declined to act. Therefore, I decided to resolve the status-quo by backing this out myself. This in no way precludes a future proposal aiming to mitigate ifnet-removal related system crashes or panics to be accepted, provided it would not unnecessarily compromise the goal of as strict as possible isolation between vnets. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:27:57 +02:00
Marko Zec	6c741ffbfa	Revert "mbuf: do not restore dying interfaces" This reverts commit `703e533da5`. Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif" This reverts commit `e1882428dc`. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:11:40 +02:00
Marko Zec	0fa5636966	Revert "netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs" This reverts commit `6871de9363`. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:11:39 +02:00
Greg Foster	00a80538b4	lacp: short timeout erroneously declares link-flapping Panasas was seeing a higher-than-expected number of link-flap events. After joint debugging with the switch vendor, we determined there were problems on both sides; either of which might cause the occasional event, but together caused lots of them. On the switch side, an internal queuing issue was causing LACP PDUs -- which should be sent every second, in short-timeout mode -- to sometimes be sent slightly later than they should have been. In some cases, two successive PDUs were late, but we never saw three late PDUs in a row. On the FreeBSD side, we saw a link-flap event every time there were two late PDUs, while the spec says that it takes three seconds of downtime to trigger that event. It turns out that if a PDU was received shortly before the timer code was run, it would decrement less than a full second after the PDU arrived. Then two delayed PDUs would cause two additional decrements, causing it to reach zero less than three seconds after the most-recent on-time PDU. The solution is to note the time a PDU arrives, and only decrement if at least a full second has elapsed since then. Reported by: Greg Foster <gfoster@panasas.com> Reviewed by: gallatin Tested by: Greg Foster <gfoster@panasas.com> MFC after: 3 days Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D35070	2022-04-27 12:41:30 -07:00
Reid Linnemann	0abcc1d2d3	pf: Add per-rule timestamps for rule and eth_rule Similar to ipfw rule timestamps, these timestamps internally are uint32_t snaps of the system time in seconds. The timestamp is CPU local and updated each time a rule or a state associated with a rule or state is matched. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34970	2022-04-22 19:53:20 +02:00
Kristof Provost	812839e5aa	pf: allow the use of tables in ethernet rules Allow tables to be used for the l3 source/destination matching. This requires taking the PF_RULES read lock. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34917	2022-04-20 13:01:12 +02:00
John Baldwin	ac3e46fa3e	infiniband_resolve_addr: ih is only used for INET or INET6.	2022-04-13 16:08:21 -07:00
John Baldwin	d98981585c	ether_resolve_addr: eh is only used for INET or INET6.	2022-04-13 16:08:21 -07:00
John Baldwin	2884a93651	vlan: ifa is only used under #ifdef INET.	2022-04-13 16:08:21 -07:00
John Baldwin	2174f0f2f2	net/route: Use __diagused for variables only used in KASSERT().	2022-04-13 16:08:19 -07:00
Kristof Provost	742e7210d0	udp: allow udp_tun_func_t() to indicate it did not eat the packet Allow udp tunnel functions to indicate they have not taken ownership of the packet, and that normal UDP processing should continue. This is especially useful for scenarios where the kernel has taken ownership of a socket that was originally created by userspace. It allows the tunnel function to pass through certain packets for userspace processing. The primary user of this is if_ovpn, when it receives messages from unknown peers (which might be a new client). Reviewed by: tuexen Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34883	2022-04-12 10:04:59 +02:00
Gordon Bergling	1a15a383a6	net: Fix a typo in a source code comment - s/peform/perform/ MFC after: 3 days	2022-04-09 11:37:57 +02:00
John Baldwin	d08cb45362	iflib: Use empty inline functions for prefetch() on non-x86. This avoids warnings about unused variables in expressions passed to prefetch().	2022-04-08 17:25:14 -07:00
Mark Johnston	990a6d18b0	net: Fix memory leaks in lltable_calc_llheader() error paths Also convert raw epoch_call() calls to lltable_free_entry() calls, no functional change intended. There's no need to asynchronously free the LLEs in that case to begin with, but we might as well use the lltable interfaces consistently. Noticed by code inspection; I believe lltable_calc_llheader() failures do not generally happen in practice. Reviewed by: bz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34832	2022-04-08 11:47:25 -04:00
John Baldwin	f7236dd068	change_mpath_route: Remove write-only nh variable. While here, cleanup the style of the function prologue by moving an assignment out of the middle of two variable declaration blocks.	2022-04-06 16:45:28 -07:00
John Baldwin	371c917b0b	unlink_nhgrp: Remove write-only variable. Possibly one could assert that ret should always be 0 here (that is, that there was always an index found in the bitmask). That should be true since a bitmask index is allocated before the nhgrp is inserted in the ctl->gr_head list in link_nhgrp.	2022-04-06 16:45:27 -07:00
Warner Losh	e606e5d157	sysctl_dumpentry: move error to inner scope Sponsored by: Netflix	2022-04-04 22:30:50 -06:00
Warner Losh	5de5b5a34d	route_ctl: eliminate write only variables ifa and nh Sponsored by: Netflix	2022-04-04 22:30:48 -06:00
Warner Losh	7f9c3339a4	get_nhop: eliminate write only variable gateway Sponsored by: Netflix	2022-04-04 22:30:47 -06:00
Gordon Bergling	d792dc7ebb	net(4): Fix a typo in a source code comment - s/accomodate/accommodate/ MFC after: 3 days	2022-04-02 14:57:06 +02:00
Gordon Bergling	cba46da538	net(3): Fix a typo in a source code comment - s/verion/version/ MFC after: 3 days	2022-04-02 10:53:40 +02:00
Gordon Bergling	f8d292b665	net(3): Fix a typo in a source code comment - s/Multilik/Multilink/ Obtained from: NetBSD MFC after: 3 days	2022-04-02 09:41:10 +02:00
Gordon Bergling	23677398ca	net(3): Fix a typo in a source code comment - s/paramenters/parameters/ MFC after: 3 days	2022-04-02 09:24:48 +02:00
Kristof Provost	9bb06778f8	pf: support listing ethernet anchors Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-30 10:28:19 +02:00
Gordon Bergling	bef80a7285	vxlan(4): Fix two typos in sysctl descriptions - s/fowarding/forwarding/ MFC after: 3 days	2022-03-28 19:35:34 +02:00
Mateusz Guzik	bd7762c869	pf: add a rule rb tree with md5 sum used as key. This gets rid of the quadratic rule traversal when "keep_counters" is set. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:45:03 +00:00
Mateusz Guzik	1a3e98a5b8	pf: pre-compute rule hash Makes it cheaper to compare rules when "keep_counters" is set. This also sets up keeping them in a RB tree. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:52 +00:00
Mateusz Guzik	93f8c38c03	pf: add pf_config_lock For now only protects rule creation/destruction, but will allow gradually reducing the scope of rules lock when changing the rules. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:46 +00:00
Alexander V. Chernikov	1b8b69508b	routing: copy nexthop fib when changing existing nexthop MFC after: 1 day	2022-03-28 11:32:30 +00:00
Gordon Bergling	ef88adc527	pf(4): Fix a typo in a source code comment - s/seaching/searching/ MFC after: 3 days	2022-03-27 19:57:49 +02:00
Kristof Provost	0bf7acd6b7	if_epair: build fix `66acf7685b` failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days	2022-03-17 06:43:47 +01:00
Michael Gmelin	66acf7685b	if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in `24f0bfbad5`, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569	2022-03-16 23:08:55 +01:00
Kristof Provost	8a42005d1e	pf: support basic L3 filtering in the Ethernet rules Allow filtering based on the source or destination IP/IPv6 address in the Ethernet layer rules. Reviewed by: pauamma_gundo.com (man), debdrup (man) Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34482	2022-03-14 22:42:37 +01:00
Mateusz Guzik	f11b6505f1	pf: add PF_UNLNKDRULES_ASSERT Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-10 17:20:41 +00:00
Vincenzo Maffione	09a1893398	netmap: fix refcount bug in netmap allocator Symptom: when a single extmem memory region is provided to netmap multiple times, for multiple interfaces, the memory region is never released by netmap once all the existing file descriptors are closed. Fix the relevant condition in netmap_mem_drop(): release the memory when the last user of netmap_adapter is gone, rather than when the last user of netmap_mem_d is gone. MFC after: 2 weeks	2022-03-06 16:39:16 +00:00
Santiago Martinez	52bcdc5b80	if_epair: fix build with RSS and INET or INET6 disabled Reviewed by: kp MFC after: 1 week	2022-03-03 18:31:26 +01:00
Kristof Provost	b590f17a11	pf: support masking mac addresses When filtering Ethernet packets allow rules to specify a mac address with a mask. This indicates which bits of the specified address are significant. This allows users to do things like filter based on device manufacturer. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-02 17:00:08 +01:00
Kristof Provost	c5131afee3	pf: add anchor support for ether rules Support anchors in ether rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32482	2022-03-02 17:00:07 +01:00
Kristof Provost	fb330f3931	pf: support dummynet on L2 rules Allow packets to be tagged with dummynet information. Note that we do not apply dummynet shaping on the L2 traffic, but instead mark it for dummynet processing in the L3 code. This is the same approach as we take for ALTQ. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32222	2022-03-02 17:00:06 +01:00
Kristof Provost	20c4899a8e	pf: Do not hold PF_RULES_RLOCK while processing Ethernet rules Avoid the overhead of acquiring a (read) RULES lock when processing the Ethernet rules. We can get away with that because when rules are modified they're staged in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is atomic, so that pf_test_eth_rule() always sees either the old rules, or the new ruleset. We need to take care not to delete the old ruleset until we're sure no pf_test_eth_rule() is still running with those. We accomplish that by using NET_EPOCH_CALL() to actually free the old rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31739	2022-03-02 17:00:03 +01:00
Kristof Provost	e732e742b3	pf: Initial Ethernet level filtering code This is the kernel side of stateless Ethernet level filtering for pf. The primary use case for this is to enable captive portal functionality to allow/deny access by MAC address, rather than per IP address. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31737	2022-03-02 17:00:03 +01:00
Kristof Provost	36637dd19d	bridge: Don't share broadcast packets if_bridge duplicates broadcast packets with m_copypacket(), which creates shared packets. In certain circumstances these packets can be processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as part of the checksum verification. That may lead to incorrect packets being transmitted. Use m_dup() to create independent mbufs instead. Reported by: Richard Russo <toast@ruka.org> Reviewed by: donner, afedorov MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D34319	2022-02-21 19:03:44 +01:00
Mateusz Guzik	430e0e409c	vnet: add CURVNET_ASSERT_SET for !VIMAGE Reported by: ler Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-19 21:00:00 +00:00
Mateusz Guzik	75cde1f872	vnet: add CURVNET_ASSERT_SET Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34312	2022-02-19 13:10:01 +00:00
Li-Wen Hsu	7442b63231	if_epair: Use ANSI C definition This fixes -Werror=strict-prototypes from gcc9 Sponsored by: The FreeBSD Foundation	2022-02-15 21:45:22 +08:00
Kristof Provost	24f0bfbad5	if_epair: implement fanout Allow multiple cores to be used to process if_epair traffic. We do this (if RSS is enabled) based on the RSS hash of the incoming packet. This allows us to distribute the load over multiple cores, rather than sending everything to the same one. We also switch from swi_sched() to taskqueues, which also contributes to better throughput. Benchmark results: With net.isr.maxthreads=-1 Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1) Before 627 Kpps After (no RSS) 1.198 Mpps After (RSS) 3.148 Mpps Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1) Before 7.705 Kpps After (no RSS) 1.017 Mpps After (RSS) 2.083 Mpps MFC after: 3 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D33731	2022-02-15 09:03:24 +01:00
Kristof Provost	78bc3d5e17	vlan: allow net.link.vlan.mtag_pcp to be set per vnet The primary reason for this change is to facilitate testing. MFC after: 1 week	2022-02-14 22:51:10 +01:00
Aleksandr Fedorov	ceaf442ff2	if_vxlan(4): Allow netmap_generic to intercept RX packets. Netmap (generic) intercepts the if_input method to handle RX packets. Call ifp->if_input() instead of netisr_dispatch(). Add stricter check for incoming packet length. This change is very useful with bhyve + vale + if_vxlan. Reviewed by: vmaffione (mentor), kib, np, donner Approved by: vmaffione (mentor), kib, np, donner MFC after: 2 weeks Sponsored by: vstack.com Differential Revision: https://reviews.freebsd.org/D30638	2022-02-06 15:27:46 +03:00
Kristof Provost	4daa31c108	pflog: align header to 4 bytes, not 8 `6d4baa0d01` incorrectly rounded the length of the pflog header up to 8 bytes, rather than 4. PR: 261566 Reported by: Guy Harris <gharris@sonic.net> MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-01 18:17:44 +01:00
Mark Johnston	773e3a71b2	pf: Initialize pf_kpool mutexes earlier There are some error paths in ioctl handlers that will call pf_krule_free() before the rule's rpool.mtx field is initialized, causing a panic with INVARIANTS enabled. Fix the problem by introducing pf_krule_alloc() and initializing the mutex there. This does mean that the rule->krule and pool->kpool conversion functions need to stop zeroing the input structure, but I don't see a nicer way to handle this except perhaps by guarding the mtx_destroy() with a mtx_initialized() check. Constify some related functions while here and add a regression test based on a syzkaller reproducer. Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34115	2022-01-31 16:14:00 -05:00
Gleb Smirnoff	964b8f8b99	ifnet: garbage collect unused function ifaddr_byindex(). Last use was removed in `5adea417d4`.	2022-01-28 09:51:52 -08:00
Gleb Smirnoff	6abb5043a6	rtsock: always set m_pkthdr.rcvif when queueing on netisr netisr uses global workstreams and after dequeueing an mbuf it uses rcvif to get the VNET of the mbuf. Of course, this is not needed when kernel is compiled without VIMAGE. It came out that routing socket does not set rcvif if compiled without VIMAGE. Make this assignment not depending on VIMAGE option. Fixes: `6871de9363`	2022-01-27 09:41:31 -08:00
Gleb Smirnoff	6871de9363	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	e1882428dc	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918`	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	91f44749c6	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672	2022-01-26 21:58:44 -08:00
Hans Petter Selasky	c8f2c290e4	Add definitions for TLS receive tags using the existing send tag infrastructure. Although send tags are strictly used for transmit, the name might be changed in the future to be more generic. The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the network driver itself when creating the TLS RX decryption offload filter. TLS receive tags have a modify callback to tell the network driver about the progress of decryption. Currently decryption is done IP packet by IP packet, even if the IP packet contains a partial TLS record. The modify callback allows the network driver to keep track of TCP sequence numbers pointing to the beginning of TLS records after TCP packet reassembly. These callbacks only happen when encrypted or partially decrypted data is received and are used to verify the decryptions starting point for the hardware. Typically the hardware will guess where TLS headers start and needs help from the software to know if the guess was correct. This is the purpose of the modify callback. Differential Revision: https://reviews.freebsd.org/D32356 Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:55:00 +01:00
Gleb Smirnoff	6d1808f051	if_clone: correctly destroy a clone from a different vnet Try to live with cruel reality fact - if_vmove doesn't move an interface from previous vnet cloning infrastructure to the new one. Let's admit this as design feature and make it work better. * Delete two blocks of code that would fallback to vnet0, if a cloner isn't found. They didn't do any good job and also whole idea of treating vnet0 as special one is wrong. * When deleting a cloned interface, lookup its cloner using its home vnet. With this change simple sequence works correctly: ifconfig foo0 create jail -c name=jj persist vnet vnet.interface=foo0 jexec jj ifconfig foo0 destroy Differential revision: https://reviews.freebsd.org/D33942	2022-01-24 21:07:16 -08:00
Gleb Smirnoff	54712fc423	if_vmove: improve restoration in cloner's ifgroup membership * Do a single call into if_clone.c instead of two. The cloner can't disappear since the interface sits on its list. * Make restoration smarter - check that cloner with same name exists in the new vnet. Differential revision: https://reviews.freebsd.org/D33941	2022-01-24 21:06:59 -08:00
Eric Joyner	213e91399b	iflib: Allow drivers to determine which queue to TX on Adds a new function pointer to struct if_txrx in order to allow drivers to set their own function that will determine which queue a packet should be sent on. Since this includes a kernel ABI change, bump the __FreeBSD_version as well. (The motivation behind this is to allow the driver to examine the UP in the VLAN tag and determine which queue to TX on based on that, in support of HW TX traffic shaping.) Signed-off-by: Eric Joyner <erj@FreeBSD.org> Reviewed by: kbowling@, stallamr@netapp.com Tested by: jeffrey.e.pieper@intel.com Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D31485	2022-01-24 18:22:02 -08:00
Vincenzo Maffione	e0e1240528	netmap: fix LOR in iflib_netmap_register In iflib_device_register(), the CTX_LOCK is acquired first and then IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg() we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive. MFC after: 1 month	2022-01-14 21:09:04 +00:00

5068 Commits