freebsd-dev

Author	SHA1	Message	Date
Kristof Provost	24f0bfbad5	if_epair: implement fanout Allow multiple cores to be used to process if_epair traffic. We do this (if RSS is enabled) based on the RSS hash of the incoming packet. This allows us to distribute the load over multiple cores, rather than sending everything to the same one. We also switch from swi_sched() to taskqueues, which also contributes to better throughput. Benchmark results: With net.isr.maxthreads=-1 Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1) Before 627 Kpps After (no RSS) 1.198 Mpps After (RSS) 3.148 Mpps Setup B: (cc0 - bridge0 - epair0a) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1) Before 7.705 Kpps After (no RSS) 1.017 Mpps After (RSS) 2.083 Mpps MFC after: 3 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D33731	2022-02-15 09:03:24 +01:00
Kristof Provost	78bc3d5e17	vlan: allow net.link.vlan.mtag_pcp to be set per vnet The primary reason for this change is to facilitate testing. MFC after: 1 week	2022-02-14 22:51:10 +01:00
Aleksandr Fedorov	ceaf442ff2	if_vxlan(4): Allow netmap_generic to intercept RX packets. Netmap (generic) intercepts the if_input method to handle RX packets. Call ifp->if_input() instead of netisr_dispatch(). Add stricter check for incoming packet length. This change is very useful with bhyve + vale + if_vxlan. Reviewed by: vmaffione (mentor), kib, np, donner Approved by: vmaffione (mentor), kib, np, donner MFC after: 2 weeks Sponsored by: vstack.com Differential Revision: https://reviews.freebsd.org/D30638	2022-02-06 15:27:46 +03:00
Kristof Provost	4daa31c108	pflog: align header to 4 bytes, not 8 `6d4baa0d01` incorrectly rounded the length of the pflog header up to 8 bytes, rather than 4. PR: 261566 Reported by: Guy Harris <gharris@sonic.net> MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-01 18:17:44 +01:00
Mark Johnston	773e3a71b2	pf: Initialize pf_kpool mutexes earlier There are some error paths in ioctl handlers that will call pf_krule_free() before the rule's rpool.mtx field is initialized, causing a panic with INVARIANTS enabled. Fix the problem by introducing pf_krule_alloc() and initializing the mutex there. This does mean that the rule->krule and pool->kpool conversion functions need to stop zeroing the input structure, but I don't see a nicer way to handle this except perhaps by guarding the mtx_destroy() with a mtx_initialized() check. Constify some related functions while here and add a regression test based on a syzkaller reproducer. Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34115	2022-01-31 16:14:00 -05:00
Gleb Smirnoff	964b8f8b99	ifnet: garbage collect unused function ifaddr_byindex(). Last use was removed in `5adea417d4`.	2022-01-28 09:51:52 -08:00
Gleb Smirnoff	6abb5043a6	rtsock: always set m_pkthdr.rcvif when queueing on netisr netisr uses global workstreams and after dequeueing an mbuf it uses rcvif to get the VNET of the mbuf. Of course, this is not needed when kernel is compiled without VIMAGE. It came out that routing socket does not set rcvif if compiled without VIMAGE. Make this assignment not depending on VIMAGE option. Fixes: `6871de9363`	2022-01-27 09:41:31 -08:00
Gleb Smirnoff	6871de9363	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	e1882428dc	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918`	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	91f44749c6	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672	2022-01-26 21:58:44 -08:00
Hans Petter Selasky	c8f2c290e4	Add definitions for TLS receive tags using the existing send tag infrastructure. Although send tags are strictly used for transmit, the name might be changed in the future to be more generic. The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the network driver itself when creating the TLS RX decryption offload filter. TLS receive tags have a modify callback to tell the network driver about the progress of decryption. Currently decryption is done IP packet by IP packet, even if the IP packet contains a partial TLS record. The modify callback allows the network driver to keep track of TCP sequence numbers pointing to the beginning of TLS records after TCP packet reassembly. These callbacks only happen when encrypted or partially decrypted data is received and are used to verify the decryptions starting point for the hardware. Typically the hardware will guess where TLS headers start and needs help from the software to know if the guess was correct. This is the purpose of the modify callback. Differential Revision: https://reviews.freebsd.org/D32356 Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:55:00 +01:00
Gleb Smirnoff	6d1808f051	if_clone: correctly destroy a clone from a different vnet Try to live with cruel reality fact - if_vmove doesn't move an interface from previous vnet cloning infrastructure to the new one. Let's admit this as design feature and make it work better. * Delete two blocks of code that would fallback to vnet0, if a cloner isn't found. They didn't do any good job and also whole idea of treating vnet0 as special one is wrong. * When deleting a cloned interface, lookup its cloner using its home vnet. With this change a simple sequence works correctly: ifconfig foo0 create; jail -c name=jj persist vnet vnet.interface=foo0; jexec jj ifconfig foo0 destroy Differential revision: https://reviews.freebsd.org/D33942	2022-01-24 21:07:16 -08:00
Gleb Smirnoff	54712fc423	if_vmove: improve restoration in cloner's ifgroup membership * Do a single call into if_clone.c instead of two. The cloner can't disappear since the interface sits on its list. * Make restoration smarter - check that cloner with same name exists in the new vnet. Differential revision: https://reviews.freebsd.org/D33941	2022-01-24 21:06:59 -08:00
Eric Joyner	213e91399b	iflib: Allow drivers to determine which queue to TX on Adds a new function pointer to struct if_txrx in order to allow drivers to set their own function that will determine which queue a packet should be sent on. Since this includes a kernel ABI change, bump the __FreeBSD_version as well. (The motivation behind this is to allow the driver to examine the UP in the VLAN tag and determine which queue to TX on based on that, in support of HW TX traffic shaping.) Signed-off-by: Eric Joyner <erj@FreeBSD.org> Reviewed by: kbowling@, stallamr@netapp.com Tested by: jeffrey.e.pieper@intel.com Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D31485	2022-01-24 18:22:02 -08:00
Vincenzo Maffione	e0e1240528	netmap: fix LOR in iflib_netmap_register In iflib_device_register(), the CTX_LOCK is acquired first and then IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg() we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive. MFC after: 1 month	2022-01-14 21:09:04 +00:00
Kristof Provost	5f5e32f1b3	pf: protect the rpool from races The roundrobin pool stores its state in the rule, which could potentially lead to invalid addresses being returned. For example, thread A just executed PF_AINC(&rpool->counter) and immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter) (i.e. after the pf_match_addr() check of rpool->counter). Lock the rpool with its own mutex to prevent these races. The performance impact of this is expected to be low, as each rule has its own lock, and the lock is also only relevant when state is being created (so only for the initial packets of a connection, not for all traffic). See also: https://redmine.pfsense.org/issues/12660 Reviewed by: glebius MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33874	2022-01-14 10:30:33 +01:00
Alexander Motin	618d49f5ca	Revert "iflib: Relax timer period from 0.5 to 0.5-0.75s." I've noticed relations between iflib_timer() vs ixl_admin_timer(). Both scheduled at the same 2Hz rate, but the second is rescheduling the first each time, so if the first gets any slower, it won't be executed at all. Revert this until deeper investigation. This reverts commit `90bc1cf657`.	2022-01-10 09:40:38 -05:00
Alexander Motin	90bc1cf657	iflib: Relax timer period from 0.5 to 0.5-0.75s. While there switch it from hardclock ticks to milliseconds. MFC after: 2 weeks	2022-01-09 20:32:50 -05:00
Ryan Stone	5adea417d4	Fix ifa refcount leak in ifa_ifwithnet() In `4f6c66cc9c`, ifa_ifwithnet() was changed to no longer ifa_ref() the returned ifaddr, and instead the caller was required to stay in the net_epoch for as long as they wanted the ifaddr to remain valid. However, this missed the case where an AF_LINK lookup would call ifaddr_byindex(), which still does ifa_ref() the ifaddr. This would cause a refcount leak. Fix this by inlining the relevant parts of ifaddr_byindex() here, with the ifa_ref() call removed. This also avoids an unnecessary entry and exit from the net_epoch for this case. I've audited all in-tree consumers of ifa_ifwithnet() that could possibly perform an AF_LINK lookup and confirmed that none of them will expect the ifaddr to have a reference that they need to release. MFC after: 2 months Sponsored by: Dell Inc Differential Revision: https://reviews.freebsd.org/D28705 Reviewed by: melifaro	2022-01-06 15:04:24 -05:00
Ed Maste	a6668e31aa	Fix kernel build without INET and INET6 Reviewed by: brooks, melifaro Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33718	2022-01-05 09:41:38 -05:00
Gleb Smirnoff	644ca0846d	domains: make domain_init() initialize only global state Now that each module handles its global and VNET initialization itself, there is no VNET related stuff left to do in domain_init(). Differential revision: https://reviews.freebsd.org/D33541	2022-01-03 10:15:22 -08:00
Gleb Smirnoff	89128ff3e4	protocols: init with standard SYSINIT(9) or VNET_SYSINIT The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers. Getting rid of pr_init allows to achieve several things: o Get rid of ifdef's that protect against double foo_init() when both INET and INET6 are compiled in. o Isolate initializers statically to the module they init. o Makes code easier to understand and maintain. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33537	2022-01-03 10:15:21 -08:00
Ed Maste	818952c638	Fix kernel build without INET6 Reported by: Gary Jennejohn Fixes: `ff3a85d324` ("[lltable] Add per-family lltable ...") Sponsored by: The FreeBSD Foundation	2021-12-30 18:40:46 -05:00
Stefan Eßer	e2650af157	Make CPU_SET macros compliant with other implementations The introduction of <sched.h> improved compatibility with some 3rd party software, but caused the configure scripts of some ports to assume that they were run in a GLIBC compatible environment. Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being added to ports, but there still were compatibility issues due to invalid assumptions made in autoconfigure scripts. The difference between the FreeBSD version of macros like CPU_AND, CPU_OR, etc. and the GLIBC versions was in the number of arguments: FreeBSD used a 2-address scheme (one source argument is also used as the destination of the operation), while GLIBC uses a 3-address scheme (2 source operands and a separately passed destination). The GLIBC scheme provides a super-set of the functionality of the FreeBSD macros, since it does not prevent passing the same variable as source and destination arguments. In code that wanted to preserve both source arguments, the FreeBSD macros required a temporary copy of one of the source arguments. This patch set allows to unconditionally provide functions and macros expected by 3rd party software written for GLIBC based systems, but breaks builds of externally maintained sources that use any of the following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR. One contributed driver (contrib/ofed/libmlx5) has been patched to support both the old and the new CPU_OR signatures. If this commit is merged to -STABLE, the version test will have to be extended to cover more ranges. Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT no longer require that option. The FreeBSD version has been bumped to 1400046 to reflect this incompatible change. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D33451	2021-12-30 12:20:32 +01:00
Alexander V. Chernikov	63f7f3921b	routing: Add unified level-based logging support for the routing subsystem. Summary: MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33664	2021-12-29 21:30:18 +00:00
Alexander V. Chernikov	823a08d740	nhops: split nh_family into nh_upper_family and nh_neigh_family. With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need to distinguish "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway address family. Store them explicitly in the private part of the nexthop data. While here, store nhop fibnum in nhop_prip datastructure to make it self-contained. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33663	2021-12-29 21:03:19 +00:00
Alexander V. Chernikov	ff3a85d324	[lltable] Add per-family lltable getters. Introduce a new function, lltable_get(), to retrieve lltable pointer for the specified interface and family. Use it to avoid all-iftable list traversal when adding or deleting ARP/ND records. Differential Revision: https://reviews.freebsd.org/D33660 MFC after: 2 weeks	2021-12-29 20:57:15 +00:00
Vincenzo Maffione	4561c4f0ca	net: iflib: sync isc_capenable to if_capenable On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled. When this happens, apply the same change to isc_capenable, which is the iflib private copy of if_capenable (for a subset of the IFCAP_* bits). In this way the iflib drivers can check the bits using isc_capenable rather than if_capenable. This is convenient because the latter access requires an additional indirection through the ifp, and it is also less likely to be in cache. PR: 260068 Reviewed by: kbowling, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33156	2021-12-28 10:55:21 +00:00
Kristof Provost	e7809dceb5	pf: make if_pfsync.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33504	2021-12-17 12:38:35 +01:00
Kristof Provost	dc04fa802d	pf: make if_pflog.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33503	2021-12-17 12:38:35 +01:00
Kristof Provost	e9167358e4	net: make if_bridgevar.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33502	2021-12-17 12:38:35 +01:00
Kristof Provost	f4096a7c8a	net: make ethernet.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33501	2021-12-17 12:38:35 +01:00
Kristof Provost	c658610b92	pf: make pfvar.h self-contained Ensure that the pfvar.h header can be included without including any other headers. Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33499	2021-12-17 12:38:34 +01:00
Kristof Provost	b29c145cc1	if_stf: make if_stf.h self-contained Ensure that the if_stf.h header can be included without including any other headers. Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33498	2021-12-17 12:38:34 +01:00
Warner Losh	c6df6f5322	Create wrapper for Giant taken for newbus Create a wrapper for newbus to take giant and for busses to take it too. bus_topo_lock() should be called before interacting with newbus routines and unlocked with bus_topo_unlock(). If you need the topology lock for some reason, bus_topo_mtx() will provide that. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D31831	2021-12-09 17:04:45 -07:00
Mateusz Guzik	e735fa3212	net/if.c: plug set-but-not-used vars Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-12-09 20:39:40 +00:00
Gleb Smirnoff	7e0bba4d80	ifnet: make V_if_index static to if.c This requires moving net.link.generic sysctl declaration from if_mib.c to if.c. Ideally if_mib.c needs just to be merged to if.c, but they have different license texts. Differential revision: https://reviews.freebsd.org/D33263	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	d74b7baeb0	ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	7b40b00fad	ifnet: merge ifindex_alloc(), ifnet_setbyindex(), if_grow() and call magic Now it is possible to just merge all this complexity into single linear function. Note that IFNET_WLOCK() is a sleepable lock, so we can M_WAITOK and epoch_wait_preempt(). Reviewed by: melifaro, bz, kp Differential revision: https://reviews.freebsd.org/D33262	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	6ff4cac2ee	ifnet: initial if_grow() shall always succeed So let's just call malloc() directly. This also avoids hidden doubling of default V_if_indexlim. Reviewed by: melifaro, bz, kp Differential revision: https://reviews.freebsd.org/D33261	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	450394af27	ifnet: use ck_pr(3) store & load setting ifnet pointer in ifindex The lockless access to the array is protected by the network epoch. Reviewed by: bz, kp Differential revision: https://reviews.freebsd.org/D33260	2021-12-06 09:32:30 -08:00
Gleb Smirnoff	8062e5759c	ifnet: allocate index at the end of if_alloc_domain() Now that if_alloc_domain() never fails and actually doesn't expose ifnet to outside we can eliminate IFNET_HOLD and two step index allocation. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33259	2021-12-06 09:32:30 -08:00
Gleb Smirnoff	ad2a0aec29	nhop: hash ifnet pointer instead of if_index Yet another problem created by VIMAGE/if_vmove/epair design that relocates ifnet between vnets and changes if_index. Since if_index changes, nhop hash values also change, unlink_nhop() isn't able to find entry in hash and leaks the nhop. Since nhop references ifnet, the latter is also leaked. As a result running network tests leaks memory on every single test that creates vnet jail. While here, rewrite whole hash_priv() to use static initializer, per Alexander's suggestion. Reviewed by: melifaro	2021-12-04 10:05:46 -08:00
Kristof Provost	6d4baa0d01	if_pflog: fix packet length There were two issues with the new pflog packet length. The first is that the length is expected to be a multiple of sizeof(long), but we'd assumed it had to be a multiple of sizeof(uint32_t). The second is that there's some broken software out there (such as Wireshark) that makes incorrect assumptions about the amount of padding. That is, Wireshark assumes there's always three bytes of padding, rather than however much is needed to get to a multiple of sizeof(long). Fix this by adding extra padding, and a fake field to maintain Wireshark's assumption. Reported by: Ozkan KIRIK <ozkan.kirik@gmail.com> Tested by: Ozkan KIRIK <ozkan.kirik@gmail.com> MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33236	2021-12-04 08:42:55 +01:00
Cy Schubert	db0ac6ded6	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit `266f97b5e9`, reversing changes made to `a10253cffe`. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.	2021-12-02 14:45:04 -08:00
Cy Schubert	266f97b5e9	wpa: Import wpa_supplicant/hostapd commit 14ab4a816 This is the November update to vendor/wpa committed upstream 2021-11-26. MFC after: 1 month	2021-12-02 13:35:14 -08:00
Gleb Smirnoff	9e93d2b335	ifnet: enable & fix if_debug build Fixes: `ce40632a31`	2021-12-02 10:59:43 -08:00
Gleb Smirnoff	93c67567e0	Remove "options PCBGROUP" With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable. This experimental feature was sponsored by Juniper but ended never to be used in Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@]. I'm up to resurrecting it back if there is any interest from anybody. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33020	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	1cec1c5831	Allow to compile RSS without PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33019	2021-12-02 10:48:48 -08:00
Zhenlei Huang	73d41cc730	if_epair: Also mark the flag of pair b with IFF_KNOWSEPOCH Reviewed by: kp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33210	2021-12-01 15:54:23 +01:00

4833 Commits