freebsd-dev

Author	SHA1	Message	Date
Kristof Provost	868bf82153	if: avoid interface destroy race When we destroy an interface while the jail containing it is being destroyed we risk seeing a race between if_vmove() and the destruction code, which results in us trying to move a destroyed interface. Protect against this by using the ifnet_detach_sxlock to also cover if_vmove() (and not just detach). PR: 262829 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D34704	2022-05-06 13:55:08 +02:00
Gleb Smirnoff	51f798e761	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268 (cherry picked from commit `6871de9363`)	2022-05-05 14:38:07 -04:00
Gleb Smirnoff	4d7a1361ef	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918` (cherry picked from commit `e1882428dc`)	2022-05-05 14:38:07 -04:00
Gleb Smirnoff	80e60e236d	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672 (cherry picked from commit `91f44749c6`)	2022-05-05 14:38:07 -04:00
Marko Zec	d461deeaa4	VNET: Revert "ifnet: make if_index global" This reverts commit `91f44749c6`. Devirtualization of V_if_index and V_ifindex_table was rushed into the tree lacking proper context, discussion, and declaration of intent, so I'm backing it out as harmful to VNET on the following grounds: 1) The change repurposed the decades-old and stable if_index KBI for new, unclear goals which were omitted from the commit note. 2) The change opened up a new resource exhaustion vector where any vnet could starve the system of ifnet indices, including vnet0. 3) To circumvent the newly introduced problem of separating ifnets belonging to different vnets from the globalized ifindex_table, the author introduced sysctl_ifcount() which does a linear traversal over the (potentially huge) global ifnet list just to return a simple upper bound on existing ifnet indices. 4) The change effectively led to nonuniform ifnet index allocation among vnets. 5) The commit note clearly stated that the patch changed the implicit if_index ABI contract where ifnet indices were assumed to be starting from one. The commit note also included a correct observation that holes in interface indices were always allowed, but failed to declare that the userland-observable ifindex tables could now include huge empty spans even under modest operating conditions. 6) The author had an earlier proposal in the works which did not affect per-vnet ifnet lists (D33265) but which he abandoned without providing the rationale behind his decision to do so, at the expense of sacrificing the vnet isolation contract and if_index ABI / KBI. Furthermore, the author agreed to back out his changes himself and to follow up with a proposal for a less intrusive alternative, but later silently declined to act. Therefore, I decided to resolve the status-quo by backing this out myself. 
This in no way precludes a future proposal aiming to mitigate ifnet-removal related system crashes or panics to be accepted, provided it would not unnecessarily compromise the goal of as strict as possible isolation between vnets. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:27:57 +02:00
Marko Zec	6c741ffbfa	Revert "mbuf: do not restore dying interfaces" This reverts commit `703e533da5`. Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif" This reverts commit `e1882428dc`. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:11:40 +02:00
Marko Zec	0fa5636966	Revert "netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs" This reverts commit `6871de9363`. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:11:39 +02:00
Greg Foster	00a80538b4	lacp: short timeout erroneously declares link-flapping Panasas was seeing a higher-than-expected number of link-flap events. After joint debugging with the switch vendor, we determined there were problems on both sides; either of which might cause the occasional event, but together caused lots of them. On the switch side, an internal queuing issue was causing LACP PDUs -- which should be sent every second, in short-timeout mode -- to sometimes be sent slightly later than they should have been. In some cases, two successive PDUs were late, but we never saw three late PDUs in a row. On the FreeBSD side, we saw a link-flap event every time there were two late PDUs, while the spec says that it takes three seconds of downtime to trigger that event. It turns out that if a PDU was received shortly before the timer code was run, it would decrement less than a full second after the PDU arrived. Then two delayed PDUs would cause two additional decrements, causing it to reach zero less than three seconds after the most-recent on-time PDU. The solution is to note the time a PDU arrives, and only decrement if at least a full second has elapsed since then. Reported by: Greg Foster <gfoster@panasas.com> Reviewed by: gallatin Tested by: Greg Foster <gfoster@panasas.com> MFC after: 3 days Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D35070	2022-04-27 12:41:30 -07:00
Reid Linnemann	0abcc1d2d3	pf: Add per-rule timestamps for rule and eth_rule Similar to ipfw rule timestamps, these timestamps internally are uint32_t snaps of the system time in seconds. The timestamp is CPU local and updated each time a rule, or a state associated with a rule, is matched. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34970	2022-04-22 19:53:20 +02:00
Kristof Provost	812839e5aa	pf: allow the use of tables in ethernet rules Allow tables to be used for the l3 source/destination matching. This requires taking the PF_RULES read lock. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34917	2022-04-20 13:01:12 +02:00
John Baldwin	ac3e46fa3e	infiniband_resolve_addr: ih is only used for INET or INET6.	2022-04-13 16:08:21 -07:00
John Baldwin	d98981585c	ether_resolve_addr: eh is only used for INET or INET6.	2022-04-13 16:08:21 -07:00
John Baldwin	2884a93651	vlan: ifa is only used under #ifdef INET.	2022-04-13 16:08:21 -07:00
John Baldwin	2174f0f2f2	net/route: Use __diagused for variables only used in KASSERT().	2022-04-13 16:08:19 -07:00
Kristof Provost	742e7210d0	udp: allow udp_tun_func_t() to indicate it did not eat the packet Allow udp tunnel functions to indicate they have not taken ownership of the packet, and that normal UDP processing should continue. This is especially useful for scenarios where the kernel has taken ownership of a socket that was originally created by userspace. It allows the tunnel function to pass through certain packets for userspace processing. The primary user of this is if_ovpn, when it receives messages from unknown peers (which might be a new client). Reviewed by: tuexen Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34883	2022-04-12 10:04:59 +02:00
Gordon Bergling	1a15a383a6	net: Fix a typo in a source code comment - s/peform/perform/ MFC after: 3 days	2022-04-09 11:37:57 +02:00
John Baldwin	d08cb45362	iflib: Use empty inline functions for prefetch() on non-x86. This avoids warnings about unused variables in expressions passed to prefetch().	2022-04-08 17:25:14 -07:00
Mark Johnston	990a6d18b0	net: Fix memory leaks in lltable_calc_llheader() error paths Also convert raw epoch_call() calls to lltable_free_entry() calls, no functional change intended. There's no need to asynchronously free the LLEs in that case to begin with, but we might as well use the lltable interfaces consistently. Noticed by code inspection; I believe lltable_calc_llheader() failures do not generally happen in practice. Reviewed by: bz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34832	2022-04-08 11:47:25 -04:00
John Baldwin	f7236dd068	change_mpath_route: Remove write-only nh variable. While here, cleanup the style of the function prologue by moving an assignment out of the middle of two variable declaration blocks.	2022-04-06 16:45:28 -07:00
John Baldwin	371c917b0b	unlink_nhgrp: Remove write-only variable. Possibly one could assert that ret should always be 0 here (that is, that there was always an index found in the bitmask). That should be true since a bitmask index is allocated before the nhgrp is inserted in the ctl->gr_head list in link_nhgrp.	2022-04-06 16:45:27 -07:00
Warner Losh	e606e5d157	sysctl_dumpentry: move error to inner scope Sponsored by: Netflix	2022-04-04 22:30:50 -06:00
Warner Losh	5de5b5a34d	route_ctl: eliminate write only variables ifa and nh Sponsored by: Netflix	2022-04-04 22:30:48 -06:00
Warner Losh	7f9c3339a4	get_nhop: eliminate write only variable gateway Sponsored by: Netflix	2022-04-04 22:30:47 -06:00
Gordon Bergling	d792dc7ebb	net(4): Fix a typo in a source code comment - s/accomodate/accommodate/ MFC after: 3 days	2022-04-02 14:57:06 +02:00
Gordon Bergling	cba46da538	net(3): Fix a typo in a source code comment - s/verion/version/ MFC after: 3 days	2022-04-02 10:53:40 +02:00
Gordon Bergling	f8d292b665	net(3): Fix a typo in a source code comment - s/Multilik/Multilink/ Obtained from: NetBSD MFC after: 3 days	2022-04-02 09:41:10 +02:00
Gordon Bergling	23677398ca	net(3): Fix a typo in a source code comment - s/paramenters/parameters/ MFC after: 3 days	2022-04-02 09:24:48 +02:00
Kristof Provost	9bb06778f8	pf: support listing ethernet anchors Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-30 10:28:19 +02:00
Gordon Bergling	bef80a7285	vxlan(4): Fix two typos in sysctl descriptions - s/fowarding/forwarding/ MFC after: 3 days	2022-03-28 19:35:34 +02:00
Mateusz Guzik	bd7762c869	pf: add a rule rb tree with md5 sum used as key. This gets rid of the quadratic rule traversal when "keep_counters" is set. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:45:03 +00:00
Mateusz Guzik	1a3e98a5b8	pf: pre-compute rule hash Makes it cheaper to compare rules when "keep_counters" is set. This also sets up keeping them in a RB tree. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:52 +00:00
Mateusz Guzik	93f8c38c03	pf: add pf_config_lock For now only protects rule creation/destruction, but will allow gradually reducing the scope of rules lock when changing the rules. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:46 +00:00
Alexander V. Chernikov	1b8b69508b	routing: copy nexthop fib when changing existing nexthop MFC after: 1 day	2022-03-28 11:32:30 +00:00
Gordon Bergling	ef88adc527	pf(4): Fix a typo in a source code comment - s/seaching/searching/ MFC after: 3 days	2022-03-27 19:57:49 +02:00
Kristof Provost	0bf7acd6b7	if_epair: build fix `66acf7685b` failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days	2022-03-17 06:43:47 +01:00
Michael Gmelin	66acf7685b	if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in `24f0bfbad5`, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569	2022-03-16 23:08:55 +01:00
Kristof Provost	8a42005d1e	pf: support basic L3 filtering in the Ethernet rules Allow filtering based on the source or destination IP/IPv6 address in the Ethernet layer rules. Reviewed by: pauamma_gundo.com (man), debdrup (man) Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34482	2022-03-14 22:42:37 +01:00
Mateusz Guzik	f11b6505f1	pf: add PF_UNLNKDRULES_ASSERT Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-10 17:20:41 +00:00
Vincenzo Maffione	09a1893398	netmap: fix refcount bug in netmap allocator Symptom: when a single extmem memory region is provided to netmap multiple times, for multiple interfaces, the memory region is never released by netmap once all the existing file descriptors are closed. Fix the relevant condition in netmap_mem_drop(): release the memory when the last user of netmap_adapter is gone, rather than when the last user of netmap_mem_d is gone. MFC after: 2 weeks	2022-03-06 16:39:16 +00:00
Santiago Martinez	52bcdc5b80	if_epair: fix build with RSS and INET or INET6 disabled Reviewed by: kp MFC after: 1 week	2022-03-03 18:31:26 +01:00
Kristof Provost	b590f17a11	pf: support masking mac addresses When filtering Ethernet packets allow rules to specify a mac address with a mask. This indicates which bits of the specified address are significant. This allows users to do things like filter based on device manufacturer. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-02 17:00:08 +01:00
Kristof Provost	c5131afee3	pf: add anchor support for ether rules Support anchors in ether rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32482	2022-03-02 17:00:07 +01:00
Kristof Provost	fb330f3931	pf: support dummynet on L2 rules Allow packets to be tagged with dummynet information. Note that we do not apply dummynet shaping on the L2 traffic, but instead mark it for dummynet processing in the L3 code. This is the same approach as we take for ALTQ. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32222	2022-03-02 17:00:06 +01:00
Kristof Provost	20c4899a8e	pf: Do not hold PF_RULES_RLOCK while processing Ethernet rules Avoid the overhead of acquiring a (read) RULES lock when processing the Ethernet rules. We can get away with that because when rules are modified they're staged in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is atomic, so that pf_test_eth_rule() always sees either the old rules, or the new ruleset. We need to take care not to delete the old ruleset until we're sure no pf_test_eth_rule() is still running with those. We accomplish that by using NET_EPOCH_CALL() to actually free the old rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31739	2022-03-02 17:00:03 +01:00
Kristof Provost	e732e742b3	pf: Initial Ethernet level filtering code This is the kernel side of stateless Ethernet level filtering for pf. The primary use case for this is to enable captive portal functionality to allow/deny access by MAC address, rather than per IP address. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31737	2022-03-02 17:00:03 +01:00
Kristof Provost	36637dd19d	bridge: Don't share broadcast packets if_bridge duplicates broadcast packets with m_copypacket(), which creates shared packets. In certain circumstances these packets can be processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as part of the checksum verification. That may lead to incorrect packets being transmitted. Use m_dup() to create independent mbufs instead. Reported by: Richard Russo <toast@ruka.org> Reviewed by: donner, afedorov MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D34319	2022-02-21 19:03:44 +01:00
Mateusz Guzik	430e0e409c	vnet: add CURVNET_ASSERT_SET for !VIMAGE Reported by: ler Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-19 21:00:00 +00:00
Mateusz Guzik	75cde1f872	vnet: add CURVNET_ASSERT_SET Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34312	2022-02-19 13:10:01 +00:00
Li-Wen Hsu	7442b63231	if_epair: Use ANSI C definition This fixes -Werror=strict-prototypes from gcc9 Sponsored by: The FreeBSD Foundation	2022-02-15 21:45:22 +08:00
Kristof Provost	24f0bfbad5	if_epair: implement fanout Allow multiple cores to be used to process if_epair traffic. We do this (if RSS is enabled) based on the RSS hash of the incoming packet. This allows us to distribute the load over multiple cores, rather than sending everything to the same one. We also switch from swi_sched() to taskqueues, which also contributes to better throughput. Benchmark results: With net.isr.maxthreads=-1 Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1) Before 627 Kpps After (no RSS) 1.198 Mpps After (RSS) 3.148 Mpps Setup B: (cc0 - bridge0 - epair0a) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1) Before 7.705 Kpps After (no RSS) 1.017 Mpps After (RSS) 2.083 Mpps MFC after: 3 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D33731	2022-02-15 09:03:24 +01:00

4882 Commits