freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	990a6d18b0	net: Fix memory leaks in lltable_calc_llheader() error paths Also convert raw epoch_call() calls to lltable_free_entry() calls, no functional change intended. There's no need to asynchronously free the LLEs in that case to begin with, but we might as well use the lltable interfaces consistently. Noticed by code inspection; I believe lltable_calc_llheader() failures do not generally happen in practice. Reviewed by: bz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34832	2022-04-08 11:47:25 -04:00
John Baldwin	f7236dd068	change_mpath_route: Remove write-only nh variable. While here, cleanup the style of the function prologue by moving an assignment out of the middle of two variable declaration blocks.	2022-04-06 16:45:28 -07:00
John Baldwin	371c917b0b	unlink_nhgrp: Remove write-only variable. Possibly one could assert that ret should always be 0 here (that is, that there was always an index found in the bitmask). That should be true since a bitmask index is allocated before the nhgrp is inserted in the ctl->gr_head list in link_nhgrp.	2022-04-06 16:45:27 -07:00
Warner Losh	e606e5d157	sysctl_dumpentry: move error to inner scope Sponsored by: Netflix	2022-04-04 22:30:50 -06:00
Warner Losh	5de5b5a34d	route_ctl: eliminate write only variables ifa and nh Sponsored by: Netflix	2022-04-04 22:30:48 -06:00
Warner Losh	7f9c3339a4	get_nhop: eliminate write only variable gateway Sponsored by: Netflix	2022-04-04 22:30:47 -06:00
Gordon Bergling	d792dc7ebb	net(4): Fix a typo in a source code comment - s/accomodate/accommodate/ MFC after: 3 days	2022-04-02 14:57:06 +02:00
Gordon Bergling	cba46da538	net(3): Fix a typo in a source code comment - s/verion/version/ MFC after: 3 days	2022-04-02 10:53:40 +02:00
Gordon Bergling	f8d292b665	net(3): Fix a typo in a source code comment - s/Multilik/Multilink/ Obtained from: NetBSD MFC after: 3 days	2022-04-02 09:41:10 +02:00
Gordon Bergling	23677398ca	net(3): Fix a typo in a source code comment - s/paramenters/parameters/ MFC after: 3 days	2022-04-02 09:24:48 +02:00
Kristof Provost	9bb06778f8	pf: support listing ethernet anchors Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-30 10:28:19 +02:00
Gordon Bergling	bef80a7285	vxlan(4): Fix two typos in sysctl descriptions - s/fowarding/forwarding/ MFC after: 3 days	2022-03-28 19:35:34 +02:00
Mateusz Guzik	bd7762c869	pf: add a rule rb tree with md5 sum used as key. This gets rid of the quadratic rule traversal when "keep_counters" is set. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:45:03 +00:00
Mateusz Guzik	1a3e98a5b8	pf: pre-compute rule hash Makes it cheaper to compare rules when "keep_counters" is set. This also sets up keeping them in a RB tree. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:52 +00:00
Mateusz Guzik	93f8c38c03	pf: add pf_config_lock For now only protects rule creation/destruction, but will allow gradually reducing the scope of rules lock when changing the rules. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:46 +00:00
Alexander V. Chernikov	1b8b69508b	routing: copy nexthop fib when changing existing nexthop MFC after: 1 day	2022-03-28 11:32:30 +00:00
Gordon Bergling	ef88adc527	pf(4): Fix a typo in a source code comment - s/seaching/searching/ MFC after: 3 days	2022-03-27 19:57:49 +02:00
Kristof Provost	0bf7acd6b7	if_epair: build fix `66acf7685b` failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days	2022-03-17 06:43:47 +01:00
Michael Gmelin	66acf7685b	if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in `24f0bfbad5`, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569	2022-03-16 23:08:55 +01:00
Kristof Provost	8a42005d1e	pf: support basic L3 filtering in the Ethernet rules Allow filtering based on the source or destination IP/IPv6 address in the Ethernet layer rules. Reviewed by: pauamma_gundo.com (man), debdrup (man) Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34482	2022-03-14 22:42:37 +01:00
Mateusz Guzik	f11b6505f1	pf: add PF_UNLNKDRULES_ASSERT Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-10 17:20:41 +00:00
Vincenzo Maffione	09a1893398	netmap: fix refcount bug in netmap allocator Symptom: when a single extmem memory region is provided to netmap multiple times, for multiple interfaces, the memory region is never released by netmap once all the existing file descriptors are closed. Fix the relevant condition in netmap_mem_drop(): release the memory when the last user of netmap_adapter is gone, rather then when the last user of netmap_mem_d is gone. MFC after: 2 weeks	2022-03-06 16:39:16 +00:00
Santiago Martinez	52bcdc5b80	if_epair: fix build with RSS and INET or INET6 disabled Reviewed by: kp MFC after: 1 week	2022-03-03 18:31:26 +01:00
Kristof Provost	b590f17a11	pf: support masking mac addresses When filtering Ethernet packets allow rules to specify a mac address with a mask. This indicates which bits of the specified address are significant. This allows users to do things like filter based on device manufacturer. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-02 17:00:08 +01:00
Kristof Provost	c5131afee3	pf: add anchor support for ether rules Support anchors in ether rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32482	2022-03-02 17:00:07 +01:00
Kristof Provost	fb330f3931	pf: support dummynet on L2 rules Allow packets to be tagged with dummynet information. Note that we do not apply dummynet shaping on the L2 traffic, but instead mark it for dummynet processing in the L3 code. This is the same approach as we take for ALTQ. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32222	2022-03-02 17:00:06 +01:00
Kristof Provost	20c4899a8e	pf: Do not hold PF_RULES_RLOCK while processing Ethernet rules Avoid the overhead of acquiring a (read) RULES lock when processing the Ethernet rules. We can get away with that because when rules are modified they're staged in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is atomic, so that pf_test_eth_rule() always sees either the old rules, or the new ruleset. We need to take care not to delete the old ruleset until we're sure no pf_test_eth_rule() is still running with those. We accomplish that by using NET_EPOCH_CALL() to actually free the old rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31739	2022-03-02 17:00:03 +01:00
Kristof Provost	e732e742b3	pf: Initial Ethernet level filtering code This is the kernel side of stateless Ethernel level filtering for pf. The primary use case for this is to enable captive portal functionality to allow/deny access by MAC address, rather than per IP address. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31737	2022-03-02 17:00:03 +01:00
Kristof Provost	36637dd19d	bridge: Don't share broadcast packets if_bridge duplicates broadcast packets with m_copypacket(), which creates shared packets. In certain circumstances these packets can be processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as part of the checksum verification. That may lead to incorrect packets being transmitted. Use m_dup() to create independent mbufs instead. Reported by: Richard Russo <toast@ruka.org> Reviewed by: donner, afedorov MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D34319	2022-02-21 19:03:44 +01:00
Mateusz Guzik	430e0e409c	vnet: add CURVNET_ASSERT_SET for !VIMAGE Reported by: ler Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-19 21:00:00 +00:00
Mateusz Guzik	75cde1f872	vnet: add CURVNET_ASSERT_SET Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34312	2022-02-19 13:10:01 +00:00
Li-Wen Hsu	7442b63231	if_epair: Use ANSI C definition This fixes -Werror=strict-prototypes from gcc9 Sponsored by: The FreeBSD Foundation	2022-02-15 21:45:22 +08:00
Kristof Provost	24f0bfbad5	if_epair: implement fanout Allow multiple cores to be used to process if_epair traffic. We do this (if RSS is enabled) based on the RSS hash of the incoming packet. This allows us to distribute the load over multiple cores, rather than sending everything to the same one. We also switch from swi_sched() to taskqueues, which also contributes to better throughput. Benchmark results: With net.isr.maxthreads=-1 Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1) Before 627 Kpps After (no RSS) 1.198 Mpps After (RSS) 3.148 Mpps Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1) Before 7.705 Kpps After (no RSS) 1.017 Mpps After (RSS) 2.083 Mpps MFC after: 3 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D33731	2022-02-15 09:03:24 +01:00
Kristof Provost	78bc3d5e17	vlan: allow net.link.vlan.mtag_pcp to be set per vnet The primary reason for this change is to facilitate testing. MFC after: 1 week	2022-02-14 22:51:10 +01:00
Aleksandr Fedorov	ceaf442ff2	if_vxlan(4): Allow netmap_generic to intercept RX packets. Netmap (generic) intercepts the if_input method to handle RX packets. Call ifp->if_input() instead of netisr_dispatch(). Add stricter check for incoming packet length. This change is very useful with bhyve + vale + if_vxlan. Reviewed by: vmaffione (mentor), kib, np, donner Approved by: vmaffione (mentor), kib, np, donner MFC after: 2 weeks Sponsored by: vstack.com Differential Revision: https://reviews.freebsd.org/D30638	2022-02-06 15:27:46 +03:00
Kristof Provost	4daa31c108	pflog: align header to 4 bytes, not 8 `6d4baa0d01` incorrectly rounded the lenght of the pflog header up to 8 bytes, rather than 4. PR: 261566 Reported by: Guy Harris <gharris@sonic.net> MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-01 18:17:44 +01:00
Mark Johnston	773e3a71b2	pf: Initialize pf_kpool mutexes earlier There are some error paths in ioctl handlers that will call pf_krule_free() before the rule's rpool.mtx field is initialized, causing a panic with INVARIANTS enabled. Fix the problem by introducing pf_krule_alloc() and initializing the mutex there. This does mean that the rule->krule and pool->kpool conversion functions need to stop zeroing the input structure, but I don't see a nicer way to handle this except perhaps by guarding the mtx_destroy() with a mtx_initialized() check. Constify some related functions while here and add a regression test based on a syzkaller reproducer. Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34115	2022-01-31 16:14:00 -05:00
Gleb Smirnoff	964b8f8b99	ifnet: garbage collect unused function ifaddr_byindex(). Last use was removed in `5adea417d4`.	2022-01-28 09:51:52 -08:00
Gleb Smirnoff	6abb5043a6	rtsock: always set m_pkthdr.rcvif when queueing on netisr netisr uses global workstreams and after dequeueing an mbuf it uses rcvif to get the VNET of the mbuf. Of course, this is not needed when kernel is compiled without VIMAGE. It came out that routing socket does not set rcvif if compiled without VIMAGE. Make this assignment not depending on VIMAGE option. Fixes: `6871de9363`	2022-01-27 09:41:31 -08:00
Gleb Smirnoff	6871de9363	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	e1882428dc	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918`	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	91f44749c6	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672	2022-01-26 21:58:44 -08:00
Hans Petter Selasky	c8f2c290e4	Add definitions for TLS receive tags using the existing send tag infrastructure. Although send tags are strictly used for transmit, the name might be changed in the future to be more generic. The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the network driver itself when creating the TLS RX decryption offload filter. TLS receive tags have a modify callback to tell the network driver about the progress of decryption. Currently decryption is done IP packet by IP packet, even if the IP packet contains a partial TLS record. The modify callback allows the network driver to keep track of TCP sequence numbers pointing to the beginning of TLS records after TCP packet reassembly. These callbacks only happen when encrypted or partially decrypted data is received and are used to verify the decryptions starting point for the hardware. Typically the hardware will guess where TLS headers start and needs help from the software to know if the guess was correct. This is the purpose of the modify callback. Differential Revision: https://reviews.freebsd.org/D32356 Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:55:00 +01:00
Gleb Smirnoff	6d1808f051	if_clone: correctly destroy a clone from a different vnet Try to live with cruel reality fact - if_vmove doesn't move an interface from previous vnet cloning infrastructure to the new one. Let's admit this as design feature and make it work better. * Delete two blocks of code that would fallback to vnet0, if a cloner isn't found. They didn't do any good job and also whole idea of treating vnet0 as special one is wrong. * When deleting a cloned interface, lookup its cloner using it's home vnet. With this change simple sequence works correctly: ifconfig foo0 create jail -c name=jj persist vnet vnet.interface=foo0 jexec jj ifconfig foo0 destroy Differential revision: https://reviews.freebsd.org/D33942	2022-01-24 21:07:16 -08:00
Gleb Smirnoff	54712fc423	if_vmove: improve restoration in cloner's ifgroup membership * Do a single call into if_clone.c instead of two. The cloner can't disappear since the interface sits on its list. * Make restoration smarter - check that cloner with same name exists in the new vnet. Differential revision: https://reviews.freebsd.org/D33941	2022-01-24 21:06:59 -08:00
Eric Joyner	213e91399b	iflib: Allow drivers to determine which queue to TX on Adds a new function pointer to struct if_txrx in order to allow drivers to set their own function that will determine which queue a packet should be sent on. Since this includes a kernel ABI change, bump the __FreeBSD_version as well. (This motivation behind this is to allow the driver to examine the UP in the VLAN tag and determine which queue to TX on based on that, in support of HW TX traffic shaping.) Signed-off-by: Eric Joyner <erj@FreeBSD.org> Reviewed by: kbowling@, stallamr@netapp.com Tested by: jeffrey.e.pieper@intel.com Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D31485	2022-01-24 18:22:02 -08:00
Vincenzo Maffione	e0e1240528	netmap: fix LOR in iflib_netmap_register In iflib_device_register(), the CTX_LOCK is acquired first and then IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg() we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive. MFC after: 1 month	2022-01-14 21:09:04 +00:00
Kristof Provost	5f5e32f1b3	pf: protect the rpool from races The roundrobin pool stores its state in the rule, which could potentially lead to invalid addresses being returned. For example, thread A just executed PF_AINC(&rpool->counter) and immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter) (i.e. after the pf_match_addr() check of rpool->counter). Lock the rpool with its own mutex to prevent these races. The performance impact of this is expected to be low, as each rule has its own lock, and the lock is also only relevant when state is being created (so only for the initial packets of a connection, not for all traffic). See also: https://redmine.pfsense.org/issues/12660 Reviewed by: glebius MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33874	2022-01-14 10:30:33 +01:00
Alexander Motin	618d49f5ca	Revert "iflib: Relax timer period from 0.5 to 0.5-0.75s." I've noticed relations between iflib_timer() vs ixl_admin_timer(). Both scheduled at the same 2Hz rate, but the second is rescheduling the first each time, so if the first get any slower, it won't be executed at all. Revert this until deeper investigation. This reverts commit `90bc1cf657`.	2022-01-10 09:40:38 -05:00
Alexander Motin	90bc1cf657	iflib: Relax timer period from 0.5 to 0.5-0.75s. While there switch it from hardclock ticks to milliseconds. MFC after: 2 weeks	2022-01-09 20:32:50 -05:00

1 2 3 4 5 ...

4865 Commits