Commit Graph

4838 Commits

Author SHA1 Message Date
Kristof Provost
e732e742b3 pf: Initial Ethernet level filtering code
This is the kernel side of stateless Ethernel level filtering for pf.

The primary use case for this is to enable captive portal functionality
to allow/deny access by MAC address, rather than per IP address.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31737
2022-03-02 17:00:03 +01:00
Kristof Provost
36637dd19d bridge: Don't share broadcast packets
if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.

Use m_dup() to create independent mbufs instead.

Reported by:	Richard Russo <toast@ruka.org>
Reviewed by:	donner, afedorov
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D34319
2022-02-21 19:03:44 +01:00
Mateusz Guzik
430e0e409c vnet: add CURVNET_ASSERT_SET for !VIMAGE
Reported by:	ler
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-02-19 21:00:00 +00:00
Mateusz Guzik
75cde1f872 vnet: add CURVNET_ASSERT_SET
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34312
2022-02-19 13:10:01 +00:00
Li-Wen Hsu
7442b63231
if_epair: Use ANSI C definition
This fixes -Werror=strict-prototypes from gcc9

Sponsored by:	The FreeBSD Foundation
2022-02-15 21:45:22 +08:00
Kristof Provost
24f0bfbad5 if_epair: implement fanout
Allow multiple cores to be used to process if_epair traffic. We do this
(if RSS is enabled) based on the RSS hash of the incoming packet. This
allows us to distribute the load over multiple cores, rather than
sending everything to the same one.

We also switch from swi_sched() to taskqueues, which also contributes to
better throughput.

Benchmark results:
With net.isr.maxthreads=-1

Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)

Before          627 Kpps
After (no RSS)  1.198 Mpps
After (RSS)     3.148 Mpps

Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)

Before          7.705 Kpps
After (no RSS)  1.017 Mpps
After (RSS)     2.083 Mpps

MFC after:	3 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D33731
2022-02-15 09:03:24 +01:00
Kristof Provost
78bc3d5e17 vlan: allow net.link.vlan.mtag_pcp to be set per vnet
The primary reason for this change is to facilitate testing.

MFC after:	1 week
2022-02-14 22:51:10 +01:00
Aleksandr Fedorov
ceaf442ff2 if_vxlan(4): Allow netmap_generic to intercept RX packets.
Netmap (generic) intercepts the if_input method to handle RX packets.

Call ifp->if_input() instead of netisr_dispatch().
Add stricter check for incoming packet length.

This change is very useful with bhyve + vale + if_vxlan.

Reviewed by:	vmaffione (mentor), kib, np, donner
Approved by:	vmaffione (mentor), kib, np, donner
MFC after:	2 weeks
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D30638
2022-02-06 15:27:46 +03:00
Kristof Provost
4daa31c108 pflog: align header to 4 bytes, not 8
6d4baa0d01 incorrectly rounded the lenght of the pflog header up to 8
bytes, rather than 4.

PR:		261566
Reported by:	Guy Harris <gharris@sonic.net>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-02-01 18:17:44 +01:00
Mark Johnston
773e3a71b2 pf: Initialize pf_kpool mutexes earlier
There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.

Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there.  This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.

Constify some related functions while here and add a regression test
based on a syzkaller reproducer.

Reported by:	syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by:	kp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34115
2022-01-31 16:14:00 -05:00
Gleb Smirnoff
964b8f8b99 ifnet: garbage collect unused function ifaddr_byindex().
Last use was removed in 5adea417d4.
2022-01-28 09:51:52 -08:00
Gleb Smirnoff
6abb5043a6 rtsock: always set m_pkthdr.rcvif when queueing on netisr
netisr uses global workstreams and after dequeueing an mbuf it
uses rcvif to get the VNET of the mbuf.  Of course, this is not
needed when kernel is compiled without VIMAGE.  It came out that
routing socket does not set rcvif if compiled without VIMAGE.
Make this assignment not depending on VIMAGE option.

Fixes:	6871de9363
2022-01-27 09:41:31 -08:00
Gleb Smirnoff
6871de9363 netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs
Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33268
2022-01-26 21:58:50 -08:00
Gleb Smirnoff
e1882428dc ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918
2022-01-26 21:58:50 -08:00
Gleb Smirnoff
91f44749c6 ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it.  For lifetime
of an ifnet its index never changes.  To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet.  Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.

API wise the only change is that now minimum interface index can be
greater than 1.  The holes in interface indexes were always allowed.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33672
2022-01-26 21:58:44 -08:00
Hans Petter Selasky
c8f2c290e4 Add definitions for TLS receive tags using the existing send tag infrastructure.
Although send tags are strictly used for transmit, the name might be changed
in the future to be more generic.

The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any
VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the
network driver itself when creating the TLS RX decryption offload filter.

TLS receive tags have a modify callback to tell the network driver about
the progress of decryption. Currently decryption is done IP packet by IP
packet, even if the IP packet contains a partial TLS record. The modify
callback allows the network driver to keep track of TCP sequence numbers
pointing to the beginning of TLS records after TCP packet reassembly.
These callbacks only happen when encrypted or partially decrypted data is
received and are used to verify the decryptions starting point for the
hardware. Typically the hardware will guess where TLS headers start and
needs help from the software to know if the guess was correct. This is
the purpose of the modify callback.

Differential Revision:	https://reviews.freebsd.org/D32356
Discussed with:	jhb@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-01-26 12:55:00 +01:00
Gleb Smirnoff
6d1808f051 if_clone: correctly destroy a clone from a different vnet
Try to live with cruel reality fact - if_vmove doesn't move an
interface from previous vnet cloning infrastructure to the new
one.  Let's admit this as design feature and make it work better.

* Delete two blocks of code that would fallback to vnet0, if a
  cloner isn't found.  They didn't do any good job and also whole
  idea of treating vnet0 as special one is wrong.
* When deleting a cloned interface, lookup its cloner using it's
  home vnet.

With this change simple sequence works correctly:

  ifconfig foo0 create
  jail -c name=jj persist vnet vnet.interface=foo0
  jexec jj ifconfig foo0 destroy

Differential revision:	https://reviews.freebsd.org/D33942
2022-01-24 21:07:16 -08:00
Gleb Smirnoff
54712fc423 if_vmove: improve restoration in cloner's ifgroup membership
* Do a single call into if_clone.c instead of two.  The cloner
  can't disappear since the interface sits on its list.
* Make restoration smarter - check that cloner with same name
  exists in the new vnet.

Differential revision:	https://reviews.freebsd.org/D33941
2022-01-24 21:06:59 -08:00
Eric Joyner
213e91399b
iflib: Allow drivers to determine which queue to TX on
Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.

Since this includes a kernel ABI change, bump the __FreeBSD_version
as well.

(This motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	kbowling@, stallamr@netapp.com
Tested by:	jeffrey.e.pieper@intel.com
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D31485
2022-01-24 18:22:02 -08:00
Vincenzo Maffione
e0e1240528 netmap: fix LOR in iflib_netmap_register
In iflib_device_register(), the CTX_LOCK is acquired first and then
IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg()
we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK
is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping
the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional
IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive.

MFC after:	1 month
2022-01-14 21:09:04 +00:00
Kristof Provost
5f5e32f1b3 pf: protect the rpool from races
The roundrobin pool stores its state in the rule, which could
potentially lead to invalid addresses being returned.

For example, thread A just executed PF_AINC(&rpool->counter) and
immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter)
(i.e. after the pf_match_addr() check of rpool->counter).

Lock the rpool with its own mutex to prevent these races. The
performance impact of this is expected to be low, as each rule has its
own lock, and the lock is also only relevant when state is being created
(so only for the initial packets of a connection, not for all traffic).

See also:	https://redmine.pfsense.org/issues/12660
Reviewed by:	glebius
MFC after:	3 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33874
2022-01-14 10:30:33 +01:00
Alexander Motin
618d49f5ca Revert "iflib: Relax timer period from 0.5 to 0.5-0.75s."
I've noticed relations between iflib_timer() vs ixl_admin_timer().
Both scheduled at the same 2Hz rate, but the second is rescheduling
the first each time, so if the first get any slower, it won't be
executed at all.  Revert this until deeper investigation.

This reverts commit 90bc1cf657.
2022-01-10 09:40:38 -05:00
Alexander Motin
90bc1cf657 iflib: Relax timer period from 0.5 to 0.5-0.75s.
While there switch it from hardclock ticks to milliseconds.

MFC after:	2 weeks
2022-01-09 20:32:50 -05:00
Ryan Stone
5adea417d4 Fix ifa refcount leak in ifa_ifwithnet()
In 4f6c66cc9c, ifa_ifwithnet() was changed to no longer
ifa_ref() the returned ifaddr, and instead the caller was required
to stay in the net_epoch for as long as they wanted the ifaddr
to remain valid.  However, this missed the case where an AF_LINK
lookup would call ifaddr_byindex(), which still does ifa_ref()
the ifaddr.  This would cause a refcount leak.

Fix this by inlining the relevant parts of ifaddr_byindex() here,
with the ifa_ref() call removed.  This also avoids an unnecessary
entry and exit from the net_epoch for this case.

I've audited all in-tree consumers of ifa_ifwithnet() that could
possibly perform an AF_LINK lookup and confirmed that none of them
will expect the ifaddr to have a reference that they need to
release.

MFC after: 2 months
Sponsored by: Dell Inc
Differential Revision:	https://reviews.freebsd.org/D28705
Reviewed by: melifaro
2022-01-06 15:04:24 -05:00
Ed Maste
a6668e31aa Fix kernel build without INET and INET6
Reviewed by:	brooks, melifaro
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33718
2022-01-05 09:41:38 -05:00
Gleb Smirnoff
644ca0846d domains: make domain_init() initialize only global state
Now that each module handles its global and VNET initialization
itself, there is no VNET related stuff left to do in domain_init().

Differential revision:	https://reviews.freebsd.org/D33541
2022-01-03 10:15:22 -08:00
Gleb Smirnoff
89128ff3e4 protocols: init with standard SYSINIT(9) or VNET_SYSINIT
The historical BSD network stack loop that rolls over domains and
over protocols has no advantages over more modern SYSINIT(9).
While doing the sweep, split global and per-VNET initializers.

Getting rid of pr_init allows to achieve several things:
o Get rid of ifdef's that protect against double foo_init() when
  both INET and INET6 are compiled in.
o Isolate initializers statically to the module they init.
o Makes code easier to understand and maintain.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D33537
2022-01-03 10:15:21 -08:00
Ed Maste
818952c638 Fix kernel build without INET6
Reported by:	Gary Jennejohn
Fixes:		ff3a85d324 ("[lltable] Add per-family lltable ...")
Sponsored by:	The FreeBSD Foundation
2021-12-30 18:40:46 -05:00
Stefan Eßer
e2650af157 Make CPU_SET macros compliant with other implementations
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.

Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.

The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).

The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.

This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.

One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.

Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.

The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D33451
2021-12-30 12:20:32 +01:00
Alexander V. Chernikov
63f7f3921b routing: Add unified level-based logging support for the routing subsystem.
Summary: MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D33664
2021-12-29 21:30:18 +00:00
Alexander V. Chernikov
823a08d740 nhops: split nh_family into nh_upper_family and nh_neigh_family.
With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need
 to distingush "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway
 address family. Store them explicitly in the private part of the nexthop data.

While here, store nhop fibnum in nhop_prip datastructure to make it self-contained.

MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D33663
2021-12-29 21:03:19 +00:00
Alexander V. Chernikov
ff3a85d324 [lltable] Add per-family lltable getters.
Introduce a new function, lltable_get(), to retrieve lltable pointer
 for the specified interface and family.
Use it to avoid all-iftable list traversal when adding or deleting
 ARP/ND records.

Differential Revision: https://reviews.freebsd.org/D33660
MFC after:	2 weeks
2021-12-29 20:57:15 +00:00
Vincenzo Maffione
4561c4f0ca net: iflib: sync isc_capenable to if_capenable
On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled.
When this happens, apply the same change to isc_capenable, which
is the iflib private copy of if_capenable (for a subset of the
IFCAP_* bits). In this way the iflib drivers can check the bits
using isc_capenable rather than if_capenable. This is convenient
because the latter access requires an additional indirection
through the ifp, and it is also less likely to be in cache.

PR:		260068
Reviewed by:	kbowling, gallatin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33156
2021-12-28 10:55:21 +00:00
Kristof Provost
e7809dceb5 pf: make if_pfsync.h self-contained
Reviewed by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33504
2021-12-17 12:38:35 +01:00
Kristof Provost
dc04fa802d pf: make if_pflog.h self-contained
Reviewed by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33503
2021-12-17 12:38:35 +01:00
Kristof Provost
e9167358e4 net: make if_bridgevar.h self-contained
Reviewed by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33502
2021-12-17 12:38:35 +01:00
Kristof Provost
f4096a7c8a net: make ethernet.h self-contained
Reviewed by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33501
2021-12-17 12:38:35 +01:00
Kristof Provost
c658610b92 pf: make pfvar.h self-contained
Ensure that the pfvar.h header can be included without including any
other headers.

Reviewed by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33499
2021-12-17 12:38:34 +01:00
Kristof Provost
b29c145cc1 if_stf: make if_stf.h self-contained
Ensure that the if_stf.h header can be included without including any
other headers.

Reviewed by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33498
2021-12-17 12:38:34 +01:00
Warner Losh
c6df6f5322 Create wrapper for Giant taken for newbus
Create a wrapper for newbus to take giant and for busses to take it too.
bus_topo_lock() should be called before interacting with newbus routines
and unlocked with bus_topo_unlock(). If you need the topology lock for
some reason, bus_topo_mtx() will provide that.

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D31831
2021-12-09 17:04:45 -07:00
Mateusz Guzik
e735fa3212 net/if.c: plug set-but-not-unused vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 20:39:40 +00:00
Gleb Smirnoff
7e0bba4d80 ifnet: make V_if_index static to if.c
This requires moving net.link.generic sysctl declaration from if_mib.c
to if.c.  Ideally if_mib.c needs just to be merged to if.c, but they
have different license texts.

Differential revision:	https://reviews.freebsd.org/D33263
2021-12-06 09:32:31 -08:00
Gleb Smirnoff
d74b7baeb0 ifnet_byindex() actually requires network epoch
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch.  Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit.  Mark that with a comment.

Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D33263
2021-12-06 09:32:31 -08:00
Gleb Smirnoff
7b40b00fad ifnet: merge ifindex_alloc(), ifnet_setbyindex(), if_grow() and call magic
Now it is possible to just merge all this complexity into single
linear function.  Note that IFNET_WLOCK() is a sleepable lock, so
we can M_WAITOK and epoch_wait_preempt().

Reviewed by:		melifaro, bz, kp
Differential revision:	https://reviews.freebsd.org/D33262
2021-12-06 09:32:31 -08:00
Gleb Smirnoff
6ff4cac2ee ifnet: initial if_grow() shall always succeed
So let's just call malloc() directly.  This also avoids hidden
doubling of default V_if_indexlim.

Reviewed by:		melifaro, bz, kp
Differential revision:	https://reviews.freebsd.org/D33261
2021-12-06 09:32:31 -08:00
Gleb Smirnoff
450394af27 ifnet: use ck_pr(3) store & load setting ifnet pointer in ifindex
The lockless access to the array is protected by the network epoch.

Reviewed by:		bz, kp
Differential revision:	https://reviews.freebsd.org/D33260
2021-12-06 09:32:30 -08:00
Gleb Smirnoff
8062e5759c ifnet: allocate index at the end of if_alloc_domain()
Now that if_alloc_domain() never fails and actually doesn't
expose ifnet to outside we can eliminate IFNET_HOLD and two
step index allocation.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33259
2021-12-06 09:32:30 -08:00
Gleb Smirnoff
ad2a0aec29 nhop: hash ifnet pointer instead of if_index
Yet another problem created by VIMAGE/if_vmove/epair design that
relocates ifnet between vnets and changes if_index.  Since if_index
changes, nhop hash values also changes, unlink_nhop() isn't able to
find entry in hash and leaks the nhop.  Since nhop references ifnet,
the latter is also leaked.  As result running network tests leaks
memory on every single test that creates vnet jail.

While here, rewrite whole hash_priv() to use static initializer,
per Alexander's suggestion.

Reviewed by:	melifaro
2021-12-04 10:05:46 -08:00
Kristof Provost
6d4baa0d01 if_pflog: fix packet length
There were two issues with the new pflog packet length.
The first is that the length is expected to be a multiple of
sizeof(long), but we'd assumed it had to be a multiple of
sizeof(uint32_t).

The second is that there's some broken software out there (such as
Wireshark) that makes incorrect assumptions about the amount of padding.
That is, Wireshark assumes there's always three bytes of padding, rather
than however much is needed to get to a multiple of sizeof(long).

Fix this by adding extra padding, and a fake field to maintain
Wireshark's assumption.

Reported by:	Ozkan KIRIK <ozkan.kirik@gmail.com>
Tested by:	Ozkan KIRIK <ozkan.kirik@gmail.com>
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33236
2021-12-04 08:42:55 +01:00
Cy Schubert
db0ac6ded6 Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
2021-12-02 14:45:04 -08:00