4819 Commits

Author SHA1 Message Date
Alexander V. Chernikov
0b79b007eb [lltable] Restructure nd6 code.
Factor out lltable locking logic from lltable_try_set_entry_addr()
 into a separate lltable_acquire_wlock(), so the latter can be used
 in other parts of the code w/o duplication.

Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c
 and nd6_nbr.c.

Move lle creation logic from nd6_resolve_slow() into a separate
 nd6_get_llentry() to simplify the former.

These changes serve as a pre-requisite for implementing
 RFC8950 (IPv4 prefixes with IPv6 nexthops).

Differential Revision: https://reviews.freebsd.org/D31432
MFC after:	2 weeks
2021-08-07 09:59:11 +00:00
Alexander V. Chernikov
f3a3b06121 [lltable] Unify datapath feedback mechamism.
Use newly-create llentry_request_feedback(),
 llentry_mark_used() and llentry_get_hittime() to
 request datapatch usage check and fetch the results
 in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
 by eliminating 1 condition check.

MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
2021-08-04 22:52:43 +00:00
Alexander V. Chernikov
5b42b494d5 Fix typo in rib_unsibscribe<_locked>().
Submitted by:	Zhenlei Huang<zlei.huang at gmail.com>
Differential Revision: https://reviews.freebsd.org/D31356
2021-08-01 13:29:52 +00:00
Alexander V. Chernikov
054948bd81 [multipath][nhops] Fix random crashes with high route churn rate.
When certain multipath route begins flapping really fast, it may
 result in creating multiple identical nexthop groups. The code
 responsible for unlinking unused nexthop groups had an implicit
 assumption that there could be only one nexthop group for the
 same combination of nexthops with weights. This assumption resulted
 in always unlinking the first "identical" group, instead of the
 desired one. Such action, in turn, produced a used-but-unlinked
 nhg along with freed-and-linked nhg, ending up in random crashes.

Similarly, it is possible that multiple identical nexthops gets
 created in the case of high route churn, resulting in the same
 problem when deleting one of such nexthops.

Fix by matching the nexthop/nexhop group pointer when deleting the item.

Reported by:	avg
MFC after:	1 week
2021-08-01 10:07:37 +00:00
Kristof Provost
b69019c14c pf: remove DIOCGETSTATESNV
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.

The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31099
2021-07-30 11:45:28 +02:00
Bryan Drewery
7cbf1de38e debugnet: Fix false-positive assertions for dp_state
debugnet_handle_arp:
  An assertion is present to ensure the pcb is only modified when the state is
  DN_STATE_INIT. Because debugnet_arp_gw() is asynchronous it is possible for
  ARP replies to come in after the gateway address is known and the state
  already changed.

debugnet_handle_ip:
  Similarly it is possible for packets to come in, from the expected
  server, during the gateway mac discovery phase.  This can happen from
  testing disconnects / reconnects in quick succession.  This later
  causes some acks to be sent back but hit an assertion because the
  state is wrong.

Reviewed by:	cem, debugnet_handle_arp: markj, vangyzen
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D31327
2021-07-28 16:34:14 -07:00
Kristof Provost
01ad0c0079 net: disallow MTU changes on bridge member interfaces
if_bridge member interfaces should always have the same MTU as the
bridge itself, so disallow MTU changes on interfaces that are part of an
if_bridge.

Reviewed by:	donner
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31304
2021-07-28 22:03:30 +02:00
Kristof Provost
3330649382 if_bridge: allow MTU changes
if_bridge used to only allow MTU changes if the new MTU matched that of
all member interfaces. This doesn't really make much sense, in that we
really shouldn't be allowed to change the MTU of bridge member in the
first place.

Instead we now change the MTU of all member interfaces. If one fails we
revert all interfaces back to the original MTU.

We do not address the issue where bridge member interface MTUs can be
changed here.

Reviewed by:	donner
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31288
2021-07-28 22:01:12 +02:00
Roy Marples
7045b1603b socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652
2021-07-28 09:35:09 -07:00
Kristof Provost
9ef8cd0b79 vlan: deduplicate bpf_setpcp() and pf_ieee8021q_setpcp()
These two fuctions were identical, so move them into the common
vlan_set_pcp() function, exposed in the if_vlan_var.h header.

Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31275
2021-07-26 23:13:31 +02:00
Luiz Otavio O Souza
1e7fe2fbb9 bpf: Add an ioctl to set the VLAN Priority on packets sent by bpf
This allows the use of VLAN PCP in dhclient, which is required for
certain ISPs (such as Orange.fr).

Reviewed by:	bcr (man page)
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31263
2021-07-26 23:13:31 +02:00
Mateusz Guzik
02cf67ccf6 pf: switch rule counters to pf_counter_u64
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:17 +02:00
Mateusz Guzik
d40d4b3ed7 pf: switch kif counters to pf_counter_u64
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:17 +02:00
Mateusz Guzik
fc4c42ce0b pf: switch pf_status.fcounters to pf_counter_u64
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:16 +02:00
Mateusz Guzik
defdcdd564 pf: add hybrid 32- an 64- bit counters
Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significiant
performance hit on some platforms and cannot be easily fixed.

Work around the problem by implementing a pf-specific variant.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:16 +02:00
Mateusz Guzik
d9cc6ea270 pf: hide struct pf_kstatus behind ifdef _KERNEL
Reviewed by:    kp
Sponsored by:   Rubicon Communications, LLC ("Netgate")
2021-07-23 17:34:43 +00:00
Mark Johnston
0dcef81de9 Add required sysctl name length checks to various handlers
Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-07-23 10:47:13 -04:00
Kyle Evans
51221b68fb tuntap: clean up cc --analyze
One complaint of a dead-store, smack it with a __diagused.
2021-07-21 19:14:43 -05:00
Kristof Provost
32271c4d38 pf: clean up syncookie callout on vnet shutdown
Ensure that we cancel any outstanding callouts for syncookies when we
terminate the vnet.

MFC after:	1 week
Sponsored by:	Modirum MDPay
2021-07-20 21:13:25 +02:00
Mateusz Guzik
907257d696 pf: embed a pointer to the lock in struct pf_kstate
This shaves calculation which in particular helps on arm.

Note using the & hack instead would still be more work.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-20 16:11:31 +00:00
Kristof Provost
231e83d342 pf: syncookie ioctl interface
Kernel side implementation to allow switching between on and off modes,
and allow this configuration to be retrieved.

MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31139
2021-07-20 10:36:13 +02:00
Kristof Provost
8e1864ed07 pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature help pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.

This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.

Reviewed by:	kbowling
Obtained from:	OpenBSD
MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31138
2021-07-20 10:36:13 +02:00
Mateusz Guzik
9009d36afd pf: shrink struct pf_kstate
Makes room for a pointer.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-19 14:54:49 +02:00
Mateusz Guzik
f9aa757d8d pf: add a comment to pf_kstate concerning compat with pf_state_cmp
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-19 14:54:49 +02:00
Kristof Provost
ef950daa35 pf: match keyword support
Support the 'match' keyword.
Note that support is limited to adding queuing information, so without
ALTQ support in the kernel setting match rules is pointless.

For the avoidance of doubt: this is NOT full support for the match
keyword as found in OpenBSD's pf. That could potentially be built on top
of this, but this commit is NOT that.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31115
2021-07-17 12:01:08 +02:00
Kristof Provost
c6bf20a2a4 pf: add DIOCGETSTATESV2
Add a new version of the DIOCGETSTATES call, which extends the struct to
include the original interface information.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31097
2021-07-09 10:29:53 +02:00
Mateusz Guzik
19d6e29b87 pf: add pf_find_state_all_exists
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-08 14:00:55 +00:00
Kristof Provost
211cddf9e3 pf: rename pf_state to pf_kstate
Indicate that this is a kernel-only structure, and make it easier to
distinguish from others used to communicate with userspace.

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31096
2021-07-08 10:31:43 +02:00
Mateusz Guzik
a56888534d iflib: use m_gethdr_raw
Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31081
2021-07-07 11:05:46 +00:00
Mateusz Guzik
f649cff587 pf: padalign global locks found in pf.c
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 09:56:54 +00:00
Mateusz Guzik
dc1ab04e4c pf: allow table stats clearing and reading with ruleset rlock
Instead serialize against these operations with a dedicated lock.

Prior to the change, When pushing 17 mln pps of traffic, calling
DIOCRGETTSTATS in a loop would restrict throughput to about 7 mln.  With
the change there is no slowdown.

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 10:42:01 +02:00
Mateusz Guzik
f92c21a28c pf: depessimize table handling
Creating tables and zeroing their counters induces excessive IPIs (14
per table), which in turns kills single- and multi-threaded performance.

Work around the problem by extending per-CPU counters with a general
counter populated on "zeroing" requests -- it stores the currently found
sum. Then requests to report the current value are the sum of per-CPU
counters subtracted by the saved value.

Sample timings when loading a config with 100k tables on a 104-way box:

stock:

pfctl -f tables100000.conf  0.39s user 69.37s system 99% cpu 1:09.76 total
pfctl -f tables100000.conf  0.40s user 68.14s system 99% cpu 1:08.54 total

patched:

pfctl -f tables100000.conf  0.35s user 6.41s system 99% cpu 6.771 total
pfctl -f tables100000.conf  0.48s user 6.47s system 99% cpu 6.949 total

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-05 10:42:01 +02:00
Mateusz Guzik
bad5f0b6c2 iflib: switch bare zone_mbuf use to m_free_raw
Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30961
2021-07-02 08:30:22 +00:00
Mateusz Guzik
55cc305dfc pf: revert: Use counter(9) for pf_state byte/packet tracking
stats are not shared and consequently per-CPU counters only waste
memory.

No slowdown was measured when passing over 20M pps.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 07:24:53 +00:00
Mateusz Guzik
803dfe3da0 pf: deduplicate V_pf_state_z handling with pfsync
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-29 07:24:53 +00:00
Mateusz Guzik
e6dd0e2e8d pf: assert that sizeof(struct pf_state) <= 312
To prevent accidentally going over a threshold which makes UMA fit only
12 objects per page instead of 13.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-28 15:49:20 +00:00
Mateusz Guzik
d09388d013 pf: add pf_release_staten and use it in pf_unlink_state
Saves one atomic op.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-28 15:49:20 +00:00
Marcin Wojtas
58632fa7a3 iflib: Add a new quirk
ENETC NIC found in LS1028A has a bug where clearing TX pidx/cidx
causes the ring to hang after being re-enabled.
Add a new flag, if set iflib will preserve the indices during restart.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: gallatin, erj
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30728
2021-06-24 13:00:56 +02:00
Florian Florensa
f13da24715 net/bpf: Fix writing of buffer bigger than PAGESIZE
When allocating the mbuf we used m_get2 which fails
if len is superior to MJUMPAGESIZE, if its the case,
use m_getjcl instead.

Reviewed by:	kp@
PR:		205164
Pull Request:	https://github.com/freebsd/freebsd-src/pull/131
2021-06-23 10:39:18 -06:00
Rozhuk Ivan
a75819461e devctl: add ADDR_ADD and ADDR_DEL devctl event for IFNET
Add devd event on network iface address add/remove.  Can be used to
automate actions on any address change.

Reviewed by:		imp@ (and minor style tweaks)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30840
2021-06-23 10:26:56 -06:00
Rozhuk Ivan
4fb3e0bb94 devctl: add RENAME devctl event for IFNET
Add devd event on network iface rename.

Reviewed by:		imp@,asomers@
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30839
2021-06-23 10:20:58 -06:00
George V. Neville-Neil
c6b2d024d7 Retore the vnet before returning an error.
Obtained from:	Kanndula, Dheeraj <Dheeraj.Kandula@netapp.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30741
2021-06-21 10:46:20 -04:00
Kristof Provost
d38630f619 pf: store L4 headers in pf_pdesc
Rather than pointers to the headers store full copies. This brings us
slightly closer to what OpenBSD does, and also makes more sense than
storing pointers to stack variable copies of the headers.

Reviewed by:	donner, scottl
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30719
2021-06-14 14:22:06 +02:00
Bjoern A. Zeeb
a3c2c06bc9 Make LINT NOINET and NOIP kernel builds warning free.
Apply #ifdef INET or #if defined(INET6) || defined(INET) to make
universe NOINET and NOIP LINT kernels warning free as well again.
2021-06-06 14:03:06 +00:00
Kyle Evans
2d741f33bd kern: ether_gen_addr: randomize on default hostuuid, too
Currently, this will still hash the default (all zero) hostuuid and
potentially arrive at a MAC address that has a high chance of collision
if another interface of the same name appears in the same broadcast
domain on another host without a hostuuid, e.g., some virtual machine
setups.

Instead of using the default hostuuid, just treat it as a failure and
generate a random LA unicast MAC address.

Reviewed by:	bz, gbe, imp, kbowling, kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29788
2021-06-01 22:59:21 -05:00
Kristof Provost
ec7b47fc81 pf: Move provider declaration to pf.h
This simplifies life a bit, by not requiring us to repease the
declaration for every file where we want static probe points.

It also makes the gcc6 build happy.
2021-06-01 09:02:05 +02:00
Kristof Provost
d0fdf2b28f pf: Track the original kif for floating states
Track (and display) the interface that created a state, even if it's a
floating state (and thus uses virtual interface 'all').

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30245
2021-05-20 12:49:27 +02:00
Kristof Provost
0592a4c83d pf: Add DIOCGETSTATESNV
Add DIOCGETSTATESNV, an nvlist-based alternative to DIOCGETSTATES.

MFC after:      1 week
Sponsored by:   Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30243
2021-05-20 12:49:27 +02:00
Kristof Provost
1732afaa0d pf: Add DIOCGETSTATENV
Add DIOCGETSTATENV, an nvlist-based alternative to DIOCGETSTATE.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30242
2021-05-20 12:49:26 +02:00
Alexander V. Chernikov
76cfc6fa0d Fix a use after free in update_rtm_from_rc().
update_rtm_from_rc() calls update_rtm_from_info() internally.
The latter one may update provided prtm pointer with a new rtm.
Reassign rtm from prtm afeter calling update_rtm_from_info() to
 avoid touching the freed rtm.

PR:		255871
Submitted by:	lylgood@foxmail.com
MFC after:	3 days
2021-05-14 16:06:41 +00:00