4746 Commits

Author SHA1 Message Date
John Baldwin
c782ea8bb5 Add a switch structure for send tags.
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'.  A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).

Previously, device driver ifnet methods switched on the type to call
type-specific functions.  Now, those type-specific functions are saved
in the switch structure and invoked directly.  In addition, this more
gracefully permits multiple implementations of the same tag within a
driver.  In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.

Reviewed by:	gallatin, hselasky, kib (older version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31572
2021-09-14 11:43:41 -07:00
Mark Johnston
b1746faad6 debugnet: Include some required headers
Don't depend on pollution from net/vnet.h.

PR:		258496
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-14 11:02:45 -04:00
Kristof Provost
b64f7ce98f pf: qid and pqid can be uint16_t
tag2name() returns a uint16_t, so we don't need to use uint32_t for the
qid (or pqid). This reduces the size of struct pf_kstate slightly. That
in turn buys us space to add extra fields for dummynet later.

Happily these fields are not exposed to user space (there are user space
versions of them, but they can just stay uint32_t), so there's no ABI
breakage in modifying this.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31873
2021-09-10 17:07:57 +02:00
Mark Johnston
b1e6a792d6 net: Enter a net epoch around protocol if_up/down notifications
When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.

Reported by:	syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by:	kp, melifaro
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31889
2021-09-10 09:07:40 -04:00
Alexander V. Chernikov
4b631fc832 routing: fix source address selection rules for IPv4 over IPv6.
Current logic always selects an IFA of the same family from the
 outgoing interfaces. In IPv4 over IPv6 setup there can be just
 single non-127.0.0.1 ifa, attached to the loopback interface.

Create a separate rt_getifa_family() to handle entire ifa selection
 for the IPv4 over IPv6.

Differential Revision: https://reviews.freebsd.org/D31868
MFC after:	1 week
2021-09-07 21:41:05 +00:00
Kristof Provost
bb25e36e13 pf: remove unused function prototype
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-09-07 16:38:49 +02:00
Kristof Provost
312f5f8a4f altq: mark callouts as mpsafe
There's no reason to acquire the Giant lock while executing the ALTQ
callouts.

While here also remove a few backwards compatibility defines for long
obsolete FreeBSD versions.

Reviewed by:	mav
Suggested by:	mav
Differential Revision:	https://reviews.freebsd.org/D31835
2021-09-04 17:26:10 +02:00
Kristof Provost
4cab80a8df pf: Add counters for syncookies
Count when we send a syncookie, receive a valid syncookie or detect a
synflood.

Reviewed by:	kbowling
MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31713
2021-09-01 12:02:19 +02:00
Alexander V. Chernikov
0a3a377aee routing: Disallow zero nexthop weights in nexthop groups.
Adding such nexthops breaks calc_min_mpath_slots() assumptions,
 thus resulting in the incorrect nexthop group creation and
 eventually leading to panic.
Reported by:	avg
MFC after:	1 week
2021-09-01 07:16:24 +00:00
Alexander V. Chernikov
639d7abec6 routing: simplify malloc flags in alloc_nhgrp().
MFC after:	1 week
2021-08-31 08:14:16 +00:00
Alexander V. Chernikov
f84c30106e routing: Fix newly-added rt_get_inet[6]_parent() api.
Correctly handle the case when no default route is present.

Reported by:	Konrad <konrad.kreciwilk at korbank.pl>
2021-08-30 21:10:37 +00:00
Alexander V. Chernikov
d98954e229 routing: Bring back the ability to specify transmit interface via its name.
Some software references outgoing interfaces by specifying name instead of
 index.

Use rti_ifp from rt_addrinfo if provided instead of always using
 address interface when constructing nexthop.

PR: 		255678
Reported by:	martin.larsson2 at gmail.com
MFC after:	1 week
2021-08-29 20:05:14 +00:00
Kristof Provost
2b10cf85f8 pf: Introduce nvlist variant of DIOCGETSTATUS
Make it possible to extend the GETSTATUS call (e.g. when we want to add
new counters, such as for syncookie support) by introducing an
nvlist-based alternative.

MFC after:	1 week
Sponsored by:   Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31694
2021-08-29 14:59:04 +02:00
Luiz Otavio O Souza
eb680a63de if_bridge: add ALTQ support
Similar to the recent addition of ALTQ support to if_vlan.

Reviewed by:	donner
Obtained from:	pfsense
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31675
2021-08-26 11:23:44 +02:00
Luiz Otavio O Souza
2e5ff01d0a if_vlan: add the ALTQ support to if_vlan.
Inspired by the iflib implementation, allow ALTQ to be used with if_vlan
interfaces.

Reviewed by:	donner
Obtained from:	pfsense
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31647
2021-08-25 08:56:45 +02:00
Kristof Provost
159258afb5 altq: Fix panics on rmc_restart()
rmc_restart() is called from a timer, but can trigger traffic. This
means the curvnet context will not be set.
Use the vnet associated with the interface we're currently processing to
set it. We also have to enter net_epoch here, for the same reason.

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31642
2021-08-23 21:35:41 +02:00
Zhenlei Huang
62e1a437f3 routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).
Implement kernel support for RFC 5549/8950.

* Relax control plane restrictions and allow specifying IPv6 gateways
 for IPv4 routes. This behavior is controlled by the
 net.route.rib_route_ipv6_nexthop sysctl (on by default).

* Always pass final destination in ro->ro_dst in ip_forward().

* Use ro->ro_dst to exract packet family inside if_output() routines.
 Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.

* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
 It leverages recent lltable changes committed in c541bd368f86.

Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
  route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0

Differential Revision: https://reviews.freebsd.org/D30398
MFC after:	2 weeks
2021-08-22 22:56:08 +00:00
Vincenzo Maffione
98399ab06f netmap: import changes from upstream
- make sure rings are disabled during resets
 - introduce netmap_update_hostrings_mode(), with support
   for multiple host rings
 - always initialize ni_bufs_head in netmap_if
      ni_bufs_head was not properly initialized when no external buffers were
      requestedx and contained the ni_bufs_head from the last request. This
      was causing spurious buffer frees when alternating between apps that
      used external buffers and apps that did not use them.
 - check na validitity under lock on detach
 - netmap_mem: fix leak on error path
 - nm_dispatch: fix compilation on Raspberry Pi

MFC after:	2 weeks
2021-08-22 09:31:05 +00:00
Alexander V. Chernikov
c541bd368f lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries.
Currently we use pre-calculated headers inside LLE entries as prepend data
 for `if_output` functions. Using these headers allows saving some
 CPU cycles/memory accesses on the fast path.

However, this approach makes adding L2 header for IPv4 traffic with IPv6
 nexthops more complex, as it is not possible to store multiple
 pre-calculated headers inside lle. Additionally, the solution space is
 limited by the fact that PCB caching saves LLEs in addition to the nexthop.

Thus, add support for creating special "child" LLEs for the purpose of holding
 custom family encaps and store mbufs pending resolution. To simplify handling
 of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
 Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
 to the "parent" LLE - it is not possible to delete "child" when parent is alive.
 Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state
 machine used by the standard LLEs.

nd6_lookup() and nd6_resolve() now accepts an additional argument, family,
 allowing to return such child LLEs. This change uses `LLE_SF()` macro which
 packs family and flags in a single int field. This is done to simplify merging
 back to stable/. Once this code lands, most of the cases will be converted to
 use a dedicated `family` parameter.

Differential Revision: https://reviews.freebsd.org/D31379
MFC after:	2 weeks
2021-08-21 17:34:35 +00:00
Luiz Otavio O Souza
c138424148 lagg: don't update link layer addresses on destroy
When the lagg is being destroyed it is not necessary update the
lladdr of all the lagg members every time we update the primary
interface.

Reviewed by:	scottl
Obtained from:	pfSense
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31586
2021-08-19 10:49:32 +02:00
Franco Fichtner
bb250fae9e gre: simplify RSS ifdefs
Use the early break to avoid else definitions. When RSS gains a
runtime option previous constructs would duplicate and convolute
the existing code.

While here init flowid and skip magic numbers and late default
assignment.

Reviewed by:	melifaro, kbowling
Obtained from:	OPNsense
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31584
2021-08-18 10:05:29 -07:00
Kristof Provost
a051ca72e2 Introduce m_get3()
Introduce m_get3() which is similar to m_get2(), but can allocate up to
MJUM16BYTES bytes (m_get2() can only allocate up to MJUMPAGESIZE).

This simplifies the bpf improvement in f13da24715.

Suggested by:	glebius
Differential Revision:	https://reviews.freebsd.org/D31455
2021-08-18 08:48:27 +02:00
Stephan de Wit
66fa12d8fb iflib: emulate counters in netmap mode
When iflib devices are in netmap mode the driver
counters are no longer updated making it look from
userspace tools that traffic has stopped.

Reported by:	Franco Fichtner <franco@opnsense.org>
Reviewed by:	vmaffione, iflib (erj, gallatin)
Obtained from:	OPNsense
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31550
2021-08-18 00:17:43 -07:00
Alexander V. Chernikov
36e15b717e routing: Fix crashes with dpdk_lpm[46] algo.
When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
 the nexthop of the "parent" prefix to update its internal state.
The glue code, which utilises RIB as a backing route store, uses
 fib[46]_lookup_rt() for the prefix destination after its deletion
 to fetch the desired nexthop.
This approach does not work when deleting less-specific prefixes
 with most-specific ones are still present. For example, if
 10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
 10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
 result instead of 10.0.0.0/22. This, in turn, results in the failed
 datastructure update: part of the deleted /23 prefix will still
 contain the reference to an old nexthop. This leads to the
 use-after-free behaviour, ending with the eventual crashes.

Fix the logic flaw by properly fetching the prefix "parent" via
 newly-created rt_get_inet[6]_parent() helpers.

Differential Revision: https://reviews.freebsd.org/D31546
PR:	256882,256833
MFC after:	1 week
2021-08-17 20:46:22 +00:00
Mark Johnston
24fe461284 ether: Add a KMSAN check for transmitted frames
This helps ensure that outbound packet data is initialized per KMSAN.

Sponsored by:	The FreeBSD Foundation
2021-08-11 16:33:41 -04:00
Ed Maste
9feff969a0 Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).

Sponsored by:	The FreeBSD Foundation
2021-08-08 10:42:24 -04:00
Alexander V. Chernikov
9748eb7427 Simplify nhop operations in ip_output().
Consistently use `nh` instead of always dereferencing
 ro->ro_nh inside the if block.
Always use nexthop mtu, as it provides guarantee that mtu is accurate.
Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses
 of updating ro flags based on different nexthop.

Differential Revision: https://reviews.freebsd.org/D31451
Reviewed by:	kp
MFC after: 2 weeks
2021-08-08 09:19:27 +00:00
Alexander V. Chernikov
0b79b007eb [lltable] Restructure nd6 code.
Factor out lltable locking logic from lltable_try_set_entry_addr()
 into a separate lltable_acquire_wlock(), so the latter can be used
 in other parts of the code w/o duplication.

Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c
 and nd6_nbr.c.

Move lle creation logic from nd6_resolve_slow() into a separate
 nd6_get_llentry() to simplify the former.

These changes serve as a pre-requisite for implementing
 RFC8950 (IPv4 prefixes with IPv6 nexthops).

Differential Revision: https://reviews.freebsd.org/D31432
MFC after:	2 weeks
2021-08-07 09:59:11 +00:00
Alexander V. Chernikov
f3a3b06121 [lltable] Unify datapath feedback mechamism.
Use newly-create llentry_request_feedback(),
 llentry_mark_used() and llentry_get_hittime() to
 request datapatch usage check and fetch the results
 in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
 by eliminating 1 condition check.

MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
2021-08-04 22:52:43 +00:00
Alexander V. Chernikov
5b42b494d5 Fix typo in rib_unsibscribe<_locked>().
Submitted by:	Zhenlei Huang<zlei.huang at gmail.com>
Differential Revision: https://reviews.freebsd.org/D31356
2021-08-01 13:29:52 +00:00
Alexander V. Chernikov
054948bd81 [multipath][nhops] Fix random crashes with high route churn rate.
When certain multipath route begins flapping really fast, it may
 result in creating multiple identical nexthop groups. The code
 responsible for unlinking unused nexthop groups had an implicit
 assumption that there could be only one nexthop group for the
 same combination of nexthops with weights. This assumption resulted
 in always unlinking the first "identical" group, instead of the
 desired one. Such action, in turn, produced a used-but-unlinked
 nhg along with freed-and-linked nhg, ending up in random crashes.

Similarly, it is possible that multiple identical nexthops gets
 created in the case of high route churn, resulting in the same
 problem when deleting one of such nexthops.

Fix by matching the nexthop/nexhop group pointer when deleting the item.

Reported by:	avg
MFC after:	1 week
2021-08-01 10:07:37 +00:00
Kristof Provost
b69019c14c pf: remove DIOCGETSTATESNV
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.

The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.

Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31099
2021-07-30 11:45:28 +02:00
Bryan Drewery
7cbf1de38e debugnet: Fix false-positive assertions for dp_state
debugnet_handle_arp:
  An assertion is present to ensure the pcb is only modified when the state is
  DN_STATE_INIT. Because debugnet_arp_gw() is asynchronous it is possible for
  ARP replies to come in after the gateway address is known and the state
  already changed.

debugnet_handle_ip:
  Similarly it is possible for packets to come in, from the expected
  server, during the gateway mac discovery phase.  This can happen from
  testing disconnects / reconnects in quick succession.  This later
  causes some acks to be sent back but hit an assertion because the
  state is wrong.

Reviewed by:	cem, debugnet_handle_arp: markj, vangyzen
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D31327
2021-07-28 16:34:14 -07:00
Kristof Provost
01ad0c0079 net: disallow MTU changes on bridge member interfaces
if_bridge member interfaces should always have the same MTU as the
bridge itself, so disallow MTU changes on interfaces that are part of an
if_bridge.

Reviewed by:	donner
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31304
2021-07-28 22:03:30 +02:00
Kristof Provost
3330649382 if_bridge: allow MTU changes
if_bridge used to only allow MTU changes if the new MTU matched that of
all member interfaces. This doesn't really make much sense, in that we
really shouldn't be allowed to change the MTU of bridge member in the
first place.

Instead we now change the MTU of all member interfaces. If one fails we
revert all interfaces back to the original MTU.

We do not address the issue where bridge member interface MTUs can be
changed here.

Reviewed by:	donner
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31288
2021-07-28 22:01:12 +02:00
Roy Marples
7045b1603b socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652
2021-07-28 09:35:09 -07:00
Kristof Provost
9ef8cd0b79 vlan: deduplicate bpf_setpcp() and pf_ieee8021q_setpcp()
These two fuctions were identical, so move them into the common
vlan_set_pcp() function, exposed in the if_vlan_var.h header.

Reviewed by:	donner
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31275
2021-07-26 23:13:31 +02:00
Luiz Otavio O Souza
1e7fe2fbb9 bpf: Add an ioctl to set the VLAN Priority on packets sent by bpf
This allows the use of VLAN PCP in dhclient, which is required for
certain ISPs (such as Orange.fr).

Reviewed by:	bcr (man page)
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31263
2021-07-26 23:13:31 +02:00
Mateusz Guzik
02cf67ccf6 pf: switch rule counters to pf_counter_u64
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:17 +02:00
Mateusz Guzik
d40d4b3ed7 pf: switch kif counters to pf_counter_u64
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:17 +02:00
Mateusz Guzik
fc4c42ce0b pf: switch pf_status.fcounters to pf_counter_u64
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:16 +02:00
Mateusz Guzik
defdcdd564 pf: add hybrid 32- an 64- bit counters
Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significiant
performance hit on some platforms and cannot be easily fixed.

Work around the problem by implementing a pf-specific variant.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-25 10:22:16 +02:00
Mateusz Guzik
d9cc6ea270 pf: hide struct pf_kstatus behind ifdef _KERNEL
Reviewed by:    kp
Sponsored by:   Rubicon Communications, LLC ("Netgate")
2021-07-23 17:34:43 +00:00
Mark Johnston
0dcef81de9 Add required sysctl name length checks to various handlers
Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-07-23 10:47:13 -04:00
Kyle Evans
51221b68fb tuntap: clean up cc --analyze
One complaint of a dead-store, smack it with a __diagused.
2021-07-21 19:14:43 -05:00
Kristof Provost
32271c4d38 pf: clean up syncookie callout on vnet shutdown
Ensure that we cancel any outstanding callouts for syncookies when we
terminate the vnet.

MFC after:	1 week
Sponsored by:	Modirum MDPay
2021-07-20 21:13:25 +02:00
Mateusz Guzik
907257d696 pf: embed a pointer to the lock in struct pf_kstate
This shaves calculation which in particular helps on arm.

Note using the & hack instead would still be more work.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-20 16:11:31 +00:00
Kristof Provost
231e83d342 pf: syncookie ioctl interface
Kernel side implementation to allow switching between on and off modes,
and allow this configuration to be retrieved.

MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31139
2021-07-20 10:36:13 +02:00
Kristof Provost
8e1864ed07 pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature help pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.

This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.

Reviewed by:	kbowling
Obtained from:	OpenBSD
MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31138
2021-07-20 10:36:13 +02:00
Mateusz Guzik
9009d36afd pf: shrink struct pf_kstate
Makes room for a pointer.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-07-19 14:54:49 +02:00