Commit Graph

4639 Commits

Author SHA1 Message Date
Alexander V. Chernikov
33cb3cb2e3 Fix rib generation count for fib algo.
Currently, the PCB caching mechanism relies on the RIB generation
 counter (rnh_gen) to invalidate cached nhops/LLE entries.

With certain fib algorithms, it is now possible that the
 datapath lookup state applies RIB changes with some delay.
In that scenario, the PCB cache will be invalidated on the RIB change,
 but the new lookup may return the same nexthop.
When the fib algo finally gets in sync with the RIB changes, the PCB cache
 will not receive any notification and will end up caching stale data.

To fix this, introduce an additional counter, rnh_gen_rib, which is used
 only when FIB_ALGO is enabled.
This counter is incremented by the control plane. Each time the fib algo
 synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value.
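
A minimal userland sketch of the two-counter scheme described above. Only the counter names rnh_gen and rnh_gen_rib come from the message. The structs and the functions rib_changed(), fib_algo_synced() and pcb_cache_valid() are hypothetical stand-ins, not the kernel's definitions. The sketch shows that a cached entry survives a RIB change and is invalidated only once the fib algo catches up with the RIB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced routing table head: only the two counters. */
struct rib_head_sketch {
	uint32_t rnh_gen;	/* generation seen by PCB caches */
	uint32_t rnh_gen_rib;	/* generation bumped by the control plane */
};

/* Hypothetical PCB route cache entry. */
struct pcb_route_cache {
	uint32_t gen;		/* rnh_gen at the time of the lookup */
	int nhop_id;		/* stand-in for the cached nexthop */
};

/* Control plane: the RIB changed; the datapath may not reflect it yet. */
static void
rib_changed(struct rib_head_sketch *rh)
{
	rh->rnh_gen_rib++;
}

/* Fib algo: the datapath lookup state is now in sync with the RIB. */
static void
fib_algo_synced(struct rib_head_sketch *rh)
{
	rh->rnh_gen = rh->rnh_gen_rib;
}

/* A cached nexthop stays usable while the datapath generation is unchanged. */
static bool
pcb_cache_valid(const struct rib_head_sketch *rh,
    const struct pcb_route_cache *rc)
{
	return (rc->gen == rh->rnh_gen);
}
```

In this model, a lookup made between rib_changed() and fib_algo_synced() is cached under the old datapath generation. It is therefore discarded once the datapath actually changes.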

Differential Revision: https://reviews.freebsd.org/D29812
Reviewed by:	donner
MFC after:	2 weeks
2021-04-20 22:02:41 +00:00
Alexander V. Chernikov
b31fbebeb3 Relax rtsock message restrictions.
Address multiple issues with strict rtsock message validation.

The D28668 "normalisation" approach was based on the assumption that
 we always have at least a "standard" sockaddr len.
It turned out to be false: certain older applications like quagga
 or routed abuse the sin[6]_len field and set it to the offset of the
 first fully-zero bit in the mask. It is impossible to normalise
 such sockaddrs without reallocation.

With that in mind, change the approach to use a distinct memory
 buffer for the altered sockaddrs. This allows supporting the older
 software while maintaining the guarantee on the "standard" sockaddrs.
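
The distinct-buffer idea can be sketched in userland C. Instead of rewriting a short sockaddr in place, the sketch copies the bytes the application actually supplied into a zero-filled, full-size buffer, leaving the original message untouched. The helper name fill_short_sin() is hypothetical. The sketch does not interpret the sin_len field itself, which not every platform has, and it does not handle IPv6.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Copy a possibly short IPv4 sockaddr, as sent by some older routing
 * daemons, into a caller-provided full-size buffer. Bytes the sender did
 * not supply read as zero; the original message is not modified.
 */
static const struct sockaddr_in *
fill_short_sin(const void *sa, size_t sa_len, struct sockaddr_in *buf)
{
	size_t n = sa_len < sizeof(*buf) ? sa_len : sizeof(*buf);

	memset(buf, 0, sizeof(*buf));
	memcpy(buf, sa, n);
	return (buf);
}
```

A /24 mask whose claimed length stops after the third address byte comes out as a complete 255.255.255.0 mask. Any garbage past the claimed length is ignored.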

PR:	255273,255089
Differential Revision:	https://reviews.freebsd.org/D29826
MFC after:	3 days
2021-04-20 21:34:19 +00:00
Alexander V. Chernikov
758c9d54d4 Improve error reporting in rtsock.c
MFC after:	3 days
2021-04-19 20:36:41 +00:00
Kristof Provost
42ec75f83a pf: Optionally attempt to preserve rule counter values across ruleset updates
Usually rule counters are reset to zero on every update of the ruleset.
With keepcounters set, pf will attempt to find matching rules between old
and new rulesets and preserve the rule counters.
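
One way to picture the matching step, as a sketch only: each new rule looks for an old rule with an identical match key and inherits its counters. The rule_sketch type, its string key and preserve_counters() are hypothetical simplifications. How pf actually decides that two rules match is not described in this message. The search is quadratic for clarity; a tree or hash keyed on the rule contents would scale better.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced rule: a canonical match key plus counters. */
struct rule_sketch {
	char key[64];
	uint64_t evaluations;
	uint64_t packets;
	uint64_t bytes;
};

/*
 * Copy counters from the first matching old rule into each new rule.
 * New rules without a match keep their (zero) counters.
 * Returns the number of new rules that found a match.
 */
static size_t
preserve_counters(const struct rule_sketch *oldrs, size_t nold,
    struct rule_sketch *newrs, size_t nnew)
{
	size_t matched = 0;

	for (size_t i = 0; i < nnew; i++) {
		for (size_t j = 0; j < nold; j++) {
			if (strcmp(newrs[i].key, oldrs[j].key) != 0)
				continue;
			newrs[i].evaluations = oldrs[j].evaluations;
			newrs[i].packets = oldrs[j].packets;
			newrs[i].bytes = oldrs[j].bytes;
			matched++;
			break;
		}
	}
	return (matched);
}
```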

MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29780
2021-04-19 14:31:47 +02:00
Kristof Provost
4f1f67e888 pf: PFRULE_REFS should not be user-visible
Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a
kernel-internal flag and should not be exposed to or read from
userspace.

MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29778
2021-04-19 14:31:47 +02:00
Jonah Caplan
0e4025bffa bridgestp: validate timer values in config BPDU
IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers.
Incoming BPDU messages should be checked against the permitted ranges.
The rest of 17.14 appears to be enforced already.
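
A sketch of such a range check. BPDU timer fields are carried in units of 1/256 second. The numeric limits below are our reading of the permitted ranges in 802.1D-2004: Max Age 6..40 s, Hello Time 1..2 s and Forward Delay 4..30 s. They should be verified against the standard. The macro and function names are hypothetical. The sketch does not model other 17.14 constraints, such as the relationship between Max Age and Forward Delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BPDU timer values are encoded in units of 1/256 second. */
#define BPDU_SEC(s)		((uint16_t)((s) * 256))

/* Assumed permitted ranges; check against IEEE Std 802.1D-2004, 17.14. */
#define BPDU_MIN_MAX_AGE	BPDU_SEC(6)
#define BPDU_MAX_MAX_AGE	BPDU_SEC(40)
#define BPDU_MIN_HELLO		BPDU_SEC(1)
#define BPDU_MAX_HELLO		BPDU_SEC(2)
#define BPDU_MIN_FWD_DELAY	BPDU_SEC(4)
#define BPDU_MAX_FWD_DELAY	BPDU_SEC(30)

/* Return true only if every timer in a received config BPDU is in range. */
static bool
bpdu_timers_valid(uint16_t max_age, uint16_t hello_time, uint16_t fwd_delay)
{
	if (max_age < BPDU_MIN_MAX_AGE || max_age > BPDU_MAX_MAX_AGE)
		return (false);
	if (hello_time < BPDU_MIN_HELLO || hello_time > BPDU_MAX_HELLO)
		return (false);
	if (fwd_delay < BPDU_MIN_FWD_DELAY || fwd_delay > BPDU_MAX_FWD_DELAY)
		return (false);
	return (true);
}
```

The sketch leaves the handling of an out-of-range BPDU to the caller. That is a policy decision, for example ignoring the BPDU.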

PR:		254924
Reviewed by:	kp, donner
Differential Revision:	https://reviews.freebsd.org/D29782
2021-04-19 12:09:18 +02:00
Alexander V. Chernikov
0abb6ff590 fib algo: do not reallocate datapath index for datapath ptr update.
Fib algo uses a per-family array indexed by the fibnum to store
 lookup function pointers and per-fib data.

Each algorithm rebuild currently requires re-allocating this array
 to support an atomic change of two pointers.

As in practice most of the changes involve changing only the
 data pointer, add a shortcut that updates this pointer in place.
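
A userland sketch of the shortcut, using C11 atomics in place of the kernel's primitives. All names (fib_dp, fib_dp_array, fib_dp_update(), fib_dp_lookup(), lookup_const()) are hypothetical.

- **Only the data pointer changes:** a single atomic store into the existing array is enough, because the function pointer readers pair it with is unchanged.
- **The function pointer changes too:** the sketch copies the array, updates the copy and publishes it with one pointer store. Readers therefore see either the old pair or the new pair. The old array is handed back to the caller, which must free it only after all readers are done. In the kernel that is epoch(9)'s job; it is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical lookup function type: per-fib data plus a lookup key. */
typedef int lookup_fn(void *data, unsigned key);

struct fib_dp {
	lookup_fn *f;			/* lookup function */
	_Atomic(void *) arg;		/* per-fib algo data */
};

struct fib_dp_array {
	size_t nfibs;
	struct fib_dp dp[];		/* indexed by fib number */
};

typedef _Atomic(struct fib_dp_array *) fib_dp_slot;

static struct fib_dp_array *
fib_dp_alloc(size_t nfibs, lookup_fn *f)
{
	struct fib_dp_array *a;

	a = malloc(sizeof(*a) + nfibs * sizeof(a->dp[0]));
	if (a == NULL)
		return (NULL);
	a->nfibs = nfibs;
	for (size_t i = 0; i < nfibs; i++) {
		a->dp[i].f = f;
		atomic_init(&a->dp[i].arg, (void *)NULL);
	}
	return (a);
}

/* Datapath: one array load, then a consistent (f, arg) pair. */
static int
fib_dp_lookup(fib_dp_slot *slot, size_t fibnum, unsigned key)
{
	struct fib_dp_array *a = atomic_load(slot);
	struct fib_dp *dp = &a->dp[fibnum];

	return (dp->f(atomic_load(&dp->arg), key));
}

/*
 * Control plane. Sets *old to an array the caller must free once no
 * reader can still reference it, or to NULL if nothing was replaced.
 */
static int
fib_dp_update(fib_dp_slot *slot, size_t fibnum, lookup_fn *f, void *arg,
    struct fib_dp_array **old)
{
	struct fib_dp_array *cur = atomic_load(slot), *nw;

	*old = NULL;
	if (cur->dp[fibnum].f == f) {
		/* Shortcut: only the data pointer changes. */
		atomic_store(&cur->dp[fibnum].arg, arg);
		return (0);
	}
	nw = fib_dp_alloc(cur->nfibs, NULL);
	if (nw == NULL)
		return (-1);
	for (size_t i = 0; i < cur->nfibs; i++) {
		nw->dp[i].f = cur->dp[i].f;
		atomic_store(&nw->dp[i].arg, atomic_load(&cur->dp[i].arg));
	}
	nw->dp[fibnum].f = f;
	atomic_store(&nw->dp[fibnum].arg, arg);
	atomic_store(slot, nw);		/* publish the new array */
	*old = cur;
	return (0);
}

/* Example lookup function: returns the int the data pointer refers to. */
static int
lookup_const(void *data, unsigned key)
{
	(void)key;
	return (*(int *)data);
}
```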

MFC after:	2 weeks
2021-04-18 16:12:13 +01:00
Alexander V. Chernikov
e2f79d9e51 Fib algo: extend KPI by allowing algo to set datapath pointers.
Some algorithms may require updating datapath and control plane
 algo pointers after the (batched) updates.

Export fib_set_datapath_ptr() to allow setting the new datapath
 function or data pointer from the algo.
Add fib_set_algo_ptr() to allow updating algo control plane
 pointer from the algo.
Add fib_epoch_call() epoch(9) wrapper to simplify freeing old
 datapath state.

Reviewed by:		zec
Differential Revision: https://reviews.freebsd.org/D29799
MFC after:		1 week
2021-04-18 16:12:12 +01:00
Alexander V. Chernikov
6b8ef0d428 Add batched update support for the fib algo.
Initial fib algo implementation was built on a very simple set of
 principles w.r.t. updates:

1) an algorithm is either able to apply the change synchronously (DIR24-8)
 or requires a full rebuild (bsearch, lradix).
2) the framework falls back to a rebuild on every error (memory allocation,
 nhg limit, other internal algo errors, etc).

This change brings a new "intermediate" concept: batched updates.
An algorithm can indicate that a particular update has to be handled in
 batched fashion (FLM_BATCH).
The framework will then write this update and other updates to a temporary
 buffer instead of pushing them to the algo callback.
Depending on the update rate, the framework will batch 50..1024 ms of updates
 and submit them to a different algo callback.

This functionality is handy for the slow-to-rebuild algorithms like DXR.
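
A minimal sketch of the buffering side of batched updates. It assumes a fixed-size queue that is flushed to a batch callback either when the accumulation window has passed or when the queue is full. The queue size, the types and the batch_*() names are hypothetical.

- The window is checked only when an update arrives. A real implementation would also need a timer, so that a quiet period still triggers the flush.
- How the framework picks a window within 50..1024 ms is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_QUEUE_MAX	128	/* hypothetical queue capacity */

/* Hypothetical route update record. */
struct rt_update {
	uint32_t prefix;
	uint8_t plen;
	bool is_delete;
	int nhop_id;
};

/* Algo callback that receives a whole batch at once. */
typedef int batch_cb(void *algo_data, const struct rt_update *u, size_t n);

struct update_batch {
	struct rt_update q[BATCH_QUEUE_MAX];
	size_t n;			/* queued updates */
	uint64_t first_ms;		/* arrival time of the oldest queued update */
	uint64_t window_ms;		/* accumulation window, e.g. 50..1024 */
	batch_cb *cb;
	void *algo_data;
};

/* Hand all queued updates to the algo in a single call. */
static int
batch_flush(struct update_batch *b)
{
	int error = 0;

	if (b->n > 0)
		error = b->cb(b->algo_data, b->q, b->n);
	b->n = 0;
	return (error);
}

/* Queue one update; flush if the window has elapsed or the queue is full. */
static int
batch_add(struct update_batch *b, const struct rt_update *u, uint64_t now_ms)
{
	if (b->n == 0)
		b->first_ms = now_ms;
	b->q[b->n++] = *u;
	if (b->n == BATCH_QUEUE_MAX || now_ms - b->first_ms >= b->window_ms)
		return (batch_flush(b));
	return (0);
}

/* Example callback: counts how many updates were applied. */
static int
count_cb(void *algo_data, const struct rt_update *u, size_t n)
{
	(void)u;
	*(size_t *)algo_data += n;
	return (0);
}
```

The sketch only propagates callback errors. What the framework does with them, for example falling back to a rebuild, is outside its scope.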

Differential Revision:	https://reviews.freebsd.org/D29588
Reviewed by:	zec
MFC after:	2 weeks
2021-04-14 23:54:11 +01:00
Tai-hwa Liang
d9b61e7153 if_firewire: fixing panic upon packet reception for VNET build
netisr_dispatch_src() needs a valid VNET pointer, or firewire_input() will panic
when receiving a packet.

Reviewed by:	glebius
MFC after:	2 weeks
2021-04-13 22:59:58 +00:00
Kurosawa Takahiro
2aa21096c7 pf: Implement the NAT source port selection of MAP-E Customer Edge
MAP-E (RFC 7597) requires special care when selecting source ports
in NAT operation on the Customer Edge, because some of the bits of the port
numbers are used by the Border Relay to identify the Customer Edge at the
other end of the IPv4-over-IPv6 tunnel.
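
The port-set arithmetic behind this can be sketched from RFC 7597, Section 5.1. The notation is as follows:

- `a` is the number of offset bits.
- `k` is the number of PSID bits.
- `m = 16 - a - k`.

With these, a CE may use the ports `(i << (16 - a)) | (PSID << m) | j` for `1 <= i < 2^a` and `0 <= j < 2^m`. The PSID bits in every chosen port therefore identify the CE. The function below maps an index `n` onto that set. Its name and interface are hypothetical, and it is not pf's implementation. It requires `a >= 1` (the RFC's default is `a = 6`), which excludes the lowest port block.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store in *port the n-th port (0-based) of the MAP-E port set for
 * offset length a, PSID length k and the given PSID, and return 0.
 * Return -1 if the parameters are invalid or n is past the end of the set.
 */
static int
mape_nth_port(unsigned a, unsigned k, uint16_t psid, uint32_t n,
    uint16_t *port)
{
	unsigned m;
	uint32_t per_block, nports, i, j;

	if (a < 1 || a + k > 16)
		return (-1);
	m = 16 - a - k;
	per_block = UINT32_C(1) << m;			/* ports per contiguous block */
	nports = ((UINT32_C(1) << a) - 1) * per_block;	/* i = 0 is excluded */
	if (n >= nports)
		return (-1);
	i = 1 + n / per_block;
	j = n % per_block;
	psid &= (uint16_t)((UINT32_C(1) << k) - 1);	/* keep only k PSID bits */
	*port = (uint16_t)((i << (16 - a)) | ((uint32_t)psid << m) | j);
	return (0);
}
```

With `a = 6`, `k = 8` and PSID 0x34, for example, the set holds 63 blocks of 4 ports, starting at port 1232.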

PR:		254577
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D29468
2021-04-13 10:53:18 +02:00
Alexander V. Chernikov
afbb64f1d8 Fix vlan creation for the older ifconfig(8) binaries.
Reported by:	allanjude
MFC after:	immediately
2021-04-11 18:13:09 +01:00
Alexander V. Chernikov
7f5f3fcc32 Fix direct route installation with net/bird.
Slightly relax the gateway validation rules imposed by
 2fe5a79425 by requiring only the first 8 bytes (everything
 before sdl_data) to be present in the AF_LINK gateway.

Reported by:	olivier
2021-04-10 16:31:16 +01:00
Alexander V. Chernikov
63dceebe68 Appease -Wsign-compare in radix.c
Differential Revision:	https://reviews.freebsd.org/D29661
Submitted by:	zec
MFC after:	2 weeks
2021-04-10 13:48:25 +00:00
Alexander V. Chernikov
caf2f62765 Allow to specify debugnet fib in sysctl/tunable.
Differential Revision:	https://reviews.freebsd.org/D29593
Reviewed by:		donner
MFC after:		2 weeks
2021-04-10 13:47:49 +00:00
Kristof Provost
d710367d11 pf: Implement nvlist variant of DIOCGETRULE
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29559
2021-04-10 11:16:01 +02:00
Kristof Provost
5c62eded5a pf: Introduce nvlist variant of DIOCADDRULE
This will make future extensions of the API much easier.
The intent is to remove support for DIOCADDRULE in FreeBSD 14.

Reviewed by:	markj (previous version), glebius (previous version)
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29557
2021-04-10 11:16:00 +02:00
Alexander V. Chernikov
ee2cf2b360 Implement better rebuild-delay fib algo policy.
The intent is to better handle time intervals with a large number of RIB
updates (e.g. a BGP peer going up or down), while still keeping a low sync
delay for other scenarios.

The implementation is the following: updates are grouped into
buckets of 50ms. If the number of updates within the current bucket
 exceeds the threshold of 500 routes/sec (i.e. 25 updates per bucket
interval), the update is delayed for another 50ms. This can be repeated
 until the maximum update delay (1 sec) is reached.

All 3 variables are runtime tunables:

* net.route.algo.fib_max_sync_delay_ms: 1000
* net.route.algo.bucket_change_threshold_rate: 500
* net.route.algo.bucket_time_ms: 50
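
The per-bucket decision can be sketched as follows. The fields mirror the three tunables, but the struct and function are hypothetical. The real code may differ in details such as how bucket boundaries are tracked. With the defaults, the per-bucket threshold is 500 routes/s * 0.05 s = 25 updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sync_policy {
	uint32_t bucket_time_ms;	/* net.route.algo.bucket_time_ms */
	uint32_t threshold_rate;	/* net.route.algo.bucket_change_threshold_rate */
	uint32_t max_delay_ms;		/* net.route.algo.fib_max_sync_delay_ms */
};

/*
 * Called when a bucket closes. Returns true if the sync should be
 * postponed by one more bucket, false if it should happen now.
 */
static bool
sync_should_delay(const struct sync_policy *p, uint32_t bucket_updates,
    uint32_t delayed_ms)
{
	uint32_t threshold = p->threshold_rate * p->bucket_time_ms / 1000;

	/* Never push the total delay past the configured maximum. */
	if (delayed_ms + p->bucket_time_ms > p->max_delay_ms)
		return (false);
	/* Keep waiting while the update rate stays above the threshold. */
	return (bucket_updates > threshold);
}
```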

Differential Revision:	https://reviews.freebsd.org/D29588
MFC after:		2 weeks
2021-04-09 21:33:03 +01:00
Alexander V. Chernikov
9e5243d7b6 Enforce check for using the return result for ifa?_try_ref().
Suggested by:	hps
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29504
2021-04-05 03:35:19 +01:00
Kristof Provost
4967f672ef pf: Remove unused variable rt_listid from struct pf_krule
Reviewed by:	donner
MFC after:	4 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29639
2021-04-08 13:24:35 +02:00
Mark Johnston
274579831b capsicum: Limit socket operations in capability mode
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.

Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.
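
The socket-creation rule can be expressed as a small predicate. This is only a sketch of the rule as worded above, with a hypothetical function name; it is not the kernel's check. It assumes the caller has already stripped type flags such as SOCK_NONBLOCK.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Would socket(domain, type, ...) be permitted in capability mode? */
static bool
capmode_socket_allowed(int domain, int type)
{
	if (domain == PF_LOCAL)
		return (true);
	if (domain == PF_INET || domain == PF_INET6)
		return (type == SOCK_DGRAM || type == SOCK_STREAM ||
		    type == SOCK_SEQPACKET);
	/* Every other domain, and raw internet sockets, is refused. */
	return (false);
}
```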

Reviewed by:	oshogbo
Discussed with:	emaste
Reported by:	manu
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29423
2021-04-07 14:32:56 -04:00
Vincenzo Maffione
361e950180 iflib: add support for netmap offsets
Follow-up change to a6d768d845.
This change adds iflib support for netmap offsets, enabling
applications to use offsets on any driver backed by iflib.
2021-04-05 07:54:47 +00:00
Vincenzo Maffione
9bad2638cc netmap: restore commit a56e6334d1
The fix in a56e6334d1
was accidentally reverted by commit 45c67e8f6b.
2021-04-02 10:45:47 +00:00
Vincenzo Maffione
45c67e8f6b netmap: several typo fixes
No functional changes intended.
2021-04-02 07:01:20 +00:00
Konstantin Belousov
baacf70137 vxlan: correct interface MTU when using hw offloads
Otherwise it breaks when offloads such as checksum or TSO are used,
because the second (encapsulated) ip_output() pass hands fragments of
the encapsulated packet down to the hardware interface.
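
For reference, here is the usual VXLAN overhead arithmetic. The parent MTU must hold the outer IP header, the UDP header, the VXLAN header and the inner Ethernet header on top of the inner IP packet. The helper is hypothetical and ignores extras such as an inner VLAN tag. It illustrates why the vxlan interface MTU must be smaller than the parent's; it is not the driver's exact computation.

```c
#include <assert.h>
#include <stdbool.h>

#define OUTER_IPV4_HDR	20
#define OUTER_IPV6_HDR	40
#define UDP_HDR		8
#define VXLAN_HDR	8
#define INNER_ETHER_HDR	14

/* Largest inner IP packet that fits in one outer packet of parent_mtu. */
static int
vxlan_inner_mtu(int parent_mtu, bool ipv6_underlay)
{
	int overhead = (ipv6_underlay ? OUTER_IPV6_HDR : OUTER_IPV4_HDR) +
	    UDP_HDR + VXLAN_HDR + INNER_ETHER_HDR;

	return (parent_mtu - overhead);
}
```

For a 1500-byte parent this gives 1450 bytes over an IPv4 underlay and 1430 over IPv6.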

Diagnosed by:	hselasky
Reviewed by:	np
Sponsored by:	Nvidia Networking / Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29501
2021-03-31 14:38:26 +03:00
Konstantin Belousov
e243367b64 mbuf: add a way to mark flowid as calculated from the internal headers
In some setups, the offload might calculate the hash over the decapsulated packet.
Reserve a bit in the packet header rsstype to indicate that.

Add m_adj_decap() that acts similarly to m_adj(), but also either clears
the flowid if it is not marked as inner, or transfers it to the decapsulated
header, clearing the inner indicator. It depends on the internals of m_adj(),
which reuses the argument packet header for the result.

Use m_adj_decap() for decapsulating vxlan(4) and gif(4) input packets.
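
The flowid rule of m_adj_decap() can be sketched on a reduced packet header. The struct, the flag value and the function name are hypothetical; the real code works on the mbuf packet header's flowid and rsstype. Trimming the outer headers, which m_adj() does, is left out.

```c
#include <assert.h>
#include <stdint.h>

#define HASHTYPE_NONE	0x00
#define HASHTYPE_INNER	0x80	/* hypothetical "computed over inner headers" bit */

struct pkthdr_sketch {
	uint32_t flowid;
	uint8_t rsstype;	/* hash type, plus the inner marker bit */
};

/* Fix up the flow hash after the outer headers have been stripped. */
static void
decap_fix_flowid(struct pkthdr_sketch *ph)
{
	if (ph->rsstype & HASHTYPE_INNER) {
		/* Hash already describes the inner packet: keep it, drop the marker. */
		ph->rsstype &= (uint8_t)~HASHTYPE_INNER;
	} else {
		/* Hash describes the outer flow, which no longer applies. */
		ph->flowid = 0;
		ph->rsstype = HASHTYPE_NONE;
	}
}
```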

Reviewed by:	ae, hselasky, np
Sponsored by:	Nvidia Networking / Mellanox Technologies
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28773
2021-03-31 14:38:26 +03:00
Alexander V. Chernikov
0c2a0e0380 Fix typo in 9fa8d1582b.
Reported by:	cy
2021-03-29 23:42:48 +00:00
Alexander V. Chernikov
9fa8d1582b Put a bandaid on the nhgrp_dump_sysctl() malloc KASSERT().
Recent rtsock changes widened the epoch section to cover nhgrp_dump_sysctl(),
 resulting in `netstat -4On` triggering the KASSERT.

MFC after:	1 day
2021-03-29 23:12:11 +00:00
Alexander V. Chernikov
0f30a36ded Rename variables inside nexthop group consider_resize() code.
No functional changes.

MFC after: 3 days
2021-03-29 23:06:13 +00:00
Alexander V. Chernikov
9095dc7da4 Fix nexthop group index array scaling.
The current code has a limit of 127 nexthop groups due to the
 wrongly-checked bitmask_copy() return value.

PR: 254303
Reported by:	Aleks <a.ivanov at veesp.com>
MFC after: 1 day
2021-03-29 23:00:17 +00:00
Vincenzo Maffione
660a47cb99 netmap: monitor: add a flag to distinguish packet direction
The netmap monitor intercepts any TX/RX packets on the monitored
port. However, before this change there was no way to tell
whether an intercepted packet was being transmitted or received
on the monitored port.
A TXMON flag in the netmap slot has been added for this purpose.
2021-03-29 16:32:54 +00:00
Vincenzo Maffione
a6d768d845 netmap: add kernel support for the "offsets" feature
This feature enables applications to ask netmap to transmit or
receive packets starting at a user-specified offset from the
beginning of the netmap buffer. This is meant to ease those
packet manipulation operations such as pushing or popping packet
headers, that may be useful to implement software switches,
routers and other packet processors.
To use the feature, drivers (e.g., iflib, vtnet, etc.) must have
explicit support. This change does not add support for any driver,
but introduces the necessary kernel changes. However, offsets support
is already included for VALE ports and pipes.
2021-03-29 16:29:01 +00:00
you@x
21d0c01226 netmap: iflib: add nm_config callback
This per-driver callback is invoked by netmap when it wants
to align the number of TX/RX netmap rings and/or the number of
TX/RX netmap slots to the actual state configured in the hardware.
The alignment happens when netmap mode is switched on (with no
active netmap file descriptors for that netmap port), or when
collecting netmap port information.

MFC after:	1 week
2021-03-29 09:31:18 +00:00
Alexander V. Chernikov
6f43c72b47 Zero struct weightened_nhop fields in nhgrp_get_addition_group().
`struct weightened_nhop` has 32 spare bits between the fields due to
 alignment (on amd64).
Not zeroing these spare bits results in duplicate nhop groups
 in the kernel because of the way comparison works.
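
A small illustration of the failure mode, using a hypothetical struct of the same shape. It holds a pointer followed by a 32-bit weight, which on LP64 targets usually leaves 4 bytes of padding. If entries are compared with memcmp(), stale padding bytes make otherwise identical entries differ, and clearing the whole struct first avoids that. Strictly speaking, the C standard lets padding bytes take unspecified values when a member is stored, so this relies on common compiler behaviour, as kernel code routinely does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as struct weightened_nhop. */
struct wnhop_sketch {
	void *nh;
	uint32_t weight;
};

/* Initialise an entry so that all bytes, padding included, are defined. */
static void
wnhop_init(struct wnhop_sketch *wn, void *nh, uint32_t weight)
{
	memset(wn, 0, sizeof(*wn));
	wn->nh = nh;
	wn->weight = weight;
}

/* Byte-wise comparison of two arrays of entries, as a group lookup might do. */
static int
wnhop_equal(const struct wnhop_sketch *a, const struct wnhop_sketch *b,
    size_t n)
{
	return (memcmp(a, b, n * sizeof(*a)) == 0);
}
```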

MFC after:	1 day
2021-03-20 08:26:03 +00:00
Alexander V. Chernikov
24cd2796cf Fix !VNET build broken by 66f138563b.
2021-03-25 00:31:08 +00:00
Alexander V. Chernikov
66f138563b Plug nexthop group refcount leak.
In the case of a batch route delete via rib_walk_del(), when
 some paths of a multipath route got deleted, the old
 multipath group was not freed.

PR:    254496
Reported by:   Zhenlei Huang <zlei.huang@gmail.com>
MFC after:     1 day
2021-03-24 23:52:18 +00:00
Alexander V. Chernikov
c00e2f573b Fix build for non-vnet non-multipath kernels broken by
a0308e48ec.
2021-03-23 23:35:23 +00:00
Alexander V. Chernikov
a0308e48ec Fix panic when destroying interface with ECMP routes.
Reported by:	Zhenlei Huang <zlei.huang at gmail.com>
PR:		254496
MFC after:	immediately
2021-03-23 22:03:20 +00:00
Adrian Chadd
25bfa44860 Add device and ifnet logging methods, similar to device_printf / if_printf
* device_printf() is effectively a printf
* if_printf() is effectively a LOG_INFO

This allows subsystems to log device/netif stuff using different log levels,
rather than having to invent their own way to prefix unit/netif names.

Differential Revision: https://reviews.freebsd.org/D29320
Reviewed by: imp
2021-03-22 00:02:34 +00:00
Alexander V. Chernikov
2476178e6b Fix kassert panic when inserting multipath routes from multiple threads.
Reported by:	Marco Zec <zec at fer.hr>
MFC after:	immediately
2021-03-21 18:15:29 +00:00
Kyle Evans
f187d6dfbf base: remove if_wg(4) and associated utilities, manpage
After lengthy discussions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree.  This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.

Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
2021-03-17 09:14:48 -05:00
Alexander V. Chernikov
e4ac3f7463 Fix fib algo rebuild delay calculation.
Submitted by:	Marco Zec <zec at fer.hr>
MFC after:	3 days
2021-03-15 21:09:07 +00:00
Kyle Evans
74ae3f3e33 if_wg: import latest fixup work from the wireguard-freebsd project
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues.  This patch consists of
work done by the following folks:

- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>

Notable changes include:
- Packets are now correctly staged for processing once the handshake has
  completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
  and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
  tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
  the interface's home vnet so that it can act as the sole network
  connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
  complete.  It is additionally supported by the upstream
  wireguard-tools (which we plan to merge into base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
  aligned with security auditing guidelines

Note that the driver has been rebased away from using iflib.  iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.

The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations.  This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.

There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.

Also note that this is still a work in progress; work going further will
be much smaller in nature.

MFC after:	1 month (maybe)
2021-03-14 23:52:04 -05:00
Gordon Bergling
5666643a95 Fix some common typos in comments
- occured -> occurred
- normaly -> normally
- controling -> controlling
- fileds -> fields
- insterted -> inserted
- outputing -> outputting

MFC after:	1 week
2021-03-13 18:26:15 +01:00
Kristof Provost
cecfaf9bed pf: Fully remove interrupt events on vnet cleanup
swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible in `procstat -t 12` when creating and removing a vnet jail.

We can remove it manually with intr_event_destroy().

PR:		254171
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29211
2021-03-12 12:12:43 +01:00
Wei Hu
a491581f3f Hyper-V: hn: Enable vSwitch RSC support in hn netvsc driver
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffic flows between different guests on the same host.
This patch adds netvsc driver support for this feature.

The patch also updates the NVS version to 6.1 as needed for RSC
enablement.

MFC after:	2 weeks
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D29075
2021-03-12 04:35:16 +00:00
Kristof Provost
5e9dae8e14 pf: Factor out pf_krule_free()
Reviewed by:	melifaro@
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D29194
2021-03-11 10:39:43 +01:00
Alexander V. Chernikov
b1d63265ac Flush remaining routes from the routing table during VNET shutdown.
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
 VNET.

PR:	253998
Reported by:	rashey at superbox.pl
MFC after:	3 days

Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures is to minimise the amount of work by doing global cleanups instead of per-interface ones, address this by adding a relatively lightweight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.

Test Plan:
```
set_skip:set_skip_group_lo  ->  passed  [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```

Reviewers: #network, kp, bz

Reviewed By: kp

Subscribers: imp, ae

Differential Revision: https://reviews.freebsd.org/D29116
2021-03-10 21:10:14 +00:00
Kyle Evans
0dd691b412 iflib: allow clone detach if not yet init
If we hit an error during init, then we'll unwind our state and attempt
to detach the device -- don't block it.

This was discovered by creating a wg0 with missing parameters; said
failure left this orphaned device in place, which ended up
panicking the system upon enumeration of the dev.* sysctl space.

Reviewed by:	gallatin, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29145
2021-03-09 13:49:13 -06:00
Mark Johnston
ffe3def903 iflib: Make if_shared_ctx_t a pointer to const
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface.  This is especially important for
software-only drivers like wg.

DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.
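
A sketch of what the const qualifier buys. Every interface instance points at the same driver-wide shared context, and any attempt to modify it through an instance becomes a compile-time error. The type and field names are hypothetical simplifications, not iflib's definitions.

```c
#include <assert.h>

/* Driver-wide parameters, shared by every instance of the driver. */
struct shared_ctx_sketch {
	const char *name;
	int max_queues;
};

/* Pointer-to-const typedef, in the spirit of if_shared_ctx_t. */
typedef const struct shared_ctx_sketch *shared_ctx_t;

/* Per-interface state. */
struct softc_sketch {
	shared_ctx_t sctx;
	int unit;
	int nqueues;
};

static const struct shared_ctx_sketch example_sctx = { "example", 8 };

static void
softc_attach(struct softc_sketch *sc, shared_ctx_t sctx, int unit)
{
	sc->sctx = sctx;
	sc->unit = unit;
	/* Per-instance tuning goes into the softc, never into *sctx. */
	sc->nqueues = sctx->max_queues;
	/* sc->sctx->max_queues = 4;  would not compile: *sctx is const */
}
```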

Reviewed by:	gallatin, erj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29102
2021-03-08 12:39:06 -05:00