epairs currently shuttle all transmitted packets through a single global
taskqueue thread. To hand packets over to the taskqueue thread, each
epair maintains a pair of ring buffers and a lockless scheme for
notifying the thread of pending work. The implementation can lead to
lost wakeups, causing to-be-transmitted packets to end up stuck in the
queue.
Rather than extending the existing scheme, simply replace it with a
linked list protected by a mutex, and use the mutex to synchronize
wakeups of the taskqueue thread. This appears to give equivalent or
better throughput with >= 16 producer threads and eliminates the lost
wakeups.
Reviewed by: kp
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38843
The m_flags field of struct mbuf is 24 bits wide and so gets truncated
in a couple of places in the epair code. Instead of preserving the
entire flag set, just remember whether M_BCAST or M_MCAST is set.
MFC after: 1 week
Sponsored by: Klara, Inc.
Do not pass the pointer to our valid mbuf to pfil(9). Pass an
uninitialized one only. This was unsafe with the old KPI, too,
but for some reason didn't fail.
Fixes: caf32b260a
* Make nhop_set_blackhole() set all necessary properties for the
nexthop
* Make nexthops blackhole/reject based on the rtm_type netlink
property instead of using rtflags.
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 3 days
Don't define ovpn_find_peer_by_ip() if INET is not set, and do the same
for ovpn_find_peer_by_ip6() and INET6.
Reported by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
These two functions are intended to be used only when allocating or
destroying vnet instances.
No functional change intended.
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D37955
add_route_flags() uses `rt` prefix data to lookup the the current
rtentry from the routing table. Update rib_add_route_px() to
always pass rtentry regardless of the op_flags.
Reported by: Stefan Grundmann <sg2342@googlemail.com>
MFC after: 1 day
The 0b70e3e78b changed the original design of a single entry point
into pfil(9) chains providing separate functions for the filtering
points that always provide mbufs and know the direction of a flow.
The motivation was to reduce branching. The logical continuation
would be to do the same for the filtering points that always provide
a memory pointer and retire the single entry point.
o Hooks now provide two functions: one for mbufs and optional for
memory pointers.
o pfil_hook_args() has a new member and pfil_add_hook() has a
requirement to zero out uninitialized data. Bump PFIL_VERSION.
o As it was before, a hook function for a memory pointer may realloc
into an mbuf. Such mbuf would be returned via a pointer that must
be provided in argument.
o The only hook that supports memory pointers is ipfw:default-link.
It is rewritten to provide two functions.
o All remaining uses of pfil_run_hooks() are converted to
pfil_mem_in().
o Transparent union of pfil_packet_t and tricks to fix pointer
alignment are retired. Internal pfil_realloc() reduces down to
m_devget() and thus is retired, too.
Reviewed by: mjg, ocochard
Differential revision: https://reviews.freebsd.org/D37977
Summary:
Clean up style issues from IfAPI additions.
Casts to (struct ifnet *) made sense when `if_t` was a `void *`, but
since it's a `struct ifnet *` it no longer makes sense. Fix whitespace
errors, among others.
Reviewed by: kib, glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38499
Summary:
As a stopgap measure add basic accessors for the if_capabilities2 and
if_capenable2 members to further hide the ifnet details.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, kib
Differential Revision: https://reviews.freebsd.org/D38487
Summary:
Add the following accessors needed by infiniband drivers:
* if_getaddrlen()
* if_setbroadcastaddr()
* if_resolvemulti()
With these accessors, and additional changes on the drivers' side, an
amd64 kernel can be compiled with `struct ifnet` completely hidden.
Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38488
Summary:
Add 2 new APIs for supporting recent mbuf changes:
* 36e0a362ac added the m_snd_tag_alloc() wrapper around
if_snd_tag_alloc(). Push this down to the ifnet level.
* 4d7a1361ef adds the m_rcvif_serialize()/m_rcvif_restore() KPIs to
serialize and restore an ifnet pointer. Add the necessary wrapper to
get the index generation for this.
Reviewed By: jhb
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38340
Summary:
Sometimes it's useful to iterate over all interfaces in the current
VNET, as the linuxulator does in several places.
Unlike other iterators in the IfAPI this propagates any error received
up to the caller, instead of returning a count.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38348
Summary:
The only user of the ALTQ_IS_ENABLED() in a driver checks against the
ifnet queue. Abstract that all out and present the interface to check
if ALTQ is enabled on the interface.
Sponsored by: Juniper Networks, Inc.
Reviewed By: glebius
Differential Revision: https://reviews.freebsd.org/D38204
Summary:
Firewire is the only device driver that accesses the l2com member, all
other accesses are handled within the netstack itself.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38203
Summary:
* if_setreassignfn for wireguard.
* if_getinputfn() and if_getstartfn() for various drivers. Use the
function descriptor typedefs for these and the setters.
* vlantrunk accessor. This is used by VLAN_CAPABILITIES() used by
several drivers, as well as directly by mxge(4).
* if_pcp member accessor, used by cxgbe.
* accessors for netmap adapter.
Sponsored by: Juniper Networks, Inc.
Reviewed By: glebius
Differential Revision: https://reviews.freebsd.org/D38202
Summary:
Keep TOEDEV() macro for backwards compatibility, and add a SETTOEDEV()
macro to complement with the new accessors.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38199
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
Summary:
Port the MAC modules to use the IfAPI APIs as part of this.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38197
Summary:
Hide more netstack by making the BPF_TAP macros real functions in the
netstack. "struct ifnet" is used in the header instead of "if_t" to
keep header pollution down.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38103
When ALTQ is enabled ifnet accessors already need to be called, largely
defeating the purpose of the inline. To that extent, make the ALTQ form
functions in the netstack proper, and make them always available.
Reviewed By: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38104
Ever since gtaskqueue_drain() was added to iflib_stop(), a kernel panic
occurs when the ice(4) driver is in recovery mode. Queues are not
initialized in this mode, so gt_taskqueue is not initialized, and
gtaskqueue_drain() will panic.
Fix this by only doing a drain if an RX queue's gt_taskqueue is
initialized.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D37892
As part of the effort to hide the internals of the ifnet struct, convert
the DEBUGNET_SET() macro to use an accessor instead of directly touching
the methods member.
Reviewed by: glebius (older version)
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38105
Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.
Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046
Some drivers, like iflib drivers, take a 'context' argument instead of a
ifnet argument, as a single interface may have multiple contexts.
Follow this scheme by passing the context argument down. Most drivers
will likely pass 'ifp' as the context.
Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38102
Currently `close(2)` erroneously return `EOPNOTSUPP` for `PF_ROUTE` sockets.
It happened after making rtsock socket implementation self-contained (
36b10ac2cd ). Rtsock code marks socket as connected in `rts_attach()`.
`soclose()` tries to disconnect such socket using `.pr_disconnect` callback.
Rtsock does not implement this callback, resulting in the default method being
substituted. This default method returns `ENOTSUPP`, failing `soclose()` logic.
This diff restores the previous behaviour by adding custom `pr_disconnect()`
returning `ENOTCONN`.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38059
Basic scenario: we have a closed connection (In TCPS_FIN_WAIT_2), and
get a new connection (i.e. SYN) re-using the tuple.
Without syncookies we look at the SYN, and completely unlink the old,
closed state on the SYN.
With syncookies we send a generated SYN|ACK back, and drop the SYN,
never looking at the state table.
So when the ACK (i.e. the third step in the three way handshake for
connection setup) turns up, we’ve not actually removed the old state, so
we find it, and don’t do the syncookie dance, or allow the new
connection to get set up.
Explicitly check for this in pf_test_state_tcp(). If we find a state in
TCPS_FIN_WAIT_2 and the syncookie is valid we delete the existing state
so we can set up the new state.
Note that when we verify the syncookie in pf_test_state_tcp() we don't
decrement the number of half-open connections to avoid an incorrect
double decrement.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37919
<net/if.h> should be a fully user-facing header, so these APIs don't
belong there. Revert and will find another approach.
This reverts commit fe33e0ab83.
Fixes: fe33e0ab83
Sponsored by: Juniper Networks, Inc.
Summary:
The "public" KPI for ifnet belongs in net/if.h, with net/if_var.h being
implementation details for the netstack. This is the next step in
enforcing that separation.
Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38030
Summary:
A common pattern has been to:
if (foo)
caps = IFCAP_FOO;
ifp->if_capenable &= ~IFCAP_FOO;
ifp->if_capenable |= caps;
which in the new order of things would be:
if (foo)
caps = IF_FOO;
if_setcapenablebits(ifp, 0, IFCAP_FOO);
if_setcapenablebits(ifp, caps, 0);
This change streamlines this into:
if (foo)
caps = IF_FOO;
if_setcapenablebits(ifp, caps, IFCAP_FOO);
Reviewed by: melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37993
<net/if_var.h> should be a kernel-only header, but it's included
elsewhere. Until that's addressed expose if_t to userspace to fix the
build.
Fixes: be4315dcbb
Sponsored by: Juniper Networks, Inc.
Summary:
<net/if_var.h> should really be used by the netstack only, not by
drivers. Eventually all the accessors will be moved to <net/if.h> as
well, but for now just move the typedef while the KPI gets sorted and
drivers get converted.
Sponsored by: Juniper Networks, Inc.
Reviewed By: melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D37784
IFCAP2_XXX constants are integers, they do not need shift for the
definition. But their usage as bitmask for if_capenable2 does require
shift. Add convenience macro IFCAP2_BIT() for consumers.
Fix the only existing consumer, mlx5(4) RXTLS enable bits.
Reported by: jhb
Reviewed by: jhb, jhibbits, hselasky
Coverity CID: 1501659
Sponsored by: NVIDIA networking
Differential revision: https://reviews.freebsd.org/D37862
This reverts commit 92f0cf77db.
This change was incorrect, at least because it uses ovpn_kpeer's tree
for multipbe RB_TREEs.
This is a performance change, not a functional one, so we can revert
this until it can be fixed.
Reported by: Gert Doering <gert@greenie.muc.de>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Summary:
Add the following accessors to hide some more netstack details:
* if_get/setcapabilities2 and *bits analogue
* if_setdname
* if_getxname
* if_transmit - wrapper for call to ifp->if_transmit()
- This required changing the existing if_transmit to
if_transmit_default, since that's its purpose.
* if_getalloctype
* if_getindex
* if_foreach_addr_type - Like if_foreach_lladdr() but for any address
family type. Used by some drivers to iterate over all AF_INET
addresses.
* if_init() - wrapper for ifp->if_init() call
* if_setinputfn
* if_setsndtagallocfn
* if_togglehwassist
Reviewers: #transport, #network, glebius, melifaro
Reviewed by: #network, melifaro
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37664
In non-INVARIANTS kernels, hide the warning message printed by debugnet
when an interface MTU is configured or link state changes, and debugnet
cannot infer the number of mbuf clusters to reserve. The warning isn't
really actionable and mostly serves to confuse users.
Reviewed by: vangyzen, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34393
The detach of the interface and group were leaving pfi_ifnet memory
behind. Check if the kif still has references, and clean it up if it
doesn't
On interface detach, the group deletion was notified first and then a
change notification was sent. This would recreate the group in the kif
layer. Reorder the change to before the delete.
PR: 257218
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37569
Move the use of the `offsetof(struct ovpn_counters, fieldname) /
sizeof(uint64_t)` construct into a macro.
This removes a fair bit of code duplication and should make things a
little easier to read.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37607
When we remove a peer userspace can no longer retrieve its counters. To
ensure that userspace can get a full count of the entire session we now
include the counters in the deletion message.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37606
Introduce two more RB_TREEs so that we can look up peers by their peer
id (already present) or vpn4 or vpn6 address.
This removes the last linear scan of the peer list.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37605
OpenVPN will introduce a mechanism to retrieve per-peer statistics.
Start tracking those so we can return them to userspace when queried.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37603
OpenVPN userspace no longer uses the ioctl interface to send control
packets. It instead uses the socket directly.
The use of OVPN_SEND_PKT was never released, so we can remove this
without worrying about compatibility.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37602
A comment at the beginning of the function notes that we may be
transmitting multiple fragments as distinct packets. So, the function
loops over all fragments, transmitting each mbuf chain. If if_transmit
fails, we need to free all of the fragments, but m_freem() only frees an
mbuf chain - it doesn't follow m_nextpkt.
Change the error handler to free each untransmitted packet fragment, and
count each fragment as a separate error since we increment OPACKETS once
per fragment when transmission is successful.
Reviewed by: zlei, kp
MFC after: 1 week
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37635
* Add link-state change notifications by subscribing to ifnet_link_event.
In the Linux netlink model, link state is reported in 2 places: first is
the IFLA_OPERSTATE, which stores state per RFC2863.
The second is an IFF_LOWER_UP interface flag. As many applications rely
on the latter, reserve 1 bit from if_flags, named as IFF_NETLINK_1.
This flag is mapped to IFF_LOWER_UP in the netlink headers. This is done
to avoid making applications think this flag is actually
supported / presented in non-netlink outputs.
* Add flag change notifications, by hooking into rt_ifmsg().
In the netlink model, notification should include the bitmask for the
change flags. Update rt_ifmsg() to include such bitmask.
Differential Revision: https://reviews.freebsd.org/D37597
Extend peer deleted notifications (which are the only type right now) to
include the reason the peer was deleted. This can be either because
userspace requested it, or because the peer timed out.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37583
Store user-supplied source protocol in the nexthops and nexthop groups.
Protocol specification help routing daemons like bird to quickly
identify self-originated routes after the crash or restart.
Example:
```
10.2.0.0/24 via 10.0.0.2 dev vtnet0 proto bird
10.3.0.0/24 proto bird
nexthop via 10.0.0.2 dev vtnet0 weight 3
nexthop via 10.0.0.3 dev vtnet0 weight 4
```
There is a need to store client metadata in nexthops and nexthop groups.
This metadata is immutable and participate in nhop/nhg comparison.
Nexthops KPI already supports its: nexthop creation pattern is
```
nhop_alloc()
nhop_set_...()
...
nhop_get_nhop()
```
This change provides a similar pattern for the nexthop groups.
Specifically, it adds nhgrp_alloc(), nhgrp_get_nhgrp() and
nhgrp_set_uidx().
MFC after: 2 weeks
Replace the fixed-sized array by an RB_TREE. This should both speed up
lookups and remove the 128 peer limit.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37524
This avoids having to undo it before invoking NetGraph's orphan input
hook.
Reviewed by: ae, melifaro
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D37510
Rather than passing control packets through the ioctl interface allow
them to pass through the normal UDP socket flow.
This simplifies both kernel and userspace, and matches the approach
taken (or the one that will be taken) on the Linux side of things.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37317
We reference count to ensure we don't release the socket while we still
have data in flight. That means that we can end up releasing the socket
from ovpn_encrypt_tx_cb().
We must have a vnet context set when calling sorele() (which asserts
this from within sofree()), so move the CURVNET_SET()/CURVNET_RESTORE()
to ensure this is the case.
While here also add a couple of assertions to make this more obvious,
and to ease future debugging.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37326
We need to explicitly list AES-128-GCM as an allowed cipher for that
mode to work. While here also add AES-192-GCM. That brings our supported
cipher list in line with other openvpn/dco platforms.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Work is ongoing to add support for pfsync over IPv6. This required some
changes to allow for differentiating between the two families in a more
generic way.
This patch converts the relevant ioctls to using nvlists, making future
extensions (such as supporting IPv6 addresses) easier.
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D36277
Allow pf (l2) to be used to redirect ethernet packets to a different
interface.
The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37193
This commit brings back the driver from FreeBSD commit
f187d6dfbf plus subsequent fixes from
upstream.
Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.
Reviewed by: pauamma, gbe, kevans, emaste
Obtained from: git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36909
Change the default for net.link.bridge.pfil_member and
net.link.bridge.pfil_bridge to zero.
That is, default to not calling layer 3 firewalls on the bridge or its
member interfaces.
With either of these enabled the bridge will, during L2 processing,
remove the Ethernet header from packets, feed them to L3 firewalls,
re-add the Ethernet header and send them out.
Not only does this interact very poorly with firewalls which defer
packets, or reassemble and refragment IPv6, it also causes considerable
confusion for users, because the firewall gets called in unexpected
ways.
For example, a bridge which contains a bhyve tap and the host's LAN
interface. We'd expect traffic between the LAN and bhyve VM to pass, no
matter what (layer 3) firewall rules are set on the host. That's not the
case as long as pfil_bridge or pfil_member are set.
Reviewed by: Zhenlei Huang
MFC: never
Differential Revision: https://reviews.freebsd.org/D37009
Allow the choice between asynchronous and synchronous netisr and crypto
calls. These have performance implications, but depend on the specific
setup and OCF back-end.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37017
For v2, iflib will parse packet headers before queueing a packet.
This commit also adds a new field in the structure that holds parsed
header information from packets; it stores the IP ToS/traffic class
field found in the IPv4/IPv6 header.
To help, it will only partially parse header packets before queueing
them by using a new header parsing function that does less than the
current parsing header function; for our purposes we only need up to the
minimal IP header in order to get the IP ToS infromation and don't need
to pull up more data.
For now, v1 and v2 co-exist in this patch; v1 still offers a
less-invasive method where none of the packet is parsed in iflib before
queueing.
This also bumps the sys/param.h version.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Tested by: IntelNetworking
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D34742
Fully working openvpn(8) --iroute support needs real subnet config
on ovpn(4) interfaces (IFF_BROADCAST), while client-side/p2p
configs need IFF_POINTOPOINT setting. So make this configurable.
Reviewed by: kp
ovpn_encrypt_tx_cb() calls ovpn_encap() to transmit a packet, then adds
the length of the packet to the "tunnel_bytes_sent" counter. However,
after ovpn_encap() returns 0, the mbuf chain may have been freed, so the
load of m->m_pkthdr.len may be a use-after-free.
Reported by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Rather than using a per-cpu state counter, and adding in the CPU id we
can atomically increment the number.
This has the advantage of removing the assumption that the CPU ID fits
in 8 bits.
Event: Aberdeen Hackathon 2022
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36915
netisr_dispatch() can fail, especially when under high traffic loads.
This isn't a fatal error, so simply don't check the return value.
Sponsored by: Rubicon Communications, LLC ("Netgate")
The vxlan interface encapsulates the Ethernet frame by prepending IP/UDP
and vxlan headers. For statistics, only the payload, i.e. the
encapsulated (inner) frame should be counted.
Event: Aberdeen Hackathon 2022
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36855
If the crypto callback is asynchronous we're no longer in net_epoch,
which ovpn_encap() (and ip_output() it calls) expect.
Ensure we've entered the epoch.
Do the same thing for the rx path.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Use time_t rather than uint32_t to represent the timestamps. That means
we have 64 bits rather than 32 on all platforms except i386, avoiding
the Y2K38 issues on most platforms.
Reviewed by: Zhenlei Huang
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D36837
r325506 (3cf8254f1e) extended struct pkthdr to add packet timestamp in
mbuf(9) chain. For example, cxgbe(4) and mlx5en(4) support this feature.
Use the timestamp for bpf(4) if it is available.
Reviewed by: hselasky, kib, np
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36868
Netlinks is a communication protocol currently used in Linux kernel to modify,
read and subscribe for nearly all networking state. Interfaces, addresses, routes,
firewall, fibs, vnets, etc are controlled via netlink.
It is async, TLV-based protocol, providing 1-1 and 1-many communications.
The current implementation supports the subset of NETLINK_ROUTE
family. To be more specific, the following is supported:
* Dumps:
- routes
- nexthops / nexthop groups
- interfaces
- interface addresses
- neighbors (arp/ndp)
* Notifications:
- interface arrival/departure
- interface address arrival/departure
- route addition/deletion
* Modifications:
- adding/deleting routes
- adding/deleting nexthops/nexthops groups
- adding/deleting neghbors
- adding/deleting interfaces (basic support only)
* Rtsock interaction
- route events are bridged both ways
The implementation also supports the NETLINK_GENERIC family framework.
Implementation notes:
Netlink is implemented via loadable/unloadable kernel module,
not touching many kernel parts.
Each netlink socket uses dedicated taskqueue to support async operations
that can sleep, such as interface creation. All message processing is
performed within these taskqueues.
Compatibility:
Most of the Netlink data models specified above maps to FreeBSD concepts
nicely. Unmodified ip(8) binary correctly works with
interfaces, addresses, routes, nexthops and nexthop groups. Some
software such as net/bird require header-only modifications to compile
and work with FreeBSD netlink.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D36002
MFC after: 2 months
* Factor out queue selection (epair_select_queue()) and mbuf
preparation (epair_prepare_mbuf()) from epair_menq(). It simplifies
epair_menq() implementation and reduces the amount of dependencies
on the neighbouring epair.
* Use dedicated epair_set_state() instead of 2-lines copy-paste
* Factor out unit selection code (epair_handle_unit()) from
epair_clone_create(). It simplifies the clone creation logic.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D36689
When the tunneled (IPv6) traffic had traffic class bits set (but only >=
16) the packet got lost on the receive side.
This happened because the address family check in ovpn_get_af() failed
to mask correctly, so the version check didn't match, causing us to drop
the packet.
While here also extend the existing 6-in-6 test case to trigger this
issue.
PR: 266598
Sponsored by: Rubicon Communications, LLC ("Netgate")