Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
directly.
Differential Revision: https://reviews.freebsd.org/D26053
The lagg_clone_destroy() handles detach and waiting for ifconfig callers
to drain already.
This narrows the race for 2 panics that the tests triggered. Both were a
consequence of adding a port to the lagg device after it had already detached
from all of its ports. The link state task would run after lagg_clone_destroy()
free'd the lagg softc.
kernel:trap_fatal+0xa4
kernel:trap_pfault+0x61
kernel:trap+0x316
kernel:witness_checkorder+0x6d
kernel:_sx_xlock+0x72
if_lagg.ko:lagg_port_state+0x3b
kernel:if_down+0x144
kernel:if_detach+0x659
if_tap.ko:tap_destroy+0x46
kernel:if_clone_destroyif+0x1b7
kernel:if_clone_destroy+0x8d
kernel:ifioctl+0x29c
kernel:kern_ioctl+0x2bd
kernel:sys_ioctl+0x16d
kernel:amd64_syscall+0x337
kernel:trap_fatal+0xa4
kernel:trap_pfault+0x61
kernel:trap+0x316
kernel:witness_checkorder+0x6d
kernel:_sx_xlock+0x72
if_lagg.ko:lagg_port_state+0x3b
kernel:do_link_state_change+0x9b
kernel:taskqueue_run_locked+0x10b
kernel:taskqueue_run+0x49
kernel:ithread_loop+0x19c
kernel:fork_exit+0x83
PR: 244168
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D25284
After moving the route control plane code from net/route.c,
all rtzone users ended up being in net/route_ctl.c.
Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well
to have everything in a single place.
While here, remove custom initializers from the zone.
It was added originally to avoid setup/teardown of costy per-cpu couters.
With these counters removed, the only remaining job was avoiding rte mutex
setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex
will soon be removed. With that in mind, there is no sense in keeping
custom zone callbacks.
Differential Revision: https://reviews.freebsd.org/D26051
It is possible for rn_delete() to return NULL. If this happens, then set
*perror to ESRCH, as is done in the rest of the function.
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D25871
For drivers with IFLIB_HAS_RXCQ set, there is a separate completion
queue. In this case, the netmap rxsync routine needs to update
rxq->ifr_cq_cidx in the same way it is updated by iflib_rxeof().
This improves the situation for vmx(4) and bnxt(4) drivers, which
use iflib and have the IFLIB_HAS_RXCQ bit set.
PR: 248494
MFC after: 3 weeks
First, fix the initialization of the fl->ifl_rxd_idxs array,
which was affected by an off-by-one bug.
Once there, refactor the function to use better names for
local variables, optimize the variable assignments, and
merge the bus_dmamap_sync() inner loop with the outer one.
PR: 248494
MFC after: 3 weeks
Netmap only uses free list 0 to keep it consistent with its
one-to-one mapping between each netmap ring and a device RX
(or TX) queue.
However, the current iflib_netmap_rxsync() routine was
mistakenly updating the ifl_cidx field of both free lists.
PR: 248494
MFC after: 2 weeks
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches. They no longer serve any
purpose since UMA now provides their functionality by default. Remove
them to simplyify the kernel memory allocator interfaces a bit.
Reviewed by: cem, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25937
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
in6_rtrequest, rtrequest_fib> and their uses and switch to
to rib_action(). This is part of the new routing KPI.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546
In case the network device has a RX or TX control queue, the correct
number of TX/RX descriptors is contained in the second entry of the
isc_ntxd (or isc_nrxd) array, rather than in the first entry.
This case is correctly handled by iflib_device_register() and
iflib_pseudo_register(), but not by iflib_netmap_attach().
If the first entry is larger than the second, this can result in a
panic. This change fixes the bug by introducing two helper functions
that also lead to some code simplification.
PR: 247647
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D25541
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
in6_rtrequest, rtrequest_fib> and their uses and switch to
to rib_action(). This is part of the new routing KPI.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546
While it doesn't trigger INVARIANTS or WITNESS on head it does in stable/12.
There's also no reason for it, as we can easily report the out of memory error
to the caller (i.e. userspace). All of these can already fail.
PR: 248046
MFC after: 3 days
if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time
an ethernet interface is setup *after*
SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to
nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu
*before* ifp->if_mtu has been properly set in some scenarios, e.g., USB
ethernet adapter plugged in later on.
For interfaces that are created in early boot, we don't have this issue as
domains aren't constructed enough for them to attach and thus it gets
deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND
*after* the mtu has been set earlier in ether_ifattach().
PR: 248005
Submitted by: Mathew <mjanelle blackberry com>
MFC after: 1 week
Old subscription model allowed only single customer.
Switch inet6 to the new subscription api and eliminate the old model.
Differential Revision: https://reviews.freebsd.org/D25615
Subscriptions are planned to be used by modules such as route lookup engines.
In that case that's the module task to properly unsibscribe before detach.
However, the in-kernel customer - inet6 wants to track default route changes.
To avoid having inet6 store per-fib subscriptions, handle automatic
destruction internally.
Differential Revision: https://reviews.freebsd.org/D25614
Now nhop_ref_object() unconditionally acquires a reference, and the new
nhop_try_ref_object() uses refcount_acquire_if_not_zero() to
conditionally acquire a reference. Since the former is cheaper, use it
when we know that the initial counter value is non-zero. No functional
change intended.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D25535
- Get rid of the ifl_vm_addrs array. It is not used by any existing
consumer, so we are just dirtying a couple of cache lines for no
reason.
- Use uma_zalloc(fl->ifl_zone) instead of m_cljget(). Otherwise
m_cljget() is doing unnecessary work to look up the correct zone, when
iflib already knows what that zone is.
- ifl_gen is only used when INVARIANTS is on, so make that more clear.
- Fix some style nits and inconsistencies.
Reviewed by: gallatin
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25490
When refilling an rx freelist, make sure we only update the hardware
producer index if at least one cluster was allocated. Otherwise the
NIC is programmed to write a previously used cluster, typically
resulting in a use-after-free when packet data is written by the
hardware.
Also make sure that we don't update the fragment index cursor if the
last allocation attempt didn't succeed. For at least Intel drivers,
iflib assumes that the consumer index and fragment index cursor stay in
lockstep, but this assumption was violated in the face of cluster
allocation failures.
Reported and tested by: pho
Reviewed by: gallatin, hselasky
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25489
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
over pointer validness and requiring on-stack data copying.
With no callers remaining, remove fib[46]_lookup_nh_ functions.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445
The new function operates similarly to ifconfig_lagg_get_lagg_status and
likewise is accompanied by a function to free the bridge status data structure.
I have included in this patch the relocation of some strings describing STP
parameters and the PV2ID macro from ifconfig into net/if_bridgevar.h as they
are useful for consumers of libifconfig.
Reviewed by: kp, melifaro, mmacy
Approved by: mmacy (mentor)
MFC after: 1 week
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25460
In the current iflib_netmap_rxsync, there is nothing that prevents
kring->nr_hwtail to overrun kring->nr_hwcur during the descriptor
import phase. This may cause errors in netmap applications, such as:
em1 RX0: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail'
h 795 c 795 t 282 rh 795 rc 795 rt 282 hc 282 ht 282
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25252
- a cloneattach failure will not currently be handled correctly,
jump to the right target
- pseudo devices are all treat as if they're ethernet devices -
this often doesn't make sense
MFC after: 1 week
Sponsored by: Netgate, Inc.
Differential Revision: https://reviews.freebsd.org/D25083
This OID was added in r17352 but the write path of IFDATA_LINKSPECIFIC
seems unused as there are no in-base writers, and as far as I can tell
we had issues with this code before, see PR 219472. Drop the write path
to make the handler read-only as described in comments and man-pages.
It can be marked as MPSAFE now.
Reviewed by: bdragon, kib, melifaro, wollman
Approved by: kib (mentor)
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25348
r286700 added the "lacp_fast_timeout" option to `ifconfig', but we forgot to
include the new option in the string used to decode the option bits. Add
"LACP_FAST_TIMO" to LAGG_OPT_BITS.
Also, s/LAGG_OPT_LACP_TIMEOUT/LAGG_OPT_LACP_FAST_TIMO/g , to be clearer that
the flag indicates "Fast Timeout" mode.
Reported by: Greg Foster <gfoster at panasas dot com>
Reviewed by: jpaetzel
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D25239
This simplifies the code and allows to further split rtentry and nexthop,
removing one of the blockers for multipath code introduction, described in
D24141.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25192
In the receive interrupt routine, always call netmap_rx_irq().
The latter function will return != NM_IRQ_PASS if netmap is not
active on that specific receive queue, so that the driver can go
on with iflib_rxeof(). Note that netmap supports partial opening,
where only a subset of the RX or TX rings can be open in netmap mode.
Checking the IFCAP_NETMAP flag is not enough to make sure that the
queue is indeed in netmap mode.
Moreover, in case netmap_rx_irq() returns NM_IRQ_RESCHED, it means
that netmap expects the driver to call netmap_rx_irq() again as soon
as possible. Currently, this may happen when the device is attached
to a VALE switch.
Reviewed by: gallatin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25167
At this point, AES is the more common name for Rijndael128. setkey(8)
will still accept the old name, and old constants remain for
compatiblity.
Reviewed by: cem, bcr (manpages)
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24964
After r339550 tunneling interfaces have started handle appearing and
disappearing of ingress IP address on the host system.
When such interfaces are moving into VNET jail, they lose ability to
properly handle ifaddr_event_ext event. And this leads to need to
reconfigure tunnel to make it working again.
Since moving an interface into VNET jail leads to removing of all IP
addresses, it looks consistent, that tunnel configuration should also
be cleared. This is what will do if_reassing method.
Reported by: John W. O'Brien <john saltant com>
MFC after: 1 week
Currently there is no easy way of subscribing for the routing table changes.
The only existing way is to set ifa_rtrequest callback in the each protocol
ifaddr, which is not convenient or extandable.
This change provides generic notification subscription mechanism, that will
replace current ifa_rtrequest one and allow other applications such as
accelerated routing lookup modules subscribe for the changes.
In particular, this change provides 2 hooks: 1) synchronous one
(RIB_NOTIFY_IMMEDIATE), called under RIB_WLOCK, which ensures exact
ordering of the changes and 2) async one, (RIB_NOTIFY_DELAYED)
that is called after the change w/o holding locks. The latter one does not
provide any notification ordering guarantee.
Differential Revision: https://reviews.freebsd.org/D25070
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
returned in case of successful operation result. There are two problems with
this appoach. First is that it doesn't provide enough information for the
upcoming multipath changes, where rtentry refers to a new nexthop group,
and there is no way of guessing which paths were added during the change.
Second is that some rtentry fields can change during notification and
protecting from it by requiring customers to unlock rtentry is not desired.
Additionally, as the consumers such as rtsock do know which operation they
request in advance, making explicit add/change/del versions of the functions
makes sense, especially given the functions don't share a lot of code.
With that in mind, introduce rib_cmd_info notification structure and
rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
It will be used in upcoming generalized notifications.
* Move definitions of the new functions and some other functions/structures
used for the routing table manipulation to a separate header file,
net/route/route_ctl.h. net/route.h is a frequently used file included in
~140 places in kernel, and 90% of the users don't need these definitions.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25067
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
returned in case of successful operation result. There are two problems with
this appoach. First is that it doesn't provide enough information for the
upcoming multipath changes, where rtentry refers to a new nexthop group,
and there is no way of guessing which paths were added during the change.
Second is that some rtentry fields can change during notification and
protecting from it by requiring customers to unlock rtentry is not desired.
Additionally, as the consumers such as rtsock do know which operation they
request in advance, making explicit add/change/del versions of the functions
makes sense, especially given the functions don't share a lot of code.
With that in mind, introduce rib_cmd_info notification structure and
rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
It will be used in upcoming generalized notifications.
* Move definitions of the new functions and some other functions/structures
used for the routing table manipulation to a separate header file,
net/route/route_ctl.h. net/route.h is a frequently used file included in
~140 places in kernel, and 90% of the users don't need these definitions.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25067
This fixes ipsec.ko to include all of IPSEC_DEBUG.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25046