Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33266
Fun note: git show e6abef09187a
Now that ifindex is static to if.c we can unvirtualize it. For lifetime
of an ifnet its index never changes. To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet. Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.
API wise the only change is that now minimum interface index can be
greater than 1. The holes in interface indexes were always allowed.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33672
* Do a single call into if_clone.c instead of two. The cloner
can't disappear since the interface sits on its list.
* Make restoration smarter - check that cloner with same name
exists in the new vnet.
Differential revision: https://reviews.freebsd.org/D33941
In 4f6c66cc9c75c8, ifa_ifwithnet() was changed to no longer
ifa_ref() the returned ifaddr, and instead the caller was required
to stay in the net_epoch for as long as they wanted the ifaddr
to remain valid. However, this missed the case where an AF_LINK
lookup would call ifaddr_byindex(), which still does ifa_ref()
the ifaddr. This would cause a refcount leak.
Fix this by inlining the relevant parts of ifaddr_byindex() here,
with the ifa_ref() call removed. This also avoids an unnecessary
entry and exit from the net_epoch for this case.
I've audited all in-tree consumers of ifa_ifwithnet() that could
possibly perform an AF_LINK lookup and confirmed that none of them
will expect the ifaddr to have a reference that they need to
release.
MFC after: 2 months
Sponsored by: Dell Inc
Differential Revision: https://reviews.freebsd.org/D28705
Reviewed by: melifaro
This requires moving net.link.generic sysctl declaration from if_mib.c
to if.c. Ideally if_mib.c needs just to be merged to if.c, but they
have different license texts.
Differential revision: https://reviews.freebsd.org/D33263
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch. Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit. Mark that with a comment.
Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D33263
Now it is possible to just merge all this complexity into single
linear function. Note that IFNET_WLOCK() is a sleepable lock, so
we can M_WAITOK and epoch_wait_preempt().
Reviewed by: melifaro, bz, kp
Differential revision: https://reviews.freebsd.org/D33262
So let's just call malloc() directly. This also avoids hidden
doubling of default V_if_indexlim.
Reviewed by: melifaro, bz, kp
Differential revision: https://reviews.freebsd.org/D33261
Now that if_alloc_domain() never fails and actually doesn't
expose ifnet to outside we can eliminate IFNET_HOLD and two
step index allocation.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33259
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
The last consumer of if_com_alloc() is firewire. It never fails
to allocate. Most likely the if_com_alloc() KPI will go away
together with if_fwip(), less likely new consumers of if_com_alloc()
will be added, but they would need to follow the no fail KPI.
With this change if_index can become static. There is nothing
that if_debug.c would want to isolate from if.c. Potentially
if.c wants to share everything with if_debug.c.
Move Bjoern's copyright to if.c.
Reviewed by: bz
When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.
Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by: kp, melifaro
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31889
if_bridge member interfaces should always have the same MTU as the
bridge itself, so disallow MTU changes on interfaces that are part of an
if_bridge.
Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31304
A successful copyinstr() call guarantees that the returned string is
nul-terminated. Furthermore, the removed check would harmlessly compare
an uninitialized byte with '\0' if the new name is shorter than
IFNAMESIZ - 1.
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
While refactoring an earlier series of changes during review, the
'saved_data' variable stopped being used at the bottom of if_ioctl().
Suggested by: brooks
Reviewed by: brooks, imp, kib
Fixes: d17e0940f79f Rework compat shims in ifioctl().
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D30197
Centralize logic for handling compat ioctls into two blocks of code at
the start and end of the ioctl routine. This avoids the conversion
logic being spread out both in multiple blocks in ifioctl as well as
various helper functions.
Reviewed by: brooks, kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29891
This reverts a portion of 274579831b61 ("capsicum: Limit socket
operations in capability mode") as at least rtsol and dhcpcd rely on
being able to configure network interfaces while in capability mode.
Reported by: bapt, Greg V
Sponsored by: The FreeBSD Foundation
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.
Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.
Reviewed by: oshogbo
Discussed with: emaste
Reported by: manu
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29423
* device_printf() is effectively a printf
* if_printf() is effectively a LOG_INFO
This allows subsystems to log device/netif stuff using different log levels,
rather than having to invent their own way to prefix unit/netif names.
Differential Revision: https://reviews.freebsd.org/D29320
Reviewed by: imp
Drain the callbacks upon if_deregister_com_alloc() such that the
if_com_free[type] won't be nullified before if_destroy().
Taking fwip(4) as an example, before this fix, kldunload if_fwip will
go through the following:
1. fwip_detach()
2. if_free() -> schedule if_destroy() through NET_EPOCH_CALL
3. fwip_detach() returns
4. firewire_modevent(MOD_UNLOAD) -> if_deregister_com_alloc()
5. kernel complains about:
Warning: memory type fw_com leaked memory on destroy (1 allocations, 64 bytes leaked).
6. EPOCH runs if_destroy() -> if_free_internal()i
By this time, if_com_free[if_alloctype] is NULL since it's already
nullified by if_deregister_com_alloc(); hence, firewire_free() won't
have a chance to release the allocated fw_com.
Reviewed by: hselasky, glebius
MFC after: 2 weeks
When we have an ifp pointer and the code is running inside epoch,
epoch guarantees the pointer will not be freed.
However, the following case can still happen:
* in thread 1 we drop to refcount=0 for ifp and schedule its deletion.
* in thread 2 we use this ifp and reference it
* destroy callout kicks in
* unhappy user reports a bug
This can happen with the current implementation of ifnet_byindex_ref(),
as we're not holding any locks preventing ifnet deletion by a parallel thread.
To address it, add if_try_ref(), allowing to return failure when
referencing ifp with refcount=0.
Additionally, enforce existing if_ref() is with KASSERT to provide a
cleaner error in such scenarios.
Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL
if the latter fails.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28836
More and more code migrates from lock-based protection to the NET_EPOCH
umbrella. It requires some logic changes, including, notably, refcount
handling.
When we have an `ifa` pointer and we're running inside epoch we're
guaranteed that this pointer will not be freed.
However, the following case can still happen:
* in thread 1 we drop to 0 refcount for ifa and schedule its deletion.
* in thread 2 we use this ifa and reference it
* destroy callout kicks in
* unhappy user reports bug
To address it, new `ifa_try_ref()` function is added, allowing to return
failure when we try to reference `ifa` with 0 refcount.
Additionally, existing `ifa_ref()` is enforced with `KASSERT` to provide
cleaner error in such scenarious.
Reviewed By: rstone, donner
Differential Revision: https://reviews.freebsd.org/D28639
MFC after: 1 week
Widen the ifnet_detach_sxlock to cover the entire vnet sysuninit code.
This ensures that we can't end up having the vnet_sysuninit free the UDP
pcb while the detach code is running and trying to purge the UDP pcb.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28530
it gets unused.
Currently if_purgeifaddrs() uses in6_purgeaddr() to remove IPv6
ifaddrs. in6_purgeaddr() does not trrigger prefix removal if
number of linked ifas goes to 0, as this is a low-level function.
As a result, if_purgeifaddrs() purges all IPv4/IPv6 addresses but
keeps corresponding IPv6 prefixes.
Fix this by creating higher-level wrapper which handles unused
prefix usecase and use it in if_purgeifaddrs().
Differential revision: https://reviews.freebsd.org/D28128
Revert the mitigation code for the vnet/epair cleanup race (done in r365457).
r368237 introduced a more reliable fix.
MFC after: 2 weeks
Sponsored by: Modirum MDPay
When destroying a vnet and an epair (with one end in the vnet) we often
panicked. This was the result of the destruction of the epair, which destroys
both ends simultaneously, happening while vnet_if_return() was moving the
struct ifnet to its home vnet. This can result in a freed ifnet being re-added
to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
used.
Prevent this race by ensuring that vnet_if_return() cannot run at the same time
as if_detach() or epair_clone_destroy().
PR: 238870, 234985, 244703, 250870
MFC after: 2 weeks
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D27378
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
When we terminate a vnet (i.e. jail) we move interfaces back to their home
vnet. We need to protect our access to the V_ifnet CK_LIST.
We could enter NET_EPOCH, but if_detach_internal() (called from if_vmove())
waits for net epoch callback completion. That's not possible from NET_EPOCH.
Instead, we take the IFNET_WLOCK, build a list of the interfaces that need to
move and, once we've released the lock, move them back to their home vnet.
We cannot hold the IFNET_WLOCK() during if_vmove(), because that results in a
LOR between ifnet_sx, in_multi_sx and iflib ctx lock.
Separate out moving the ifp into or out of V_ifnet, so we can hold the lock as
we do the list manipulation, but do not hold it as we if_vmove().
Reviewed by: melifaro
MFC after: 2 weeks
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D27279
It no longer serves any purpose, as evidenced by the fact that we never take it
without ifnet_sxlock.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D27278
For interfaces that do not support SIOCGIFMEDIA (for which there are
quite a few) the only fallback is to query the interface for
if_data->ifi_link_state. While it's possible to get at if_data for an
interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms.
SIOCGIFDATA is a simple ioctl to retrieve this fast with very little
resource use in comparison. This implementation mirrors that of other
similar ioctls in FreeBSD.
Submitted by: Roy Marples <roy@marples.name>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D26538
There's a race where dying vnets move their interfaces back to their original
vnet, and if_epair cleanup (where deleting one interface also deletes the other
end of the epair). This is commonly triggered by the pf tests, but also by
cleanup of vnet jails.
As we've not yet been able to fix the root cause of the issue work around the
panic by not dereferencing a NULL softc in epair_qflush() and by not
re-attaching DYING interfaces.
This isn't a full fix, but makes a very common panic far less likely.
PR: 244703, 238870
Reviewed by: lutz_donnerhacke.de
MFC after: 4 days
Differential Revision: https://reviews.freebsd.org/D26324