The VFS conventions is that VOP_LOOKUP() methods do not need to handle
ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all). Nullfs
bypasses VOP_LOOKUP() to lower filesystem, and there, due to user actions,
it is possible to get into situation where
- upper vnode does not have VV_ROOT set
- lower vnode is root
- ISDOTDOT is requested
User just needs to nullfs-mount non-root of some filesystem, and then move
some directory under mount, out of mount, using lower filesystem.
In this case, nullfs cannot do much, but we still should and can ensure
internal kernel structures are consistent. Avoid ISDOTDOT lookup forwarding
when VV_ROOT is set on lower dvp, return somewhat arbitrary ENOENT.
PR: 253593
Reported by: Gregor Koscak <elogin41@gmail.com>
Test by: Patrick Sullivan <sulli00777@gmail.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
A lot of firmware files have a "-" in the name. That "-" is a problem
when dealing with shell variables or loader (e.g., auto-loading .ko).
It may thus often be convenient to generate firmware kernel object files
with s/-/_/g in the name. In order to automatically find them from
drivers using LinuxKPI also substitue the '-' for a '_' like we do
for '/' and '.' already.
Reviewed-by: hselasky, manu (ok)
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29514
Rename the last remaining bits depending on ifnet from linux/netdevice.h
instead of using the compat macros. This helps clearing up
struct netdevice being struct ifnet from linux/netdevice.h.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, kib
X-D-R: D29366
Differential Revision: https://reviews.freebsd.org/D29497
Under geom(4) nvme_ns_bio_process() is on the path where sleep
is prohibited as g_io_shedule_down() calls THREAD_NO_SLEEPNG()
before geom->start().
Reviewed By: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29539
Allow the calculation of the mu adjustment factor to underflow instead of
rejecting the VOI sample from the digest and logging an error. This trades off
some (currently unquantified) additional centroid error in exchange for better
fidelity of the distribution's density, which is the right trade off at the
moment until follow up work to better handle and track accumulated error can be
undertaken.
Obtained from: Netflix
MFC after: immediately
The loops for Chacha20 and Chacha20+Poly1305 which encrypted/decrypted
full blocks of data used the minimum of the input and output segment
lengths to determine the size of the next chunk ('todo') to pass to
Chacha20_ctr32(). However, the input and output segments could extend
past the end of the ciphertext region into the tag (e.g. if a "plain"
single mbuf contained an entire TLS record). If the length of the tag
plus the length of the last partial block together were at least as
large as a full Chacha20 block (64 bytes), then an extra block was
encrypted/decrypted overlapping with the tag. Fix this by also
capping the amount of data to encrypt/decrypt by the amount of
remaining data in the ciphertext region ('resid').
Reported by: gallatin
Reviewed by: cem, gallatin, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29517
Commit fdc9b2d50f replaced a couple of while loops with LIST_FOREACH()
loops. This patch factors the body of that loop out into a separate
function called nfscl_checkown().
This prepares the code for future changes to use a hash table of
lists for open searches via file handle.
This patch should not result in a semantics change.
MFC after: 2 weeks
There is no change in the source of the stats (t4_get_port_stats or
t4_get_vi_stats) but the per-port callout is gone.
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29527
ULE uses this topology to try and preserve locality when migrating
threads between CPUs and when performing work stealing. Ensure that on
NUMA systems it will at least take the NUMA topology into account.
Reviewed by: bdragon, jhibbits (previous version)
Tested by: bdragon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28580
Provide a histogram output to check, if the hashsize or
bucketlimit could be optimized. Also add some basic sanity
checks around the accounting of the hash utilization.
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29506
As accessing the tcp hostcache happens frequently on some
classes of servers, it was recommended to use atomic_add/subtract
rather than (per-CPU distributed) counters, which have to be
summed up at high cost to cache efficiency.
PR: 254333
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Reviewed By: #transport, tuexen, jtl
Differential Revision: https://reviews.freebsd.org/D29522
This fixes double IVHD_SETUP_INTR calls on the same IOMMU device.
Sponsored by: The FreeBSD Foundation
MFC with: 74ada297e8
Reported by: Oleg Ginzburg <olevole@olevole.ru>
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29521
This only works on single-CPU G4 systems, and more work is needed for
dual-CPU systems. That said, platform sleep does not work, and this is
currently only used for PMU-based CPU speed change.
The elimination of the platform_smp_timebase_sync() call is so that the
timebase sync rendezvous can be enhanced to perform better
synchronization, which requires a full rendezvous. This would be
impossible to do on this single-threaded run.
Rename cpu_sleep() to mpc745x_sleep() to denote what it's actually
intended for. This function is very G4-specific, and will not work on
any other CPU. This will afterward eliminate a
platform_smp_timebase_sync() call by directly updating the timebase
instead.
Addressing the underlying root cause for cache_count to
show unexpectedly high values, by protecting all arithmetic on
that global variable by using counter(9).
PR: 254333
Reviewed By: tuexen, #transport
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29510
This fixes a panic due to stale so->so_proto if t4_tom is unloaded and
one or more connections that were previously offloaded are still around
in TIME_WAIT state.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29503
Explicitly drain the sbuf after completing each hash bucket
to minimize the work performed while holding the hash
bucket lock.
PR: 254333
MFC after: 2 weeks
Reviewed By: tuexen, jhb, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29483
While exporting large amounts of data to a sysctl
request, datastructures may need to be locked.
Exporting the sbuf_drain function allows the
coordination between drain events and held
locks, to avoid stalls.
PR: 254333
Reviewed By: jhb
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29481
Otherwise it breaks when offloading like checksum or TSO are used,
because second (encapsulated) ip_output() processing passes fragments of
the encapsulated packet down to the hardware interface.
Diagnosed by: hselasky
Reviewed by: np
Sponsored by: Nvidia Networking / Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29501
In some settings offload might calculate hash from decapsulated packet.
Reserve a bit in packet header rsstype to indicate that.
Add m_adj_decap() that acts similarly to m_adj, but also either clear
flowid if it is not marked as inner, or transfer it to the decapsulated
header, clearing inner indicator. It depends on the internals of m_adj()
that reuses the argument packet header for the result.
Use m_adj_decap() for decapsulating vxlan(4) and gif(4) input packets.
Reviewed by: ae, hselasky, np
Sponsored by: Nvidia Networking / Mellanox Technologies
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28773
The POWER7 subword atomics were not using the correct instructions for
byte and halfword stores in the atomic_fcmpset code.
This only affects builds with custom CFLAGS that have explicitly enabled
ISA_206_ATOMICS.
--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32. This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping. For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place. Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.
--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.
Reviewed by: kib, markj, bdragon
Differential Revision: https://reviews.freebsd.org/D29439
ipi_msg_count is inaccessible outside this file and is never read.
It was introduced in the original SMP support code in r178628 and was never
actually used anywhere.
Remove it to slightly improve IPI performance.
Submitted by: jhibbits
MFC after: 1 week
The NFSv4.1 (and 4.2 on 13) server incorrectly binds
a new TCP connection to the back channel when first
used by an RPC with a Sequence op in it (almost all of them).
RFC5661 specifies that only the fore channel should be bound.
This was done because early clients (including FreeBSD)
did not do the required BindConnectionToSession RPC.
Unfortunately, this breaks the Linux client when the
"nconnects" mount option is used, since the server
may do a callback on the incorrect TCP connection.
This patch converts the server behaviour to that
required by the RFC. It also makes the server test/indicate
failure of the back channel more aggressively.
Until this patch is applied to the server, the
"nconnects" mount option is not recommended for a Linux
NFSv4.1/4.2 client mount to the FreeBSD server.
Reported by: bcodding@redhat.com
Tested by: bcodding@redhat.com
PR: 254560
MFC after: 1 week
Now that superpages for HPT MMU has landed, finish implementation of
pmap_mincore by adding support for superpages.
Submitted by: Fernando Eckhardt Valle <fernando.valle@eldorado.org.br>
Reviewed by: bdragon, luporl
MFC after: 1 week
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D29230
This is required for small CCBs support, where we need to track
whether the CCB was allocated from an UMA zone or not. There are
no (intended) functional changes with the current source.
Reviewed By: imp
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29484
There are now options for specifying the debug port via tunable
(hw.fdt.dbgport) and device tree (/chosen/freebsd-dbgpath), so this can
be usefully included in GENERIC.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29152
The two functions in linux/inetdevice.h are highly FreeBSD/ifnet
specific. This is a result of struct net_device being mapped to
struct ifnet.
The only known consumer of these functions are two files in the
ofed/infiniband code.
As a first step of cleaning up copy linux/inetdevice.h to
rdma/ib_addr_freebsd.h. (It stayed a separate file to preserve
copyright and license of the original file; otherwise it could be
merged into ib_addr.h where more EPOCH/vnet/.. are already used).
Slightly rename the function to not conflict with LinuxKPI
in the future.
Remove the three last, now unneeded includes of inetdevice.h and
zap linux/inetdevice.h to an empty header file with only the forward
include to netdevice.h remaining.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, kib
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29434
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html
Reviewed by: jhb
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 51
Differential Revision: https://reviews.freebsd.org/D29174
Handle the 'z' and 'Z' remote packets for manipulating hardware
watchpoints.
This could be expanded quite easily to support hardware or software
breakpoints as well.
https://sourceware.org/gdb/onlinedocs/gdb/Packets.html
Reviewed by: cem, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 51
Differential Revision: https://reviews.freebsd.org/D29173
LLVM12 complains if you change the symbol binding:
error: mfs_root_end changed binding to STB_WEAK [-Werror,-Winline-asm]
error: mfs_root changed binding to STB_WEAK [-Werror,-Winline-asm]
We are inspecting PCBs of divert sockets under NET_EPOCH section,
but PCB could be already detached and we should check INP_FREED flag
when we took INP_RLOCK.
PR: 254478
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29420
On an INVARIANTS kernel on 32-bit Book-E, we were panicing when running
the libproc tests. This was caused by extra pv entries being generated
accidentally by the pmap icache invalidation code.
Use the same VA (i.e. 0) when freeing the temporary mapping, instead of
some arbitrary address within the zero page.
Failure to do this was causing kernel-side icache syncing to leak
PVE entries when invalidating icache for a non page-aligned address, which
would later result in pages erroneously showing up as mapped to vm_page.
This bug was introduced in r347354 in 2019.
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.