pmap_enter(PMAP_ENTER_LARGEPAGE) may return KERN_PROTECTION_FAILURE due to
PKRU inconsistency. Handle it at the call site in vm_fault_populate(),
and in places which decode errors from vm_fault_populate()/
vm_fault_allocate().
Reviewed by: jah, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29442
Some filesystems assume that they can copy a name component, with length
bounded by NAME_MAX, into a dirent buffer of size MAXNAMLEN. These
constants have the same value; add a compile-time assertion to that
effect.
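A sketch of this kind of check using C11 `_Static_assert`. `EX_NAME_MAX`, `EX_MAXNAMLEN` and `struct ex_dirent` are hypothetical stand-ins (both limits are 255 on FreeBSD). The copy routine relies on the assertion to know that an already validated component always fits in the dirent buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for NAME_MAX and MAXNAMLEN. */
#define EX_NAME_MAX	255
#define EX_MAXNAMLEN	255

/* Compile-time check, similar in spirit to the kernel's assertion. */
_Static_assert(EX_NAME_MAX == EX_MAXNAMLEN,
    "a NAME_MAX-bounded component must fit in a MAXNAMLEN dirent buffer");

struct ex_dirent {
	unsigned short	d_namlen;
	char		d_name[EX_MAXNAMLEN + 1];	/* name plus NUL */
};

/* Copy a component that was already validated as <= EX_NAME_MAX bytes. */
static void
ex_fill_dirent(struct ex_dirent *dp, const char *name, size_t len)
{
	assert(len <= EX_NAME_MAX);
	memcpy(dp->d_name, name, len);
	dp->d_name[len] = '\0';
	dp->d_namlen = (unsigned short)len;
}
```

If the two constants ever diverged, the build would fail instead of a filesystem silently overrunning `d_name`.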
Reported by: Alexey Kulaev <alex.qart@gmail.com>
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29431
`struct weightened_nhop` has 32 bits of padding between its fields due to
alignment (on amd64).
Not zeroing these padding bits results in duplicate nhop groups
in the kernel because of the way comparison works.
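A minimal sketch of the failure mode, under stated assumptions: `struct wn_example` is a hypothetical stand-in for `struct weightened_nhop`, and equality is assumed to be a byte-wise memcmp(). Zeroing the whole object before filling in the fields makes the padding bytes deterministic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout: on LP64 a 4-byte field followed by an 8-byte
 * aligned pointer leaves 4 bytes of padding that memcmp() still compares.
 */
struct wn_example {
	uint32_t	 weight;
	/* 4 bytes of implicit padding here on amd64 */
	void		*nh;
};

static void
wn_init(struct wn_example *wn, uint32_t weight, void *nh)
{
	/* Zero everything, including padding, before setting fields. */
	memset(wn, 0, sizeof(*wn));
	wn->weight = weight;
	wn->nh = nh;
}

/* Byte-wise comparison: only reliable if the padding was zeroed. */
static int
wn_equal(const struct wn_example *a, const struct wn_example *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

Without the memset() in wn_init(), two logically identical entries could carry different stack garbage in their padding on LP64 and compare unequal.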
MFC after: 1 day
Upstream commit message:
Support running FreeBSD buildworld on Arm-based macOS hosts
Arm-based Macs are like FreeBSD and provide a full 64-bit stat from the
start, so have no stat64 variants. Thus, define stat64 and fstat64 as
aliases for the normal versions.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #11771
MFC after: 1 week
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29383
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29382
This change serves two purposes.
First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.
Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d2. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.
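The approach can be sketched as follows, assuming GCC or Clang. The `ex_` names are illustrative only and are not the header's actual macro or function names.

```c
#include <assert.h>
#include <stdint.h>

/* Byte order comes from compiler-provided macros, not per-arch defaults. */
#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
#error "compiler does not provide __BYTE_ORDER__"
#endif

/* Byte swaps delegated to compiler builtins instead of hand-rolled code. */
static inline uint16_t ex_bswap16(uint16_t x) { return __builtin_bswap16(x); }
static inline uint32_t ex_bswap32(uint32_t x) { return __builtin_bswap32(x); }
static inline uint64_t ex_bswap64(uint64_t x) { return __builtin_bswap64(x); }

/* Host-to-big-endian conversion layered on top of the builtins. */
static inline uint32_t
ex_htobe32(uint32_t x)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return (ex_bswap32(x));
#else
	return (x);
#endif
}
```

The compiler is then free to lower each builtin to a single instruction (such as `rev` on armv6+) or a libcall.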
PR: 236920
Reviewed by: arichardson, imp
Tested by: bdragon (BE powerpc)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29012
Remove unused #includes of LinuxKPI headers noticed while trying to
solve LinuxKPI struct net_device and related functions.
Neither netdevice.h nor inetdevice.h nor notifier.h seem to be needed.
This takes cxgbe(4) out of the picture of D29366.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: np
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29432
Remove unused #includes of a LinuxKPI header noticed while trying to
solve LinuxKPI struct net_device and related functions.
This takes qlnxr out of the picture of D29366.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
X-D-R: D29366 (extracted as further cleanup)
Remove linux/inetdevice.h as neither of the two inline functions there
is used here.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29428
Introduce struct netdev_notifier_info as a container to pass
net_device to the callback functions.
Adjust netdev_notifier_info_to_dev() to return the net_device field.
Add explicit casts from ifp to ni->dev even though currently
struct net_device is defined to struct ifnet. This is needed in
preparation for untangling this and improving the net_device compat
code.
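A minimal sketch of the container-plus-accessor shape described above. The `ex_` types are hypothetical simplifications, not the LinuxKPI definitions (the Linux structure may carry additional fields).

```c
#include <assert.h>
#include <stddef.h>

struct ex_net_device {
	const char *name;
};

/* Container handed to notifier callbacks instead of the device itself. */
struct ex_netdev_notifier_info {
	struct ex_net_device *dev;
};

static inline struct ex_net_device *
ex_netdev_notifier_info_to_dev(const struct ex_netdev_notifier_info *info)
{
	return (info->dev);
}

/* Example callback: goes through the accessor rather than casting ptr. */
static int
ex_netdev_event(unsigned long event, void *ptr)
{
	struct ex_net_device *dev;

	(void)event;
	dev = ex_netdev_notifier_info_to_dev(ptr);
	return (dev != NULL && dev->name != NULL);
}
```

Because callbacks never cast `ptr` to a device directly, they keep working once `struct net_device` stops being an alias for `struct ifnet`.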
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29365
Add a net_ratelimit() compat implementation based on ppsratecheck().
Add a sysctl to allow tuning of the number of messages.
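A simplified userland model of such a rate check, written to mirror what ppsratecheck(9) is generally understood to do: allow at most `maxpps` events per one-second window, with a negative value meaning unlimited. Time is passed in as whole seconds for testability, `ex_net_ratelimit_max` is a hypothetical stand-in for the sysctl, and none of these names are the LinuxKPI ones.

```c
#include <assert.h>
#include <stdbool.h>

struct ex_ratestate {
	long	window_start;	/* second the current window began; 0 = never */
	int	count;		/* events seen in the current window */
};

static bool
ex_ratecheck(struct ex_ratestate *rs, long now, int maxpps)
{
	assert(now > 0);	/* 0 is reserved to mean "never checked" */
	if (rs->window_start == 0 || now - rs->window_start >= 1) {
		/* Start a new one-second window; this event is the first. */
		rs->window_start = now;
		rs->count = 1;
		return (maxpps != 0);
	}
	rs->count++;
	return (maxpps < 0 || rs->count <= maxpps);
}

/* Hypothetical tunable standing in for the new sysctl. */
static int ex_net_ratelimit_max = 10;
static struct ex_ratestate ex_net_rs;

/* net_ratelimit()-style wrapper over a single shared rate state. */
static bool
ex_net_ratelimit(long now)
{
	return (ex_ratecheck(&ex_net_rs, now, ex_net_ratelimit_max));
}
```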
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29399
When debugging leaked MSI/MSI-X vectors through LinuxKPI I found
the informational printf unhelpful. Rather than just stating that we
leaked vectors, also report how many MSI or MSI-X vectors were leaked.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: jhb
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29394
net80211 changed a while back to support per-VAP config for things
rather than it being global. This is to support firmware NICs that
support per-VAP flags and configuration where the firmware will figure
out how to combine them.
However, it introduced a fun timing issue - those changes used to happen
to the shared ic state before newstate() was called, but now they're
also tasks and they can happen /after/.
This isn't a problem for ath(4), but it exposed some interesting
timing and config bugs here. Notably, I saw short slot NOT being
configured in 5GHz mode during some associations, so 5GHz stuff
would hang or behave poorly. At other times the follow-up auth had
the right config, so it didn't hang.
So for now, just flip this over to using the per-VAP flags which
are correct when newstate() is called. net80211 should also have
those flags synch'ed to the global ic state before newstate() runs
and that can come in a subsequent commit.
Whilst here also fix plcp to be consistently logged as a hex value.
Tested:
* iwn(4) Intel 6205, STA mode, both 2GHz and 5GHz
Differential Revision: https://reviews.freebsd.org/D29379
Reviewed by: bz
For filters which implement accf_create, the setsockopt(2) handler
caches the filter name in the socket, but it also incorrectly frees the
buffer containing the copy, leaving a dangling pointer. Note that no
accept filters provided in the base system are susceptible to this, as
they don't implement accf_create.
Reported by: Alexey Kulaev <alex.qart@gmail.com>
Discussed with: emaste
Security: kernel use-after-free
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload. With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps. hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.
* Enable nic_ktls_ofld in the default configuration file and use the
firmware instead of direct register manipulation to apply/rollback
NIC TLS configuration. This allows the driver to switch the hardware
between TOE and NIC TLS mode in a safe manner. Note that the
configuration is adapter-wide and not per-port.
* Remove the kern_tls config file as it works with 100G T6 cards only
and leads to firmware crashes with 25G cards. The configurations
included with the driver (with the exception of the FPGA configs) are
supposed to work with all adapters.
Reported by: Veeresh U.K. at Chelsio
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29291
PowerISA 2.07B says that the low-order p-12 bits of the real page number
contained in ARPN and LP fields of a PTE must be 0s and are ignored
by the hardware (Book III-S, 5.7.7.1), where 2^p is the actual page size
in bytes, but we were clearing only the LP field.
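As a worked illustration of the rule above: clearing the low-order p-12 bits of the 4 KB real page number is equivalent to aligning the physical address down to 2^p. The helper name is hypothetical, and the split of the RPN into the ARPN and LP fields is deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Return the RPN for a 2^p-byte page with its low (p - 12) bits cleared. */
static uint64_t
ex_rpn_for_large_page(uint64_t pa, unsigned int p)
{
	uint64_t rpn = pa >> 12;	/* 4 KB real page number */

	assert(p >= 12 && p < 64);
	return (rpn & ~((UINT64_C(1) << (p - 12)) - 1));
}
```

For a 16 MB page (p = 24) at physical address 0x12345678000, the RPN 0x12345678 becomes 0x12345000.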
This worked on bare metal and on QEMU with KVM, which ignore these bits,
but caused a kernel panic on QEMU with TCG, which expects them to be
cleared.
This fixes running FreeBSD with HPT superpages enabled on QEMU
with TCG.
MFC after: 2 weeks
Sponsored by: Eldorado Research Institute (eldorado.org.br)
The computed IPOIB_CM_RX_SG is too small. It doesn't account for fallback
to mbuf clusters when jumbo frames are not available and it also doesn't
account for the packet header and trailer mbuf.
This causes a memory overwrite situation when IPOIB_CM is configured.
While at it add a kernel assert to ensure the mapping array is not overwritten.
PR: 254474
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
When routes are batch-deleted via rib_walk_del() and
some paths of a multipath route get deleted, the old
multipath group was not freed.
PR: 254496
Reported by: Zhenlei Huang <zlei.huang@gmail.com>
MFC after: 1 day
Similar to commit 3ead60236f ("Generalize bus_space(9) and atomic(9)
sanitizer interceptors"), use a more generic scheme for interposing
sanitizer implementations of routines like memcpy().
No functional change intended.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
upper_32_bits() and lower_32_bits() are defined twice in this file.
With the extra conditional removed on LinuxKPI in 3b1ecc9fa1
they are also included from there already. Use the LinuxKPI version
and remove the two local ones.
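For reference, these helpers are conventionally defined as in the sketch below, which follows the form used in Linux (the LinuxKPI header may differ in detail). Shifting by 16 twice avoids an undefined 32-bit shift when the argument is only 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* High and low halves of a (possibly 64-bit) value. */
#define upper_32_bits(n)	((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n)	((uint32_t)((n) & 0xffffffff))
```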
Sponsored-by: The FreeBSD Foundation
Reviewed-by: hselasky
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29392
We are not aware of any out-of-tree consumers anymore
which would need KPI support for Linux versions before 5.
Update the two in-tree consumers to use the new KPI.
This allows us to remove the extra version check and
will also give access to {lower,upper}_32_bits() unconditionally.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: hselasky, rlibby, rstone
MFC-after: 2 weeks
X-MFC: to 13 only
Differential Revision: https://reviews.freebsd.org/D29391
Add stubs for struct lockdep_map and three accessor functions
used by iwlwifi.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, emaste
Differential Revision: https://reviews.freebsd.org/D29398
brcm80211 includes pci_ids.h directly while historically we were tracking
IDs in pci.h. Move the current set of IDs from pci.h to pci_ids.h and
while here add IDs for Realtek and Broadcom as well as a network class
as needed by their wireless drivers.
We still include pci_ids.h from pci.h so this should not change anything.
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29400
Add various protocol IDs found in various wireless drivers.
Also add ETH_FRAME_LEN and struct ethhdr.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29397
Add ERFKILL and EBADE found in iwlwifi and brcmfmac wireless drivers.
While here add a comment above the block of error numbers above 500 to
document expectations.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, emaste
Differential Revision: https://reviews.freebsd.org/D29396
In the notifier event callback function rather than casting directly
to the expected type use the proper accessor function as the mlx drivers
already do.
This is preparatory work to allow us to improve the "struct net_device
is struct ifnet" compat code shortcut in the future.
Obtained-from: bz_iwlwifi
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29364
ieee80211_node.h uses LIST_HEAD() which LinuxKPI redefines and this
can lead to problems (see comment there). Make sure the net80211
header file is handled correctly by adding it to the list of files
to include before re-defining the macro.
Also add header files needed as dependencies.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: philip, hselasky
Differential Revision: https://reviews.freebsd.org/D29336
Both linux/random.h and net80211 have a function named
get_random_bytes(). With overlapping files included these collide.
Arguably the function could be renamed in linuxkpi but the generic
name should also not be used in net80211 so rename it there.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: philip, adrian
Differential Revision: https://reviews.freebsd.org/D29335
ipf_proxy_check() returns -1 for an error and 0 or 1 for success.
ipf_proxy_check()'s callers check for error and if the return code
is 0, they change it to 1 prior to returning to their callers. Simply
by returning -1 or 1 we reduce complexity and cycles burned changing
0 to 1.
MFC after: 1 week
documentation.
Commit SVN r364219 / Git 8a0edc914f changed random(9) to be a shim around
prng32(9) and inadvertently caused random(9) to begin returning numbers in the
range [0,2^32-1] instead of [0,2^31-1], where the latter has been the documented
range for decades.
The increased output range has been identified as the source of numerous bugs in
code written against the historical output range; e.g., ipfw "prob" rules and
stats(3) are known to be affected, and a non-exhaustive audit of the tree
identified other random(9) consumers which are also likely affected.
As random(9) is deprecated and slated for eventual removal in 14.0, consumers
should gradually be audited and migrated to prng(9).
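To illustrate why the range matters: `ex_prng32()` below is a hypothetical xorshift32 stand-in for prng32(9), and masking to 31 bits is one way (assumed here, not necessarily the committed change) to keep a random(9)-style shim within [0, 2^31-1]. The consumer mimics a probability test of the form `random() < prob` written against the historical range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit generator (xorshift32) standing in for prng32(9). */
static uint32_t ex_state = 2463534242u;

static uint32_t
ex_prng32(void)
{
	uint32_t x = ex_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	ex_state = x;
	return (x);
}

/* random(9)-style shim that keeps the documented [0, 2^31 - 1] range. */
static unsigned long
ex_random(void)
{
	return (ex_prng32() & 0x7fffffffu);
}

/*
 * Consumer written against the historical range: true with probability
 * prob / 2^31. If ex_random() could return up to 2^32 - 1, this would
 * fire only about half as often as intended.
 */
static int
ex_match_prob(uint32_t prob)
{
	return (ex_random() < prob);
}
```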
Submitted by: Loic Prylli <lprylli@netflix.com>
Obtained from: Netflix
Reviewed by: cem, delphij, imp
MFC after: 1 day
MFC to: stable/13, releng/13.0
Differential Revision: https://reviews.freebsd.org/D29385
During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed. The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.
This is the second of two patches that is hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.
This patch adds a 5 second timeout on the msleep()s and checks for
the forced dismount flag so that the renew thread will
wake up and see the forced dismount flag. Normally a wakeup()
will occur in less than 5 seconds, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.
The patch also adds the "mp" argument to nfsv4_lock() so that it
will return when the forced dismount flag is set.
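The timeout-and-recheck pattern can be sketched in userland with POSIX threads. This is an analogue, not the kernel code: pthread_cond_timedwait() stands in for msleep(9), and `struct ex_lock`, `ex_lock_acquire()` and the `force_dismount` flag are hypothetical names. The waiter notices the flag within one timeout even if nobody ever signals it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct ex_lock {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		held;
	bool		force_dismount;	/* set by the "umount -N" path */
};

/* Returns 0 once the lock is acquired, or ENXIO on forced dismount. */
static int
ex_lock_acquire(struct ex_lock *lk, long timeout_ms)
{
	struct timespec ts;
	int error = 0;

	pthread_mutex_lock(&lk->mtx);
	while (lk->held) {
		if (lk->force_dismount) {
			error = ENXIO;
			break;
		}
		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_sec += timeout_ms / 1000;
		ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
		if (ts.tv_nsec >= 1000000000L) {
			ts.tv_sec++;
			ts.tv_nsec -= 1000000000L;
		}
		/* Timeout or premature return: loop and re-check. */
		(void)pthread_cond_timedwait(&lk->cv, &lk->mtx, &ts);
	}
	if (error == 0)
		lk->held = true;
	pthread_mutex_unlock(&lk->mtx);
	return (error);
}

/* Demo helper: sets the flag after ~20 ms without signaling the cv. */
static void *
ex_set_dismount_later(void *arg)
{
	struct ex_lock *lk = arg;
	struct timespec d = { 0, 20 * 1000000L };

	nanosleep(&d, NULL);
	pthread_mutex_lock(&lk->mtx);
	lk->force_dismount = true;	/* deliberately no pthread_cond_signal() */
	pthread_mutex_unlock(&lk->mtx);
	return (NULL);
}
```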
While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call.
MFC after: 2 weeks
Just like the packet counters, move the timekeeping information into
dn_cfg. This reduces the global name space use for dummynet and will
make subsequent work to add vnet support and re-use in pf easier.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29246
Move the packet counters into the dn_cfg struct. This reduces the
global name space use for dummynet and will make future work for things
like vnet support and re-use in pf easier.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29245
PR: 254419
Reviewed by: gallatin, kp
Tested by: Igor A. Valkov <viaprog@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29378
Only attempt to fetch the configuration data and connect the shared
ring once the frontend has switched to the 'Connected' state. This
seems to be in line with what Linux netback does, and is required to
make newer versions of NetBSD netfront work, since NetBSD only
publishes the required configuration before switching to the Connected
state.
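A sketch of the state handling described above, with hypothetical names: `enum ex_xenbus_state` mirrors the XenBus states (the real definitions live in the Xen public headers), and `ex_backend_connect()` stands in for reading the frontend configuration and mapping the shared ring.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the relevant XenBus states. */
enum ex_xenbus_state {
	EX_XenbusStateUnknown,
	EX_XenbusStateInitialising,
	EX_XenbusStateInitWait,
	EX_XenbusStateInitialised,
	EX_XenbusStateConnected,
	EX_XenbusStateClosing,
	EX_XenbusStateClosed,
};

struct ex_backend {
	bool	connected;	/* config read and shared ring mapped */
	int	connect_calls;
};

/* Stand-in for "fetch frontend config and connect the shared ring". */
static void
ex_backend_connect(struct ex_backend *be)
{
	be->connect_calls++;
	be->connected = true;
}

/* Frontend state-change handler: connect only once it reports Connected. */
static void
ex_frontend_changed(struct ex_backend *be, enum ex_xenbus_state fs)
{
	switch (fs) {
	case EX_XenbusStateConnected:
		if (!be->connected)
			ex_backend_connect(be);
		break;
	case EX_XenbusStateClosing:
	case EX_XenbusStateClosed:
		be->connected = false;
		break;
	default:
		/* Earlier states: the configuration may not be published yet. */
		break;
	}
}
```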
MFC after: 1 week
Sponsored by: Citrix Systems R&D
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN. Lay a bit of groundwork for KASAN and KMSAN.
When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h. These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.
No functional change intended.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
In both cases, too few frames were trimmed, leading to exception handling
or DTrace internals being exposed in stack traces returned by D's stack()
primitive.
MFC after: 3 days
Reviewed by: emaste, andrew
This avoids mixing the use of two different enums which modern C
compilers warn about.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29301
Ensure that the stack does not generate a DSACK block for user
data received on a SYN segment in SYN-SENT state.
Reviewed by: rscheff
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29376
Sponsored by: Netflix, Inc.
This warning is very rarely useful (inline is a hint and not mandatory).
This flag results in many warnings being printed when compiling C++
code that uses the standard library with GCC.
This flag was originally added back in r94332 but the flag is a no-op
in Clang ("This diagnostic flag exists for GCC compatibility, and has no
effect in Clang"). Removing it should make the GCC build output slightly
more readable.
Reviewed By: jrtc27, imp
Differential Revision: https://reviews.freebsd.org/D29235
Currently, AMD-vi PCI-e passthrough will lead to the following lines in
dmesg:
"kernel: CPU0: local APIC error 0x40
ivhd0: Error: completion failed tail:0x720, head:0x0."
After some tracing, the problem is due to the interaction with
amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the
identification of AMD-vi IVHD is done by walking over the ACPI IVRS
table and ivhdX device_ts are added under the acpi bus, while there is
no driver handling the corresponding IOMMU PCI function. In
amdvi_alloc_intr_resources(), the MSI interrupts are allocated with the ivhdX
device_t instead of the IOMMU PCI function device_t, and bus_setup_intr() is
called on ivhdX. The IOMMU PCI function device_t is only used for
pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI
function, the IOMMU PCI function device_t's dinfo->cfg.msi is never
updated to reflect the intended msi_data and msi_addr, so msi_data
and msi_addr remain 0. When pci_driver_added() loops
over the children of a PCI bus and calls pci_cfg_restore() on each of
them, msi_addr and msi_data with value 0 are written to the MSI
capability of the IOMMU PCI function, which explains the errors in
dmesg.
This change includes an amdiommu driver which currently does attaching,
detaching and providing DEVMETHODs for setting up and tearing down
interrupt. The purpose of the driver is to prevent pci_driver_added()
from calling pci_cfg_restore() on the IOMMU PCI function device_t.
The introduction of the amdiommu driver handles allocation of an IRQ
resource within the IOMMU PCI function, so that the dinfo->cfg.msi is
populated.
This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU.
Sponsored by: The FreeBSD Foundation
Reviewed by: jhb
Approved by: philip (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28984
* device_printf() is effectively a printf
* if_printf() is effectively a LOG_INFO
This allows subsystems to log device/netif stuff using different log levels,
rather than having to invent their own way to prefix unit/netif names.
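As an illustration of what such helpers provide, a hypothetical `ex_unit_log()` prepends the unit or interface name and carries a syslog(3) priority. It writes into a caller-supplied record rather than the kernel message buffer, and its name and signature are assumptions, not those of the committed functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

struct ex_logrec {
	int	pri;		/* syslog priority, e.g. LOG_INFO */
	char	msg[128];	/* "unit: message" */
};

static void
ex_unit_log(struct ex_logrec *rec, const char *unit, int pri,
    const char *fmt, ...)
{
	va_list ap;
	int n;

	rec->pri = pri;
	/* Prefix with the device/interface name, like device_printf(). */
	n = snprintf(rec->msg, sizeof(rec->msg), "%s: ", unit);
	if (n < 0 || (size_t)n >= sizeof(rec->msg))
		return;
	va_start(ap, fmt);
	vsnprintf(rec->msg + n, sizeof(rec->msg) - (size_t)n, fmt, ap);
	va_end(ap);
}
```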
Differential Revision: https://reviews.freebsd.org/D29320
Reviewed by: imp
e4b8deb222 removed the last in-tree uses of PCPU_INC(). Its
potential benefit is also practically nonexistent. Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction. The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.
[0]: https://www.agner.org/optimize/instruction_tables.pdf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29308
For guests running under some VMMs, the configuration structure is
available in memory space but not in I/O space.
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: rpokala, bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28818
The MSI-X resource shouldn't be assumed to be always on BAR1.
The VirtIO v1.1 specification does not require the MSI-X table and PBA
to be in BAR1 either.
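A sketch of the decoding, based on the PCI MSI-X capability layout: the low three bits of the Table Offset/BIR register (and likewise the PBA Offset/BIR register) select the BAR, and the remaining bits give a QWORD-aligned offset within it. `ex_msix_decode()` and `EX_MSIX_BIR_MASK` are hypothetical names, and configuration-space reads are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MSIX_BIR_MASK	0x7u

struct ex_msix_loc {
	unsigned int	bir;	/* BAR index, 0..5 */
	uint32_t	offset;	/* byte offset within that BAR */
};

/* Decode a Table Offset/BIR or PBA Offset/BIR register value. */
static struct ex_msix_loc
ex_msix_decode(uint32_t reg)
{
	struct ex_msix_loc loc;

	loc.bir = reg & EX_MSIX_BIR_MASK;
	loc.offset = reg & ~EX_MSIX_BIR_MASK;
	return (loc);
}
```

A register value of 0x2004, for example, places the structure at offset 0x2000 in BAR4 rather than in BAR1.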
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28817
kevans actually caught this in the original review and I fixed it, but
then I committed an older copy of the branch. Whoops.
Reported by: kevans
MFC after: 13 days
MFC with: 929acdb19a
Differential Revision: https://reviews.freebsd.org/D29031
During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed. The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.
This is the first of two patches that is hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.
nfsv4_lock() checks for forced dismount, but only after it wakes up
from msleep(). Without this patch, a wakeup() call was required.
This patch adds a 1 second timeout on the msleep(), so that it will
wake up and see the forced dismount flag. Normally a wakeup()
will occur in less than 1 second, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.
While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call and make the same change for nfsv4_getref().
MFC after: 2 weeks
While here, make sure only the PF driver attempts to program the global
RSS key (with options RSS). The VF driver doesn't have access to those
device registers.
MFC after: 1 week
Sponsored by: Chelsio Communications
This is a prerequisite to using these functions outside of ddb, but also
provides some cleanup and minor refactoring. This code is almost
entirely duplicated between the two implementations, the only
significant difference being the lack of dbreg synchronization on i386.
Cleanups are:
- demote some internal functions to static
- use the constant NDBREGS instead of a '4' literal
- remove K&R definitions
- some added comments
Reviewed by: kib, jhb
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29153
A repeat call will recreate the memory windows in the hardware and move
them to their last-known positions without repeating any of the software
initialization.
MFC after: 1 week
Sponsored by: Chelsio Communications
This file inherits some boilerplate and structure from the analogous
file in aesni(4), aesni_wrap.c. Note the derivation and the copyright
holders of that file.
For example, the AES-XTS bits added in 4979620ece were ported from
aesni(4).
Requested by: jmg
Reviewed by: imp, gnn
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29268
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.
We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for address updates.
[1]: https://github.com/freebsd/uefi-edk2/pull/9/
Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
1) F_SETLKW (blocking) operations would be sent to the FUSE server as
F_SETLK (non-blocking).
2) Release operations, F_SETLK with lk_type = F_UNLCK, would simply
return EINVAL.
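A sketch of the corrected mapping, with hypothetical names: `EX_FUSE_SETLK` and `EX_FUSE_SETLKW` stand in for the FUSE set-lock requests. F_SETLKW stays blocking, and an F_SETLK carrying F_UNLCK is forwarded as a release instead of failing. F_GETLK, which uses its own request, is outside this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

enum ex_fuse_lkop {
	EX_FUSE_SETLK,		/* non-blocking set, or release */
	EX_FUSE_SETLKW,		/* blocking set */
};

/* Map a set-lock fcntl(2) command to a request; 0 on success, else errno. */
static int
ex_fuse_lock_op(int cmd, short l_type, enum ex_fuse_lkop *op)
{
	if (l_type != F_RDLCK && l_type != F_WRLCK && l_type != F_UNLCK)
		return (EINVAL);
	switch (cmd) {
	case F_SETLKW:
		*op = EX_FUSE_SETLKW;	/* must not be downgraded to SETLK */
		return (0);
	case F_SETLK:
		*op = EX_FUSE_SETLK;	/* includes F_UNLCK releases */
		return (0);
	default:
		return (EINVAL);
	}
}
```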
PR: 253500
Reported by: John Millikin <jmillikin@gmail.com>
MFC after: 2 weeks
MSG_CMSG_CLOEXEC has not been working since 2015 (SVN r284380) because
_finstall expects O_CLOEXEC and not UF_EXCLOSE as the flags argument.
This was probably not noticed because we don't have a test for this flag,
so this commit adds one. I found this problem because one of the
libwayland tests was failing.
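A userland check in the spirit of such a test (an independent sketch, not the committed test): send a descriptor over a UNIX-domain socket with SCM_RIGHTS and inspect FD_CLOEXEC on the received copy. `ex_received_fd_cloexec()` is a hypothetical helper.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Returns 1/0 for FD_CLOEXEC on the received fd, or -1 on error. */
static int
ex_received_fd_cloexec(int recv_flags)
{
	int sv[2], pfd[2], rfd, fl, ret = -1;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces cmsghdr alignment */
	} u;
	struct msghdr msg;
	struct cmsghdr *cm;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (pipe(pfd) == -1)
		goto out_sock;

	/* Send pfd[0] as SCM_RIGHTS ancillary data along with one byte. */
	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &pfd[0], sizeof(int));
	if (sendmsg(sv[0], &msg, 0) != 1)
		goto out_pipe;

	/* Receive it, optionally requesting close-on-exec. */
	memset(&u, 0, sizeof(u));
	msg.msg_controllen = sizeof(u.buf);
	if (recvmsg(sv[1], &msg, recv_flags) != 1)
		goto out_pipe;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS)
		goto out_pipe;
	memcpy(&rfd, CMSG_DATA(cm), sizeof(int));
	fl = fcntl(rfd, F_GETFD);
	if (fl != -1)
		ret = (fl & FD_CLOEXEC) ? 1 : 0;
	close(rfd);
out_pipe:
	close(pfd[0]);
	close(pfd[1]);
out_sock:
	close(sv[0]);
	close(sv[1]);
	return (ret);
}
```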
Fixes: ea31808c3b ("fd: move out actual fp installation to _finstall")
MFC after: 3 days
Reviewed By: mjg, kib
Differential Revision: https://reviews.freebsd.org/D29328
The decision whether a TCP packet is sent over IPv4 or IPv6 was
based on ethertype, which works correctly. In D27926 the criterion
was changed to checking whether the CSUM_IP_TSO flag is set in the
csum-flags and, if so, treating the packet as TCP/IPv4.
However, the TCP stack sets CSUM_TSO for both IPv4 and IPv6,
where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO.
Therefore TCP/IPv6 packets get misclassified as TCP/IPv4,
which breaks TSO for TCP/IPv6.
This patch bases the check again on the ethertype.
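The misclassification can be shown in a few lines. The flag bit values below are illustrative only and are not taken from <sys/mbuf.h>; what matters is that CSUM_TSO has both bits set, so testing CSUM_IP_TSO alone reports IPv4 for every TSO packet, while the EtherType (compared in host byte order here) distinguishes the two.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only. */
#define EX_CSUM_IP_TSO		0x0001u
#define EX_CSUM_IP6_TSO		0x0002u
#define EX_CSUM_TSO		(EX_CSUM_IP_TSO | EX_CSUM_IP6_TSO)

#define EX_ETHERTYPE_IP		0x0800
#define EX_ETHERTYPE_IPV6	0x86dd

enum ex_l3 { EX_L3_IPV4, EX_L3_IPV6, EX_L3_OTHER };

/* Buggy: the stack sets EX_CSUM_TSO (both bits) for IPv4 and IPv6. */
static enum ex_l3
ex_classify_by_flags(uint32_t csum_flags)
{
	return ((csum_flags & EX_CSUM_IP_TSO) ? EX_L3_IPV4 : EX_L3_IPV6);
}

/* Fixed: decide from the Ethernet type field instead. */
static enum ex_l3
ex_classify_by_ethertype(uint16_t etype)
{
	switch (etype) {
	case EX_ETHERTYPE_IP:
		return (EX_L3_IPV4);
	case EX_ETHERTYPE_IPV6:
		return (EX_L3_IPV6);
	default:
		return (EX_L3_OTHER);
	}
}
```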
This fix will be MFCed immediately, as discussed with re (gjb).
MFC after: instantly
PR: 254366
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D29331
During a recent NFSv4 testing event a test server was replying
NFSERR_OLDSTATEID for layout stateids presented to the server
for LayoutReturn operations. Upon rereading RFC5661, it was
apparent that the FreeBSD NFSv4.1/4.2 pNFS client did not
maintain the seqid field of the layout stateid correctly.
This patch is believed to correct the problem. Testing against
a FreeBSD pNFS server, with diagnostics added to check the stateid's
seqid, did not indicate problems. Unfortunately, testing against
this server will not happen in the near future, so the fix may
not be correct yet.
MFC after: 2 weeks
The global list has a marker with an invariant that free vnodes are
placed somewhere past that. A caller which performs filtering (like ZFS)
can move said marker all the way to the end, across free vnodes which
don't match. Then a caller which does not perform filtering will fail to
find them. This makes vn_alloc_hard sleep for 1 second instead of
reclaiming, resulting in significant stalls.
Fix the problem by requiring an explicit marker by callers which do
filtering.
As a temporary measure extend vnlru_free to restart if it fails to
reclaim anything.
Big thanks go to the reporter for testing several iterations of the
patch.
Reported by: Yamagi <lists yamagi.org>
Tested by: Yamagi <lists yamagi.org>
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29324
Factor out ieee80211_probereq_ie() and ieee80211_probereq_ie_len()
and make the length dynamic rather than static max. The latter is
needed as our current fixed length was longer than some "hw scan",
e.g. that of ath10k, will take. This way we can pass what we have.
Should this not be sufficient in the future we might have to deal
with filtering and much more error handling.
This also removes a duplicate calculation for ieee80211_ie_wpa [1].
Reported-by: Martin Husemann <martin NetBSD.org> [1]
Sponsored-by: Rubicon Communications, LLC ("Netgate")
Sponsored-by: The FreeBSD Foundation (update for alloc)
Reviewed-by: adrian, martin NetBSD.org (earlier version)
Reviewed-by: philip
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26545
Add support for crc32_le() as a wrapper around crc32_raw().
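A sketch of such a wrapper, assuming crc32_raw() takes `(buf, size, crc)` and applies no initial or final inversion, just as Linux's crc32_le(crc, p, len) does not. `ex_crc32_raw()` is a slow, bitwise stand-in so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 (poly 0xEDB88320), no pre/post inversion. */
static uint32_t
ex_crc32_raw(const void *buf, size_t size, uint32_t crc)
{
	const uint8_t *p = buf;

	while (size-- > 0) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return (crc);
}

/* Linux-style crc32_le(seed, p, len) as a thin wrapper. */
static uint32_t
ex_crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
	return (ex_crc32_raw(p, len, crc));
}
```

With the usual seed of 0xFFFFFFFF and a final XOR, this yields the standard CRC-32 check value 0xCBF43926 for "123456789".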
Sponsored-by: The FreeBSD Foundation
Obtained-from: bz_iwlwifi
MFC-after: 2 weeks
Reviewed-by: hselasky
Differential Revision: https://reviews.freebsd.org/D29187
Many small arm64 gadgets use 1500000 as their console speed.
While cu(1) handles this fine, some third-party software, e.g.
comms/conserver-con, only adds speeds for which B<n> is defined.
Having it defined here simplifies enhancing such software.
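A sketch of the pattern such software typically follows: each speed-table entry is compiled in only if its B<n> macro exists, so defining B1500000 in <termios.h> is enough for it to appear. The table and lookup function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

struct ex_speed {
	long	baud;
	speed_t	code;
};

static const struct ex_speed ex_speeds[] = {
	{ 9600, B9600 },
	{ 115200, B115200 },
#ifdef B230400
	{ 230400, B230400 },
#endif
#ifdef B1500000
	{ 1500000, B1500000 },	/* appears only once the macro exists */
#endif
};

/* Returns 1 and stores the speed_t code if baud is supported, else 0. */
static int
ex_lookup_speed(long baud, speed_t *code)
{
	for (size_t i = 0; i < sizeof(ex_speeds) / sizeof(ex_speeds[0]); i++) {
		if (ex_speeds[i].baud == baud) {
			*code = ex_speeds[i].code;
			return (1);
		}
	}
	return (0);
}
```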
Obtained-from: NetBSD sys/sys/termios.h 1.36
MFC-after: 2 weeks
Reviewed-by: philip (,okayed by imp)
Differential Revision: https://reviews.freebsd.org/D29209
TCP/IPv6 packets to be forwarded can be laid out with only the Ethernet
header in the first mbuf, and these packets are lost. There was a
previous hack to pullup ICMPv6 packets with such a layout for the
same reason. Generalize, and pullup any IPv6 packets with only the
Ethernet header in the first mbuf. Possibly this should also include
IPv4, but that situation has not been observed to fail.
PR: 254060
Reported by: denis at h3q.com
MFC after: 3 days
On FreeBSD/arm fill_fpregs, fill_dbregs are stubs that zero the reg
struct and return success. set_fpregs and set_dbregs do nothing and
return success.
Provide the same implementation for arm64 COMPAT_FREEBSD32.
Reviewed by: andrew
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29314
These are not stored in the trapframe so must be cleared explicitly.
This is similar to one of the MIPS changes in 822d2d6ac9.
Reviewed by: andrew
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28711
When we request a bulk sync we need to ensure we actually send out that
request, not just buffer it until we have enough data to send a full
packet.
PR: 254236
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29271
In some cases, such as broken hardware, nvme(4) may wait minutes for a
controller response before timing out. Doing so in a tight spin loop
made the whole system unresponsive.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29309
Sponsored by: iXsystems, Inc.
After lengthy discussions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree. This driver has
larger security implications than many, and thus will be held to
greater scrutiny than other drivers.
Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
These ioctl commands aim to provide easier ways for user space
applications to enumerate existing audio devices and the nodes they can
potentially use.
Device lists are exchanged between user space and the kernel using
nv(9). The following ioctl commands are added to the /dev/sndstat node:
- SNDSTAT_REFRESH_DEVS
- SNDSTAT_GET_DEVS
- SNDSTAT_ADD_USER_DEVS
- SNDSTAT_FLUSH_USER_DEVS
Bump __FreeBSD_version to reflect the addition of the ioctls.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D26884
This is the only in-tree driver for the asymmetric crypto support in
OCF that is already marked deprecated for 14.
MFC after: 3 days
Sponsored by: Chelsio Communications
These files are no longer used by the FreeBSD base system. They were being used by the amd port but that has also been deleted.
Reviewed by: rmacklem
Sponsored by: Google
Differential Revision: https://reviews.freebsd.org/D29180
Definitions inside usr.sbin/bhyve/virtio.h are removed, and the
definitions in sys/dev/virtio are used instead.
This reduces code duplication.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29084
struct pf_pool and struct pf_kpool are different. We should not simply
bcopy() them.
Happily it turns out that their differences were all pointers, and the
userspace provided pointers were overwritten by the kernel, so this did
actually work correctly, but we should fix it anyway.
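The safe pattern is a field-by-field conversion that never takes pointer members from the userspace structure. The structures and the conversion function below are simplified, hypothetical stand-ins, not the real pf(4) definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace layout: its pointers are meaningless to the kernel. */
struct user_pool {
	void		*list_head;
	void		*cur;
	uint8_t		 key[16];
	int		 tblidx;
	uint16_t	 proxy_port[2];
	uint8_t		 opts;
};

/* Hypothetical kernel layout: pointer members have different types. */
struct kern_pool {
	struct kern_addr *list_head;
	struct kern_addr *cur;
	uint8_t		 key[16];
	int		 tblidx;
	uint16_t	 proxy_port[2];
	uint8_t		 opts;
};

/* Copy only the plain-data fields; the kernel sets up the pointers itself. */
static void
user_pool_to_kern_pool(const struct user_pool *u, struct kern_pool *k)
{
	memset(k, 0, sizeof(*k));		/* pointers start out NULL */
	memcpy(k->key, u->key, sizeof(k->key));
	k->tblidx = u->tblidx;
	k->proxy_port[0] = u->proxy_port[0];
	k->proxy_port[1] = u->proxy_port[1];
	k->opts = u->opts;
}
```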
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29216
As follow-on work to e4b8deb222, move page table page
allocation and freeing into their own functions. Use these
functions to provide separate kernel vs. user page table page
accounting, and to wrap common tasks such as management of
zero-filled page state.
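A minimal sketch of the pattern, with hypothetical names (ptp_alloc(), ptp_free(), kernel_ptps, user_ptps) and calloc(3) standing in for allocating a zeroed physical page: allocation, zero-filling and per-kind accounting live in one pair of functions instead of being open-coded at each call site.

```c
#include <stdbool.h>
#include <stdlib.h>

#define	PTP_SIZE	4096

/* Hypothetical counters for kernel vs. user page table pages. */
static long kernel_ptps, user_ptps;

/* Allocate a zero-filled page table page and charge the right counter. */
static void *
ptp_alloc(bool kernel)
{
	void *p = calloc(1, PTP_SIZE);

	if (p == NULL)
		return (NULL);
	if (kernel)
		kernel_ptps++;
	else
		user_ptps++;
	return (p);
}

/* Free a page table page and undo the matching accounting. */
static void
ptp_free(void *p, bool kernel)
{
	if (p == NULL)
		return;
	if (kernel)
		kernel_ptps--;
	else
		user_ptps--;
	free(p);
}
```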
Requested by: markj, kib
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29151
The netmap_ioctl() function has a reference counting bug in case of
NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`,
the function does not decrease the refcount of "nmd", which is
increased by netmap_mem_find(), causing a refcount leak.
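A simplified sketch of the control flow and the fix. The types and helpers (struct mem_desc, mem_find(), mem_put()) are hypothetical stand-ins for the netmap memory allocator API, and the sketch assumes the descriptor is otherwise borrowed from the port's adapter: only the empty-name path takes a reference, so only that path must drop it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_desc {
	int	refcount;
	long	size;
};

/* Hypothetical lookup: takes a reference that the caller must drop. */
static struct mem_desc *
mem_find(struct mem_desc *d)
{
	if (d == NULL)
		return (NULL);
	d->refcount++;
	return (d);
}

static void
mem_put(struct mem_desc *d)
{
	d->refcount--;
}

static int
port_info_get(struct mem_desc *adapter_mem, struct mem_desc *by_id,
    const char *name, long *memsize)
{
	struct mem_desc *nmd;
	bool took_ref = false;

	if (name[0] != '\0') {
		nmd = adapter_mem;	/* borrowed; no extra reference */
	} else {
		nmd = mem_find(by_id);	/* reference taken here */
		if (nmd == NULL)
			return (EINVAL);
		took_ref = true;
	}
	*memsize = nmd->size;
	if (took_ref)
		mem_put(nmd);		/* the release that was missing */
	return (0);
}
```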
Reported by: Xiyu Yang <sherllyyang00@gmail.com>
Submitted by: Carl Smith <carl.smith@alliedtelesis.co.nz>
MFC after: 3 days
PR: 254311
This is x86-only and so should not be in the common area.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29040
Fix compilation, since machine/xen/xen-os.h requires definitions that
exist in xen/xen-os.h.
In general, machine/xen/xen-os.h should never be included directly.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29043
This file got resynced with OpenBSD to pick up fixes that had taken
place after the version initially ported to FreeBSD. KASSERT there is
more like MPASS here.
Reported by: David Wolfskill <david@catwhisker.org>
The RSC support feature introduced a bit field "rm_internal" in
struct rndis_pktinfo with total size unchanged.
The guest does not use this field in the tx path. However, we need to
initialize it to zero in case of older hosts that are not aware of this
field.
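A sketch of the usual remedy, assuming a hypothetical layout rather than the real struct rndis_pktinfo: clear the whole header before filling in the fields the guest does use, so a bit it never sets cannot carry stale buffer contents to the host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header with a bit field unknown to older hosts. */
struct pktinfo {
	uint32_t size;
	uint32_t type : 31;
	uint32_t internal : 1;	/* newer field; the TX path never sets it */
	uint32_t offset;
};

static void
pktinfo_init(struct pktinfo *pi, uint32_t size, uint32_t type, uint32_t off)
{
	/* Zero everything first so unused fields are not stale garbage. */
	memset(pi, 0, sizeof(*pi));
	pi->size = size;
	pi->type = type;
	pi->offset = off;
}
```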
Fixes: a491581f ("Hyper-V: hn: Enable vSwitch RSC support")
MFC after: 2 weeks
Sponsored by: Microsoft
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues. This patch consists of
work done by the following folks:
- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>
Notable changes include:
- Packets are now correctly staged for processing once the handshake has
completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
the interface's home vnet so that it can act as the sole network
connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
complete. It is additionally supported by the upstream
wireguard-tools (which we plan to merge into base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
aligned with security auditing guidelines
Note that the driver has been rebased away from using iflib. iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.
The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old
out-of-tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations. This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.
There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.
Also note that this is still a work in progress; further work will be
much smaller in nature.
MFC after: 1 month (maybe)
This lets one interrupt DDB's output, which is useful if paging is
disabled and the output device is slow.
This follows a previous implementation in svn r311952 / git
5fddef7999 which was reverted because it
broke DDB type-ahead.
Now, try this again, but with a 512-byte type-ahead buffer. While there
is buffer space, control input is handled and non-control input is
buffered. When the buffer is exhausted, the default is to print a
warning and drop further non-control input in order to continue handling
control input. sysctl debug.ddb.prioritize_control_input can be set to
0 to instead preserve all input but lose immediate handling of control
input. This could, for example, affect pasting of a large script into the
ddb console.
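A minimal model of the policy described above. The names (ddb_poll_char(), enum ta_result and the globals) are hypothetical, treating ASCII control characters as "control input" is an assumption, and the real code prints a warning where this sketch only sets a flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define	TYPEAHEAD_SIZE	512

enum ta_result {
	TA_HANDLED,	/* control input acted on immediately */
	TA_BUFFERED,	/* non-control input saved for later */
	TA_DROPPED,	/* non-control input discarded: buffer full */
	TA_DEFERRED	/* left unread; stays queued in the device */
};

static char	typeahead[TYPEAHEAD_SIZE];
static size_t	typeahead_len;
static bool	prioritize_control_input = true;	/* sysctl default 1 */
static bool	warned;

static bool
is_control(int c)
{
	return ((c >= 0 && c < 0x20) || c == 0x7f);
}

/* Decide what to do with the next pending input character c. */
static enum ta_result
ddb_poll_char(int c)
{
	bool full = typeahead_len == TYPEAHEAD_SIZE;

	if (full && !prioritize_control_input)
		return (TA_DEFERRED);	/* keep all input, lose control */
	if (is_control(c))
		return (TA_HANDLED);	/* e.g. interrupt the output */
	if (!full) {
		typeahead[typeahead_len++] = (char)c;
		return (TA_BUFFERED);
	}
	if (!warned)
		warned = true;		/* real code: print a warning once */
	return (TA_DROPPED);
}
```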
Suggested by: Anton Rang <rang@acm.org>
Reviewed by: markj
Discussed with: imp
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28676
The actual type of kobjop_t is arbitrary, it is only used as a generic
function pointer type. Declare it as void (*)(void) in order to avoid
gcc's -Wcast-function-type, which is included in -Wextra.
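A small illustration with a hypothetical method type and table: methods are stored as kobjop_t and cast back to their real type before being called. GCC documents void (*)(void) as matching every function type for -Wcast-function-type, which is why the cast into the table does not warn. Converting a function pointer to another function pointer type and back is well defined in C; calling through the wrong type is not.

```c
/* Generic method pointer type; only stored, never called as is. */
typedef void (*kobjop_t)(void);

/* Hypothetical method type and implementation. */
typedef int (*device_probe_t)(int unit);

static int
mydev_probe(int unit)
{
	return (unit == 0 ? 0 : -1);
}

/* Hypothetical method table entry holding the generic pointer. */
struct method_entry {
	const char	*name;
	kobjop_t	 func;
};

static const struct method_entry methods[] = {
	{ "device_probe", (kobjop_t)mydev_probe },
};

/* Cast back to the real type before calling. */
static int
call_probe(const struct method_entry *m, int unit)
{
	device_probe_t probe = (device_probe_t)m->func;

	return (probe(unit));
}
```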
Reviewed by: avg, jhb
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28769
Notable upstream pull request merges:
#11153 Scalable teardown lock for FreeBSD
#11651 Don't bomb out when using keylocation=file://
#11667 zvol: call zil_replaying() during replay
#11683 abd_get_offset_struct() may allocate new abd
#11693 Intentionally allow ZFS_READONLY in zfs_write
#11716 zpool import cachefile improvements
#11720 FreeBSD: Clean up zfsdev_close to match Linux
#11730 FreeBSD: bring back possibility to rewind the
checkpoint from bootloader
Obtained from: OpenZFS
MFC after: 2 weeks
Add parsing of the rewind options.
When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewound. When the FreeBSD repo was
synced with OpenZFS, this part of the code was removed.
[1] FreeBSD repo: 277f38abff
[2] OpenZFS repo: f2c027bd6a003ec5793f8716e6189c389c60f47a
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
PR: 254152
Reported by: Zhenlei Huang <zlei.huang at gmail.com>
Obtained from: https://github.com/openzfs/zfs/pull/11730
TSFOOR happens if a beacon with a given TSF isn't received within the
programmed/expected TSF value, plus/minus a fudge range. (OOR == out of range.)
If this happens then it could be because the baseband/mac is stuck, or
the baseband is deaf. So, do a cold reset and resync the beacon to
try and unstick the hardware.
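The hardware raises TSFOOR on its own; the sketch below only models the condition in software to make the definition concrete. tsf_out_of_range(), beacon_rx() and the reset counter are hypothetical, with the counter standing in for the cold reset and beacon resync.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is the received TSF outside expected +/- fudge?  Unsigned-safe diff. */
static bool
tsf_out_of_range(uint64_t expected, uint64_t received, uint64_t fudge)
{
	uint64_t diff = received > expected ? received - expected :
	    expected - received;

	return (diff > fudge);
}

static int resets;

static void
beacon_rx(uint64_t expected, uint64_t received, uint64_t fudge)
{
	if (tsf_out_of_range(expected, received, fudge))
		resets++;	/* real driver: cold reset, then resync beacons */
}
```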
It also happens when a bad AP decides to, err, slew its TSF because it is
itself resetting and doesn't preserve the TSF "well."
This has fixed a bunch of weird corner cases on my 2GHz AP radio upstairs
here where it occasionally goes deaf due to how much 2GHz noise is up
here (and ANI gets a little sideways) and this unsticks the station
VAP.
For AP modes a hung baseband/mac usually ends up as a stuck beacon
and those have been addressed for a long time by just resetting the
hardware. But similar hangs in station mode didn't have a similar
recovery mechanism.
Tested:
* AR9380, STA mode, 2GHz/5GHz
* AR9580, STA mode, 5GHz
* QCA9344 SoC w/ on-board wifi (TL-WDR4300/3600 devices); 2GHz
STA mode
Right now ts_antenna is either 0 or 1 in each supported HAL so
this is purely a sanity check.
Later on if I ever get magical free time I may add some extensions
for the NICs that can have slightly more complicated antenna switches
for transmit and I'd like this to not bust memory.
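The kind of check meant here, sketched with hypothetical names and a hypothetical two-entry per-antenna statistics array: reject an out-of-range antenna index before it is used as an array index.

```c
#include <stdint.h>

#define	NUM_TX_ANTENNAS	2	/* hypothetical: HALs report 0 or 1 today */

static uint32_t tx_antenna_stats[NUM_TX_ANTENNAS];

/* Returns 0 if recorded, -1 if ts_antenna would overflow the array. */
static int
record_tx_antenna(unsigned int ts_antenna)
{
	if (ts_antenna >= NUM_TX_ANTENNAS)
		return (-1);
	tx_antenna_stats[ts_antenna]++;
	return (0);
}
```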
Implement a driver for the RTC embedded in the RK805/RK808 power
management system used for RK3328 and RK3399 SoCs.
Based on experiments on my RK808, setting the time doesn't alter the
internal/inaccessible sub-second counter, therefore there's no point
in calling clock_schedule().
Based on an earlier revision by andrew.
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D22692
Sponsored by: Google
MFC after: 1 week
Queue all XPT_ASYNC ccb's and run those in a new cam async thread. This thread
is allowed to sleep for things like memory. This should allow us to make all the
registration routines for cam periph drivers simpler since they can assume they
can always allocate memory. This is a separate thread so that any I/O that's
completed in xpt_done_td isn't held up.
This should fix the panics caused by WAITOK allocations elsewhere in the
storage stack that aren't easy to convert to NOWAIT. Future work will
convert other allocations in the registration path to WAITOK where
detailed analysis shows it to be safe.
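A userspace sketch of the queueing pattern, using POSIX threads in place of the kernel's threads and sleep primitives; struct async_ev, async_enqueue(), async_thread() and async_shutdown() are hypothetical. The completion path only links an event onto a queue, and a dedicated thread dequeues and handles events outside the lock, where it is free to block on an allocation, modeled here by malloc(3).

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical async event, standing in for an XPT_ASYNC CCB. */
struct async_ev {
	int		 code;
	struct async_ev	*next;
};

static struct async_ev	*q_head, **q_tail = &q_head;
static pthread_mutex_t	 q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t	 q_cv = PTHREAD_COND_INITIALIZER;
static bool		 q_shutdown;
static int		 handled_sum;	/* written only by the async thread */

/* Completion path: only queue the event, never block on memory. */
static void
async_enqueue(struct async_ev *ev)
{
	ev->next = NULL;
	pthread_mutex_lock(&q_lock);
	*q_tail = ev;
	q_tail = &ev->next;
	pthread_cond_signal(&q_cv);
	pthread_mutex_unlock(&q_lock);
}

/* Dedicated thread: may sleep without holding up I/O completion. */
static void *
async_thread(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&q_lock);
		while (q_head == NULL && !q_shutdown)
			pthread_cond_wait(&q_cv, &q_lock);
		struct async_ev *ev = q_head;
		if (ev == NULL) {	/* shut down and queue drained */
			pthread_mutex_unlock(&q_lock);
			return (NULL);
		}
		q_head = ev->next;
		if (q_head == NULL)
			q_tail = &q_head;
		pthread_mutex_unlock(&q_lock);

		/* Handler runs unlocked and may block, e.g. allocating. */
		int *scratch = malloc(sizeof(*scratch));
		if (scratch != NULL) {
			*scratch = ev->code;
			handled_sum += *scratch;
			free(scratch);
		}
		free(ev);
	}
}

static void
async_shutdown(void)
{
	pthread_mutex_lock(&q_lock);
	q_shutdown = true;
	pthread_cond_broadcast(&q_cv);
	pthread_mutex_unlock(&q_lock);
}
```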
Reviewed by: chs@, rpokala@
Differential Revision: https://reviews.freebsd.org/D29210
Completions for crypto requests on port 1 can sometimes return a stale
cookie value due to a firmware bug. Disable requests on port 1 by
default on affected firmware.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26581
These fixes are only relevant for requests on the second port. In
some cases, the crypto completion data, completion message, and
receive descriptor could be written in the wrong order.
- Add a separate rx_channel_id that is a copy of the port's rx_c_chan
and use it when an RX channel ID is required in crypto requests
instead of using the tx_channel_id.
- Set the correct rx_channel_id in the CPL_RX_PHYS_ADDR used to write
the crypto result.
- Set the FID to the first rx queue ID on the adapter rather than the
queue ID of the first rx queue for the port.
- While here, use tx_chan to set the tx_channel_id though this is
identical to the previous value.
Reviewed by: np
Reported by: Chelsio QA
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29175
We have seen several cases of processes which have become "stuck" in
kern_sigsuspend(). When this occurs, the kernel's td_sigblock_val
is set to 0x10 (one block outstanding) and the userspace copy of the
word is set to 0 (unblocked). Because the kernel's cached value
shows that signals are blocked, kern_sigsuspend() blocks almost all
signals, which means the process hangs indefinitely in sigsuspend().
It is not entirely clear what is causing this condition to occur.
However, it seems to make sense to add some protection against this
case by fetching the latest sigfastblock value from userspace for
syscalls which will sleep waiting for signals. Here, the change is
applied to kern_sigsuspend() and kern_sigtimedwait().
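A hypothetical model of why refetching helps: the kernel's cached copy says one block is outstanding while the userspace word says unblocked. struct thread_model, SFB_BLOCK_INC and the helpers are stand-ins; in the kernel the read would be a fetch from user memory such as fueword32() rather than a plain pointer dereference.

```c
#include <stdbool.h>
#include <stdint.h>

#define	SFB_BLOCK_INC	0x10u	/* hypothetical: one outstanding block */

struct thread_model {
	uint32_t		 cached_block_val;	/* kernel's cached copy */
	const volatile uint32_t	*user_word;		/* userspace's copy */
};

/* Refresh the cached value from the userspace word. */
static void
sigfastblock_refetch(struct thread_model *td)
{
	td->cached_block_val = *td->user_word;
}

/* Returns true if fast-blocked signals would stay masked while sleeping. */
static bool
sleep_masks_signals(struct thread_model *td, bool refetch)
{
	if (refetch)
		sigfastblock_refetch(td);
	return (td->cached_block_val >= SFB_BLOCK_INC);
}
```

With a stale cached value of 0x10 and a userspace word of 0, sleeping without a refetch masks signals indefinitely, while refetching first clears the stale block.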
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29225
This permits these routines to use special logic for initializing MD
kthread state.
For the kproc case, this required moving the logic to set these flags
from kproc_create() into do_fork().
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29207