Previously the body of ktls_tick was a nop when NIC TLS was disabled,
but the callout was still scheduled consuming power on otherwise-idle
systems with Chelsio T6 adapters. Now the callout only runs while NIC
TLS is enabled on at least one interface of an adapter.
Reported by: mav
Reviewed by: np, mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32491
As Radix MMU with superpages enabled is now stable, make it the
default choice on supported hardware (POWER9 and above), since its
performance is greater than that of HPT MMU.
Reviewed by: alfredo, jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30797
Reduce traffic to doorbell register when processing multiple completion
events at once. Only write it at the end of the loop after we've
processed everything (assuming we found at least one completion,
even if that completion wasn't valid).
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D32470
Current implementation of Radix MMU doesn't support mapping
arbitrary virtual addresses, such as the ones generated by
"direct mapping" I/O addresses. This caused the system to hang, when
early I/O addresses, such as those used by OpenFirmware Frame Buffer,
were remapped after the MMU was up.
To avoid having to modify mmu_radix_kenter_attr just to support this
use case, this change makes early I/O map use virtual addresses from
KVA area instead (similar to what mmu_radix_mapdev_attr does), as
these can be safely remapped later.
Reviewed by: alfredo (earlier version), jhibbits (in irc)
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31232
"rm-style" system calls such as kern_frmdirat() and kern_funlinkat()
don't supply SAVENAME to preserve the pathname buffer for subsequent
vnode ops. For unionfs this poses an issue because the pathname may
be needed for a relookup operation in unionfs_remove()/unionfs_rmdir().
Currently unionfs doesn't check for this case, leading to a panic on
DIAGNOSTIC kernels and use-after-free of cn_nameptr otherwise.
The unionfs node's stored buffer would suffice as a replacement for
cnp->cn_nameptr in some (but not all) cases, but it's cleaner to just
ensure that unionfs vnode ops always have a valid cn_nameptr by setting
SAVENAME in unionfs_lookup().
While here, do some light cleanup in unionfs_lookup() and assert that
HASBUF is always present in the relevant relookup calls.
Reported by: pho
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D32148
Without this change, unmounting smbfs filesystems with an INVARIANTS
kernel would panic after 10e64782ed59727e8c9fe4a5c7e17f497903c8eb.
Found by: markj
Reviewed by: markj, jhb
Obtained from: CheriBSD
MFC after: 3 days
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D32492
Without this patch, if a pNFS read layout has already been acquired
for a file, writes would be redirected to the Metadata Server (MDS),
because nfscl_getlayout() would not acquire a read/write layout for
the file. This happened because there was no "mode" argument to
nfscl_getlayout() to indicate whether reading or writing was being done.
Since doing I/O through the Metadata Server is not encouraged for some
pNFS servers, it is preferable to get a read/write layout for writes
instead of redirecting the write to the MDS.
This patch adds a access mode argument to nfscl_getlayout() and
nfsrpc_getlayout(), so that nfscl_getlayout() knows to acquire a read/write
layout for writing, even if a read layout has already been acquired.
This patch only affects NFSv4.1/4.2 client behaviour when pNFS ("pnfs" mount
option against a server that supports pNFS) is in use.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 2 week
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record. As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.
If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order. For TLS 1.1
and later this is fine, but this can break for TLS 1.0.
To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.
- In ktls_enqueue(), check if a TLS record being queued is the next
record expected for a TLS 1.0 session. If not, it is placed in
sorted order in the pending_records queue in the TLS session.
If it is the next expected record, queue it for SW encryption like
normal. In addition, check if this new record (really a potential
batch of records) was holding up any previously queued records in
the pending_records queue. Any of those records that are now in
order are also placed on the queue for SW encryption.
- In ktls_destroy(), free any TLS records on the pending_records
queue. These mbufs are marked M_NOTREADY so were not freed when the
socket buffer was purged in sbdestroy(). Instead, they must be
freed explicitly.
Reviewed by: gallatin, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32381
An IPv4 address is embedded into an ifaddr which is freed
via epoch. And the in_ifaddrhead is already a CK list. Use
the network epoch to protect against use after free.
Next step would be to CK-ify the in_addr hash and get rid of the...
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D32434
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty. This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.
Reported by: syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by: kib, imp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32475
Similar to the existing functions for strings and ints, this lets us
simplify some of the nvlist conversion code.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Without this patch, it is possible to hang the NFSv4 client,
when a rename/remove is being done on a file where the client
holds a delegation, if pNFS is being used. For a delegation
to be returned, dirty data blocks must be flushed to the NFSv4
server. When pNFS is in use, a shared lock on the clientID
must be acquired while doing a write to the DS(s).
However, if rename/remove is doing the delegation return
an exclusive lock will be acquired on the clientID, preventing
the write to the DS(s) from acquiring a shared lock on the clientID.
This patch stops rename/remove from doing a delegation return
if pNFS is enabled. Since doing delegation return in the same
compound as rename/remove is only an optimization, not doing
so should not cause problems.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 1 week
Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead. The pool of kprocs used for other AIO
requests is similarly created on first use.
Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32468
This partially reverts e81e77c5a055, leaving the option both in
GENERICs on amd64/arm64/arm, and in global NOTES file. Apparently
this better matches existing practice, where we do not try to hard
to make LINT and GENERIC complimentary.
Requested and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There is no need to read the domain on arm64 when there is only one
in the ACPI tables. This can also happen when the table is missing
as it is unneeded.
Reported by: dch
Sponsored by: The FreeBSD Foundation
This gives the vfs layer a chance to provide handling for EVFILT_VNODE,
for instance. Change pipe_specops to use the default vop_kqfilter to
accommodate fifoops that don't specify the method (i.e. all in-tree).
Based on a patch by Jan Kokemüller.
PR: 225934
Reviewed by: kib, markj (both pre-KASSERT)
Differential Revision: https://reviews.freebsd.org/D32271
Without this patch, it is possible for a process doing an NFSv4
Open/create of a file to block to allow another process
to acquire the exclusive lock on the clientID when holding
a shared lock on the clientID. As such, both processes
deadlock, with one wanting the exclusive lock, while the
other holds the shared lock. This deadlock is unlikely to occur
unless delegations are in use on the NFSv4 mount.
This patch fixes the problem by not deferring to the process
waiting for the exclusive lock when a shared lock (reference cnt)
is already held by the process.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 1 week
sys/sysctl.h moved struct thread forward declaration under #ifdef
_KERNEL and so this header fails when included from userland. Add a
forward declaration here.
Fixes: 99eefc727eba
Sponsored by: Netflix
sys/sysctl.h only needs u_int and size_t from sys/types.h. When the
sysctl interface was designed, having one more more prerequisites
(especially sys/types.h) was the norm. Times have changed, and to make
things more portable, make sys/types.h optional. We do this by including
sys/_types.h, defining size_t if needed, and changing u_int to 'unsigned
int' in a prototype for userland builds. For kernel builds, sys/types.h
is still required.
Sponsored by: Netflix
Reviewed by: kib, jhb
Differential Revision: https://reviews.freebsd.org/D31827
These calls do operate on vnodes only, not file contents.
This is useful for e.g. the xdg-document-portal fuse filesystem.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32438
No functional change intended, but noticed that we could add const here
while adding linuxkpi support for virtio.
Reviewed By: bryanv, imp
Differential Revision: https://reviews.freebsd.org/D32370
I got a compilation failure in virtio-gpu without this change.
Reviewed By: #linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
To minimise NUMA traffic allocate the pcpu, dpcpu, and boot stacks in
the correct domain when possible.
Submitted by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32338
When changing page table properties there is no need to demote a
level 1 or level 2 block if we are changing the entire memory range the
block is mapping. In this case just change the block directly.
Reported by: alc, kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32339
Support changing the protection of preloaded kernel modules by
implementing pmap_change_prot on arm64 and calling it from
preload_protect.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32026
Some exported file systems, such as ZFS ones, cannot do VOP_ALLOCATE().
Since an NFSv4.2 server must either support the Allocate operation for
all file systems or not support it at all, define a sysctl called
vfs.nfsd.enable_v42allocate to enable the Allocate operation.
This sysctl is false by default and can only be set true if all
exported file systems (or all DSs for a pNFS server) can perform
VOP_ALLOCATE().
Unfortunately, there is no way to know if a ZFS file system will
be exported once the nfsd is operational, even if there are none
exported when the nfsd is started up, so enabling Allocate must
be done manually for a server configuration.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 2 weeks
Without this patch, nfs_allocate() fell back on using vop_stdallocate()
for NFS mounts without Allocate operation support. This was incorrect,
since some file systems, such as ZFS, cannot do allocate via
vop_stdallocate(), which uses writes to try and allocate blocks.
Also, fix nfs_allocate() to return EINVAL when mounts cannot do Allocate,
since that is the correct error for posix_fallocate(2).
Note that Allocate is only supported by some NFSv4.2 servers.
MFC after: 2 weeks
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.
PR: 259036
Requested by: John Hay <john@sanren.ac.za>
Discussed with: ian, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Summary:
An error mapping PCI resources results in a panic due to unallocated
resources being freed up. This change puts the appropriate checks in
place to prevent the panic.
PR: 252445
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
Tested by: marcus
MFC after: 1 week
Sponsored by: VMware
Test Plan:
Along with user testing, also simulated error by inserting a ENXIO
return in vmci_map_bars().
Reviewed by: marcus
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32016
Tracking the number of unused holes in the trie and the range table
was a bad metric based on which full trie and / or range rebuilds
were triggered, which would happen in vain by far too frequently,
particularly with live BGP feeds.
Instead, track the total unused space inside the trie and range table
structures, and trigger rebuilds if the percentage of unused space
exceeds a sysctl-tunable threshold.
MFC after: 3 days
PR: 257965
These platforms don't manage resources for DMA request lines or I/O
ports, this is specific to x86. Remove the references from the comments.
Reviewed by: imp, jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32358
In 526370fb85db4b659cff4625eb2f379acaa4a1a8 "net80211: proper ssid
length check in setmlme_assoc_adhoc()" we are checking the
sizeof on an array function parameter which leads to a warning that
it will resturn the size of the type of the array rather than the
array size itself. Use the defined length used both in the ioctl
and the sizing of the array function parameter instead.
Reported by: CI
MFC after: 3 days
X-MFC with: 526370fb85db4b659cff4625eb2f379acaa4a1a8