139481 Commits

Author SHA1 Message Date
John Baldwin
ef3f98ae47 cxgbe: Only run ktls_tick when NIC TLS is enabled.
Previously the body of ktls_tick was a nop when NIC TLS was disabled,
but the callout was still scheduled consuming power on otherwise-idle
systems with Chelsio T6 adapters.  Now the callout only runs while NIC
TLS is enabled on at least one interface of an adapter.

Reported by:	mav
Reviewed by:	np, mav
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32491
2021-10-14 10:59:16 -07:00
Leandro Lupori
8ecf9a8bab powerpc64: make radix with superpages default
As Radix MMU with superpages enabled is now stable, make it the
default choice on supported hardware (POWER9 and above), since its
performance is greater than that of HPT MMU.

Reviewed by:		alfredo, jhibbits
Sponsored by:		Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D30797
2021-10-14 13:13:27 -03:00
Warner Losh
2ec165e3f0 nvme: Reduce traffic to the doorbell register
Reduce traffic to doorbell register when processing multiple completion
events at once. Only write it at the end of the loop after we've
processed everything (assuming we found at least one completion,
even if that completion wasn't valid).

Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D32470
2021-10-14 08:44:37 -06:00
Leandro Lupori
76384bd10f powerpc64: fix OFWFB with Radix MMU
Current implementation of Radix MMU doesn't support mapping
arbitrary virtual addresses, such as the ones generated by
"direct mapping" I/O addresses. This caused the system to hang, when
early I/O addresses, such as those used by OpenFirmware Frame Buffer,
were remapped after the MMU was up.

To avoid having to modify mmu_radix_kenter_attr just to support this
use case, this change makes early I/O map use virtual addresses from
KVA area instead (similar to what mmu_radix_mapdev_attr does), as
these can be safely remapped later.

Reviewed by:		alfredo (earlier version), jhibbits (in irc)
MFC after:		2 weeks
Sponsored by:		Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D31232
2021-10-14 10:39:52 -03:00
Gordon Bergling
0a8159d8ca ng_ppp(4): Fix a typo in a comment
- s/delcared/declared/

MFC after:	3 days
2021-10-14 15:30:32 +02:00
Jason A. Harmening
152c35ee4f unionfs: Ensure SAVENAME is set for unionfs vnode operations
"rm-style" system calls such as kern_frmdirat() and kern_funlinkat()
don't supply SAVENAME to preserve the pathname buffer for subsequent
vnode ops.  For unionfs this poses an issue because the pathname may
be needed for a relookup operation in unionfs_remove()/unionfs_rmdir().
Currently unionfs doesn't check for this case, leading to a panic on
DIAGNOSTIC kernels and use-after-free of cn_nameptr otherwise.

The unionfs node's stored buffer would suffice as a replacement for
cnp->cn_nameptr in some (but not all) cases, but it's cleaner to just
ensure that unionfs vnode ops always have a valid cn_nameptr by setting
SAVENAME in unionfs_lookup().

While here, do some light cleanup in unionfs_lookup() and assert that
HASBUF is always present in the relevant relookup calls.

Reported by:	pho
Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D32148
2021-10-13 19:25:31 -07:00
Brooks Davis
04c91ac48a selsocket: handle sopoll() errors correctly
Without this change, unmounting smbfs filesystems with an INVARIANTS
kernel would panic after 10e64782ed59727e8c9fe4a5c7e17f497903c8eb.

Found by:	markj
Reviewed by:	markj, jhb
Obtained from:	CheriBSD
MFC after:	3 days
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D32492
2021-10-14 00:43:48 +01:00
Rick Macklem
24af0fcdfc nfscl: Make nfscl_getlayout() acquire the correct pNFS layout
Without this patch, if a pNFS read layout has already been acquired
for a file, writes would be redirected to the Metadata Server (MDS),
because nfscl_getlayout() would not acquire a read/write layout for
the file.  This happened because there was no "mode" argument to
nfscl_getlayout() to indicate whether reading or writing was being done.
Since doing I/O through the Metadata Server is not encouraged for some
pNFS servers, it is preferable to get a read/write layout for writes
instead of redirecting the write to the MDS.

This patch adds a access mode argument to nfscl_getlayout() and
nfsrpc_getlayout(), so that nfscl_getlayout() knows to acquire a read/write
layout for writing, even if a read layout has already been acquired.
This patch only affects NFSv4.1/4.2 client behaviour when pNFS ("pnfs" mount
option against a server that supports pNFS) is in use.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after:	2 week
2021-10-13 15:48:54 -07:00
John Baldwin
9f03d2c001 ktls: Ensure FIFO encryption order for TLS 1.0.
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record.  As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.

If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order.  For TLS 1.1
and later this is fine, but this can break for TLS 1.0.

To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.

- In ktls_enqueue(), check if a TLS record being queued is the next
  record expected for a TLS 1.0 session.  If not, it is placed in
  sorted order in the pending_records queue in the TLS session.

  If it is the next expected record, queue it for SW encryption like
  normal.  In addition, check if this new record (really a potential
  batch of records) was holding up any previously queued records in
  the pending_records queue.  Any of those records that are now in
  order are also placed on the queue for SW encryption.

- In ktls_destroy(), free any TLS records on the pending_records
  queue.  These mbufs are marked M_NOTREADY so were not freed when the
  socket buffer was purged in sbdestroy().  Instead, they must be
  freed explicitly.

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32381
2021-10-13 12:30:15 -07:00
John Baldwin
a63752cce6 ktls: Reject attempts to enable AES-CBC with TLS 1.3.
AES-CBC cipher suites are not supported in TLS 1.3.

Reported by:	syzbot+ab501c50033ec01d53c6@syzkaller.appspotmail.com
Reviewed by:	tuexen, markj
Differential Revision:	https://reviews.freebsd.org/D32404
2021-10-13 12:12:58 -07:00
Gleb Smirnoff
2144431c11 Remove in_ifaddr_lock acquisiton to access in_ifaddrhead.
An IPv4 address is embedded into an ifaddr which is freed
via epoch. And the in_ifaddrhead is already a CK list. Use
the network epoch to protect against use after free.

Next step would be to CK-ify the in_addr hash and get rid of the...

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D32434
2021-10-13 10:04:46 -07:00
Mark Johnston
03d5820f73 mount: Check for !VDIR mount points before handling -o emptydir
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty.  This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.

Reported by:	syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by:	kib, imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32475
2021-10-13 09:33:35 -04:00
Kristof Provost
776df104fa pf: Introduce pf_nvbool()
Similar to the existing functions for strings and ints, this lets us
simplify some of the nvlist conversion code.

MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-13 12:01:09 +02:00
Hartmut Brandt
ded77e0237 Allow the BPF to be select for write. This is needed for boost:asio
which otherwise fails to handle BPFs.
Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D31967
2021-10-10 17:03:51 +02:00
Rick Macklem
b82168e657 nfscl: Fix another deadlock related to the NFSv4 clientID lock
Without this patch, it is possible to hang the NFSv4 client,
when a rename/remove is being done on a file where the client
holds a delegation, if pNFS is being used.  For a delegation
to be returned, dirty data blocks must be flushed to the NFSv4
server.  When pNFS is in use, a shared lock on the clientID
must be acquired while doing a write to the DS(s).
However, if rename/remove is doing the delegation return
an exclusive lock will be acquired on the clientID, preventing
the write to the DS(s) from acquiring a shared lock on the clientID.

This patch stops rename/remove from doing a delegation return
if pNFS is enabled.  Since doing delegation return in the same
compound as rename/remove is only an optimization, not doing
so should not cause problems.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after:	1 week
2021-10-12 17:21:01 -07:00
John Baldwin
d1b6fef075 Stop creating socket aio kprocs during boot.
Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead.  The pool of kprocs used for other AIO
requests is similarly created on first use.

Reviewed by:	asomers
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32468
2021-10-12 14:03:07 -07:00
Warner Losh
18dc12bfd2 nvme: Restore hotplug warning
Restore hotplug warning in recovery state machine. No functional change
other than what message gets printed.

Sponsored by:		Netflix
2021-10-12 14:26:54 -06:00
Konstantin Belousov
4cc167a352 Restore PPS_SYNC in NOTES
This partially reverts e81e77c5a055, leaving the option both in
GENERICs on amd64/arm64/arm, and in global NOTES file.  Apparently
this better matches existing practice, where we do not try to hard
to make LINT and GENERIC complimentary.

Requested and reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-12 23:10:35 +03:00
Ruslan Bukin
aeb76076c6 Prevent repeated deallocation of a resource.
Also deactivate resource if needed.

Discussed with: jrtc27
Differential Revision: https://reviews.freebsd.org/D32458
2021-10-12 20:13:44 +01:00
Andrew Turner
0906563718 Stop reading the arm64 domain when it's known
There is no need to read the domain on arm64 when there is only one
in the ACPI tables. This can also happen when the table is missing
as it is unneeded.

Reported by:	dch
Sponsored by:	The FreeBSD Foundation
2021-10-12 13:16:00 +01:00
Kyle Evans
7259ca3104 fifos: delegate unhandled kqueue filters to underlying filesystem
This gives the vfs layer a chance to provide handling for EVFILT_VNODE,
for instance.  Change pipe_specops to use the default vop_kqfilter to
accommodate fifoops that don't specify the method (i.e. all in-tree).

Based on a patch by Jan Kokemüller.

PR:		225934
Reviewed by:	kib, markj (both pre-KASSERT)
Differential Revision:	https://reviews.freebsd.org/D32271
2021-10-12 02:43:07 -05:00
Rick Macklem
120b20bdf4 nfscl: Fix a deadlock related to the NFSv4 clientID lock
Without this patch, it is possible for a process doing an NFSv4
Open/create of a file to block to allow another process
to acquire the exclusive lock on the clientID when holding
a shared lock on the clientID.  As such, both processes
deadlock, with one wanting the exclusive lock, while the
other holds the shared lock.  This deadlock is unlikely to occur
unless delegations are in use on the NFSv4 mount.

This patch fixes the problem by not deferring to the process
waiting for the exclusive lock when a shared lock (reference cnt)
is already held by the process.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after:	1 week
2021-10-11 21:58:24 -07:00
Warner Losh
cdccd11b36 forward declare struct thread
sys/sysctl.h moved struct thread forward declaration under #ifdef
_KERNEL and so this header fails when included from userland. Add a
forward declaration here.

Fixes:	     		99eefc727eba
Sponsored by:		Netflix
2021-10-11 12:59:39 -06:00
Warner Losh
99eefc727e sysctl.h: Less namespace pollution
Remove unused struct ctlname. It is unused. Move struct thread to
inside #if _KERNEL.

Sponsored by:		Netflix
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D32457
2021-10-11 11:20:07 -06:00
Warner Losh
56ee5c551f sysctl: make sys/sysctl.h self contained
sys/sysctl.h only needs u_int and size_t from sys/types.h. When the
sysctl interface was designed, having one more more prerequisites
(especially sys/types.h) was the norm. Times have changed, and to make
things more portable, make sys/types.h optional. We do this by including
sys/_types.h, defining size_t if needed, and changing u_int to 'unsigned
int' in a prototype for userland builds. For kernel builds, sys/types.h
is still required.

Sponsored by:		Netflix
Reviewed by:		kib, jhb
Differential Revision:	https://reviews.freebsd.org/D31827
2021-10-11 11:20:07 -06:00
Greg V
98dae405de O_PATH: allow vfs_extattr syscalls
These calls do operate on vnodes only, not file contents.
This is useful for e.g. the xdg-document-portal fuse filesystem.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32438
2021-10-11 20:09:49 +03:00
Mateusz Guzik
2b68eb8e1d vfs: remove thread argument from VOP_STAT
and fo_stat.
2021-10-11 13:22:32 +00:00
Mateusz Guzik
b4a58fbf64 vfs: remove cn_thread
It is always curthread.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32453
2021-10-11 13:21:47 +00:00
Alex Richardson
9017870541 Add missing const after 6c4f95161d6e
I accidentally didn't include hunk in the committed patch.

Fixes:		6c4f95161d6e ("virtio: make the write_config buffer argument const")
2021-10-11 13:20:56 +01:00
Alex Richardson
6c4f95161d virtio: make the write_config buffer argument const
No functional change intended, but noticed that we could add const here
while adding linuxkpi support for virtio.

Reviewed By:	bryanv, imp
Differential Revision: https://reviews.freebsd.org/D32370
2021-10-11 11:52:18 +01:00
Alex Richardson
d98f2712c7 linuxkpi: implement ida_alloc()
Needed for the virtio-gpu driver.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:44 +01:00
Alex Richardson
6d15ccde4d linuxkpi: Allow BUILD_BUG_ON in if statements without braces
I got a compilation failure in virtio-gpu without this change.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:44 +01:00
Alex Richardson
ff479cc6c9 linuxkpi: add PAGE_ALIGNED macro
Needed for the virtio-gpu driver.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:43 +01:00
Alex Richardson
2686b10db4 linuxkpi: Add sg_init_one
Needed for the virtio-gpu driver.

Reviewed By:	#linuxkpi, manu, bz, hselasky
Differential Revision: https://reviews.freebsd.org/D32366
2021-10-11 11:51:43 +01:00
Andrew Turner
a90ebeb5fe Allocate arm64 per-CPU data in the correct domain
To minimise NUMA traffic allocate the pcpu, dpcpu, and boot stacks in
the correct domain when possible.

Submitted by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32338
2021-10-11 10:36:50 +01:00
Andrew Turner
806a88e742 Only demote when needed in the arm64 pmap_change_props_locked
When changing page table properties there is no need to demote a
level 1 or level 2 block if we are changing the entire memory range the
block is mapping. In this case just change the block directly.

Reported by:	alc, kib, markj
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32339
2021-10-11 10:29:44 +01:00
Andrew Turner
a85ce4ad72 Add pmap_change_prot on arm64
Support changing the protection of preloaded kernel modules by
implementing pmap_change_prot on arm64 and calling it from
preload_protect.

Reviewed by:	alc (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32026
2021-10-11 10:26:45 +01:00
Rick Macklem
dfe887b7d2 nfsd: Disable the NFSv4.2 Allocate operation by default
Some exported file systems, such as ZFS ones, cannot do VOP_ALLOCATE().
Since an NFSv4.2 server must either support the Allocate operation for
all file systems or not support it at all, define a sysctl called
vfs.nfsd.enable_v42allocate to enable the Allocate operation.
This sysctl is false by default and can only be set true if all
exported file systems (or all DSs for a pNFS server) can perform
VOP_ALLOCATE().

Unfortunately, there is no way to know if a ZFS file system will
be exported once the nfsd is operational, even if there are none
exported when the nfsd is started up, so enabling Allocate must
be done manually for a server configuration.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after:	2 weeks
2021-10-10 18:46:02 -07:00
Rick Macklem
235891a127 nfscl: Fix NFS VOP_ALLOCATE for mounts without Allocate support
Without this patch, nfs_allocate() fell back on using vop_stdallocate()
for NFS mounts without Allocate operation support.  This was incorrect,
since some file systems, such as ZFS, cannot do allocate via
vop_stdallocate(), which uses writes to try and allocate blocks.

Also, fix nfs_allocate() to return EINVAL when mounts cannot do Allocate,
since that is the correct error for posix_fallocate(2).
Note that Allocate is only supported by some NFSv4.2 servers.

MFC after:	2 weeks
2021-10-10 14:27:52 -07:00
Konstantin Belousov
e81e77c5a0 Enable PPS_SYNC on amd64, arm64 and armv7
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.

PR:	259036
Requested by:	John Hay <john@sanren.ac.za>
Discussed with:	ian, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-10 22:34:40 +03:00
Mateusz Guzik
93e0523499 vfs: add predicts to getvnode and getvnode_path 2021-10-10 18:24:29 +00:00
Mateusz Guzik
a0558fe90d Retire code added to support CloudABI
CloudABI was removed in cf0ee8738e31aa9e6fbf4dca4dac56d89226a71a
2021-10-10 18:24:29 +00:00
Mark Peek
0f14bcbe38 vmci: fix panic due to freeing unallocated resources
Summary:
An error mapping PCI resources results in a panic due to unallocated
resources being freed up. This change puts the appropriate checks in
place to prevent the panic.

PR:		252445
Reported by:	Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
Tested by:	marcus
MFC after:	1 week
Sponsored by:	VMware

Test Plan:
Along with user testing, also simulated error by inserting a ENXIO
return in vmci_map_bars().

Reviewed by:	marcus
Subscribers:	imp
Differential Revision: https://reviews.freebsd.org/D32016
2021-10-09 14:21:16 -07:00
Konstantin Belousov
5fb54d2fc8 readlinkat(2): allow O_PATH fd
PR:	258856
Reported by:	ashish
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32390
2021-10-09 22:31:37 +03:00
Mark Johnston
fa9da1f590 timecounter: Let kern.timecounter.stepwarnings be set as a tunable
MFC after:	1 week
2021-10-09 12:34:06 -04:00
Marko Zec
bc8b8e106b [fib_algo][dxr] Retire counters which are no longer used
The number of chunks can still be tracked via vmstat -z|fgrep dxr.

MFC after:	3 days
2021-10-09 13:47:10 +02:00
Marko Zec
1549575f22 [fib_algo][dxr] Improve incremental updating strategy
Tracking the number of unused holes in the trie and the range table
was a bad metric based on which full trie and / or range rebuilds
were triggered, which would happen in vain by far too frequently,
particularly with live BGP feeds.

Instead, track the total unused space inside the trie and range table
structures, and trigger rebuilds if the percentage of unused space
exceeds a sysctl-tunable threshold.

MFC after:	3 days
PR:		257965
2021-10-09 13:22:27 +02:00
Mitchell Horne
17f790f49f arm, arm64, riscv: adjust top-level nexus comment
These platforms don't manage resources for DMA request lines or I/O
ports, this is specific to x86. Remove the references from the comments.

Reviewed by:	imp, jhb
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32358
2021-10-08 14:16:32 -03:00
Kristof Provost
1c680e620b pf: do not copy anchor_wildcard / anchor_relative from userspace
We overwrite these fields again in pf_kanchor_setup() anyway.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-10-08 14:46:59 +02:00
Bjoern A. Zeeb
0525ece355 net80211: fix build for 526370fb85db4b659cff4625eb2f379acaa4a1a8
In 526370fb85db4b659cff4625eb2f379acaa4a1a8 "net80211: proper ssid
length check in setmlme_assoc_adhoc()" we are checking the
sizeof on an array function parameter which leads to a warning that
it will resturn the size of the type of the array rather than the
array size itself.  Use the defined length used both in the ioctl
and the sizing of the array function parameter instead.

Reported by:	CI
MFC after:	3 days
X-MFC with:	526370fb85db4b659cff4625eb2f379acaa4a1a8
2021-10-08 11:21:44 +00:00