18850 Commits

Author SHA1 Message Date
Konstantin Belousov
df8dd6025a amd64: stop using top of the thread' kernel stack for FPU user save area
Instead do one more allocation at the thread creation time.  This frees
a lot of space on the stack.

Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn().  This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.

A useful experiment now would be to reduce KSTACK_PAGES.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
2933a7ca03 aio_fsync_vnode: handle ERELOOKUP after VOP_FSYNC()
Reported by:	tmunro
Reviewed by:	jhb, tmunro
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32023
2021-09-20 21:40:17 +03:00
Konstantin Belousov
922bee44e4 aio_fsync_vnode: use for(;;) loop instead of label
Reviewed by:	jhb, tmunro
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32023
2021-09-20 21:39:46 +03:00
Bartlomiej Grzesik
3f9a00e3b5 device: add device_get_property and device_has_property
Generialize bus specific property accessors. Those functions allow driver code
to access device specific information.

Currently there is only support for FDT and ACPI buses.

Reviewed by: manu, mw
Sponsored by: Semihalf
Differential revision: https://reviews.freebsd.org/D31597
2021-09-20 17:17:57 +02:00
Mateusz Guzik
7b2ac8eb9b vfs: add missing VIRF_MOUNTPOINT in vfs_mountroot_shuffle
Reported by:	mav
2021-09-18 21:13:51 +02:00
Mateusz Guzik
0d9e99ce3b vfs: add the missing vnode interlock in vfs_mountroot_shuffle
Around v_mountedhere assignment.
2021-09-18 21:13:51 +02:00
Mark Johnston
50b07c1f71 unix: Fix a use-after-free in unp_drop()
We need to load the socket pointer after locking the PCB, otherwise
the socket may have been detached and freed by the time that unp_drop()
sets so_error.

This previously went unnoticed as the socket zone was _NOFREE.

Reported by:	pho
MFC after:	1 week
2021-09-18 10:38:39 -04:00
Mateusz Guzik
f902e4bb04 lockmgr: fix lock profiling of face adaptive spinning 2021-09-18 10:16:58 +00:00
Mateusz Guzik
a2cb65b8fe cache: count vnodes in cache_purgevfs 2021-09-18 10:16:50 +00:00
Mateusz Guzik
5d8e32a66c vfs: retire VNODE_REFCOUNT_FENCE_* macros
They are unused as of last year.
2021-09-18 10:16:00 +00:00
Mark Johnston
2bd9826995 vfs: Permit unix sockets to be opened with O_PATH
As with FIFOs, a path descriptor for a unix socket cannot be used with
kevent().

In principle connectat(2) and bindat(2) could be modified to support an
AT_EMPTY_PATH-like mode which operates on the socket referenced by an
O_PATH fd referencing a unix socket.  That would eliminate the path
length limit imposed by sockaddr_un.

Update O_PATH tests.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31970
2021-09-17 14:19:06 -04:00
Mark Johnston
ade1daa5c0 socket: Synchronize soshutdown() with listen(2) and AIO
To handle shutdown(SHUT_RD) we flush the receive buffer of the socket.
This may involve searching for control messages of type SCM_RIGHTS,
since we need to close the file references.  Closing arbitrary files
with socket buffer locks held is undesirable, mainly due to lock
ordering issues, so we instead make a copy of the socket buffer and
operate on that without any locks.  Fields in the original buffer are
cleared.

This behaviour clobbered the AIO job queue associated with a receive
buffer.  It could also cause us to leak a KTLS session reference.
Reorder socket buffer fields to address this.

An alternate solution would be to remove the hack in sorflush(), but
this is not quite feasible (yet).  In particular, though sorflush()
flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to
be queued - the flag just prevents userspace from reading more data.  I
suspect we should fix this; SBS_CANTRCVMORE represents a terminal state
and protocols can likely just drop any data destined for such a buffer.
Many of them already do, but in some cases the check is racy, and some
KPI churn will be needed to fix everything.  This approach is more
straightforward for now.

Reported by:	syzbot+104d8ee3430361cb2795@syzkaller.appspotmail.com
Reported by:	syzbot+5bd2e7d05f84a59d0d1b@syzkaller.appspotmail.com
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31976
2021-09-17 14:19:06 -04:00
Mark Johnston
883761f0a8 socket: Remove NOFREE from the socket zone
This flag was added during the transition away from the legacy zone
allocator, commit c897b81311792ccf6a93feff2a405e2ae53f664e.  The old
zone allocator effectively provided _NOFREE semantics, but it seems that
they are not required for sockets.  In particular, we use reference
counting to keep sockets live.

One somewhat dangerous case is sonewconn(), which returns a pointer to a
socket with reference count 0.  This socket is still effectively owned
by the listening socket.  Protocols must therefore be careful to
synchronize sonewconn() calls with their pru_close implementations,
since for listening sockets soclose() will abort the child sockets.  For
example, TCP holds the listening socket's PCB read locked across the
sonewconn() call, which blocks tcp_usr_close(), and sofree()
synchronizes with a concurrent soabort() of the nascent socket.
However, _NOFREE semantics are not required here.

Eliminating _NOFREE has several benefits: it enables use-after-free
detection (e.g., by KASAN) and lets the system reclaim memory from the
socket zone under memory pressure.  No functional change intended.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31975
2021-09-17 14:19:06 -04:00
Mark Johnston
6b288408ca socket: Add assertions around naked refcount decrements
Sockets in a listen queue hold a reference to the parent listening
socket.  Several code paths release this reference manually when moving
a child socket out of the queue.

Replace comments about the expected post-decrement refcount value with
assertions.  Use refcount_load() instead of a plain load.  No functional
change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31974
2021-09-17 14:19:06 -04:00
Mark Johnston
dfcef87714 socket: Fix a use-after-free in soclose()
After releasing the fd reference to a socket "so", we should avoid
testing SOLISTENING(so) since the socket may have been freed.  Instead,
directly test whether the list of unaccepted sockets is empty.

Fixes:		f4bb1869ddd2 ("Consistently use the SOLISTENING() macro")
Pointy hat:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31973
2021-09-17 14:19:05 -04:00
Mark Johnston
bf25678226 ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers
ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so
errors are effectively hidden.  Fix this by using a separate output
parameter.  Convert to the new socket buffer locking macros while here.

Note that the socket buffer lock is not needed to synchronize the
SOLISTENING check here, we can rely on the PCB lock.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31977
2021-09-17 14:19:05 -04:00
Mark Johnston
40fcdb9366 kcov: Disable address and memory sanitizers in get_kinfo()
get_kinfo() is only called from the coverage sanitizer callbacks, which
are similarly uninstrumented.

Sponsored by:	The FreeBSD Foundation
2021-09-17 14:19:05 -04:00
Konstantin Belousov
197a4f29f3 buffer pager: allow get_blksize method to return error
Reported and reviewed by:	asomers
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31998
2021-09-17 20:29:55 +03:00
Konstantin Belousov
796a8e1ad1 procctl(2): Add PROC_WXMAP_CTL/STATUS
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.

Reviewed by:	brooks, emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31779
2021-09-17 15:42:01 +03:00
Konstantin Belousov
f575573ca5 Remove PT_GET_SC_ARGS_ALL
Reimplement bdf0f24bb16d556a5b by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.

Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31968
2021-09-16 20:11:27 +03:00
Piotr Pawel Stefaniak
6e8272f317 mount: improve error message for invalid filesystem names
For an invalid filesystem name used like this:
mount -t asdfs /dev/ada1p5 /usr/obj

emit an error message like this:
mount: /dev/ada1p5: Invalid fstype: Invalid argument

instead of:
mount: /dev/ada1p5: Operation not supported by device

Differential Revision:	https://reviews.freebsd.org/D31540
2021-09-15 16:25:31 +02:00
Edward Tomasz Napierala
bdf0f24bb1 linux: implement PTRACE_GET_SYSCALL_INFO
This is one of the pieces required to make modern (ie Focal)
strace(1) work.

Reviewed By:	jhb (earlier version)
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D28212
2021-09-14 20:19:55 +00:00
John Baldwin
c782ea8bb5 Add a switch structure for send tags.
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'.  A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).

Previously, device driver ifnet methods switched on the type to call
type-specific functions.  Now, those type-specific functions are saved
in the switch structure and invoked directly.  In addition, this more
gracefully permits multiple implementations of the same tag within a
driver.  In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.

Reviewed by:	gallatin, hselasky, kib (older version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31572
2021-09-14 11:43:41 -07:00
Mark Johnston
cf4670fe0b kcov: Integrate with KMSAN
- kern_kcov.c needs to be compiled with -fsanitize=kernel-memory when
  KMSAN is configured since it calls into various other subsystems.
- Disable address and memory sanitizers in kcov(4)'s coverage sanitizer
  callbacks, as they do not provide useful checking.  Moreover, with
  KMSAN we may otherwise get false positives since the caller (coverage
  sanitizer runtime) is not instrumented.
- Disable KASAN and KMSAN interceptors in subr_coverage.c, as they do
  not provide any benefit but do introduce overhead when fuzzing.

Sponsored by:	The FreeBSD Foundation
2021-09-14 14:29:27 -04:00
Mark Johnston
fa0463c384 socket: De-duplicate SBLOCKWAIT() definitions
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-14 09:01:32 -04:00
Wojciech Macek
6fa041d7f1 Measure latency of PMC interruptions
Add HWPMC events to measure latency.
Provide sysctl to choose the number of outstanding events which
trigger HWPMC event.

Obtained from:		Semihalf
Sponsored by:		Stormshield
Differential revision:	https://reviews.freebsd.org/D31283
2021-09-13 06:08:32 +02:00
Mark Johnston
b864b67a0d socket: Do not include control messages in FIONREAD return value
Some system software expects to be able to read at least the number of
bytes returned by FIONREAD.  When control messages are counted in this
return value, this assumption is violated.  Follow Linux and OpenBSD
here (as well as our own kevent(EVFILT_READ)) and only return the number
of data bytes available.

Reported by:	avg
MFC after:	2 weeks
2021-09-12 16:39:44 -04:00
Mark Johnston
2884918c73 aio: Fix up the opcode in aiocb32_copyin()
With lio_listio(2), the opcode is specified by userspace rather than
being hard-coded by the system call (e.g., aio_readv() -> LIO_READV).
kern_lio_listio() calls aio_aqueue() with an opcode of LIO_NOP, which
gets fixed up when the aiocb is copied in.

When copying in a job request for vectored I/O, we need to dynamically
allocate a uio to wrap an iovec.  So aiocb_copyin() needs to get the
opcode from the aiocb and then decide whether an allocation is required.
We failed to do this in the COMPAT_FREEBSD32 case.  Fix it.

Reported by:	syzbot+27eab6f2c2162f2885ee@syzkaller.appspotmail.com
Reviewed by:	kib, asomers
Fixes:	f30a1ae8d529 ("lio_listio(2):  Allow LIO_READV and LIO_WRITEV.")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31914
2021-09-11 12:58:41 -04:00
Mark Johnston
141fe2dcee aio: Interlock with listen(2)
soo_aio_queue() did not handle the possibility that the provided socket
is a listening socket.  Up until recently, to fix this one would have to
acquire the socket lock first and check, since the socket buffer locks
were destroyed by listen(2).

Now that the socket buffer locks belong to the socket, simply check
SOLISTENING(so) after acquiring them, and make listen(2) return an error
if any AIO jobs are enqueued on the socket.

Add a couple of simple regression test cases.

Note that this fixes things only for the default AIO implementation;
cxgbe(4)'s TCP offload has a separate pru_aio_queue implementation which
requires its own solution.

Reported by:	syzbot+c8aa122fa2c6a4e2a28b@syzkaller.appspotmail.com
Reported by:	syzbot+39af117d43d4f0faf512@syzkaller.appspotmail.com
Reported by:	syzbot+60cceb9569145a0b993b@syzkaller.appspotmail.com
Reported by:	syzbot+2d522c5db87710277ca5@syzkaller.appspotmail.com
Reviewed by:	tuexen, gallatin, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31901
2021-09-10 17:21:11 -04:00
Mark Johnston
b1e6a792d6 net: Enter a net epoch around protocol if_up/down notifications
When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.

Reported by:	syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by:	kp, melifaro
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31889
2021-09-10 09:07:40 -04:00
Mark Johnston
187afc5879 osd: Fix racy assertions
osd_register(9) may reallocate and expand the destructor array for a
given object type if no space is available for a new key.  This happens
with the object lock held.  Thus, when verifying that a given slot in
the array is occupied, we need to hold the object lock to avoid racing
with a reallocation.

Reported by:	syzbot+69ce54c7d7d813315dd3@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-09 10:11:02 -04:00
Rick Macklem
c5128c48df VOP_COPY_FILE_RANGE: Add a COPY_FILE_RANGE_TIMEO1SEC flag
Although it is not specified in the RFCs, the concept that
the NFSv4 server should reply to an RPC request within a
reasonable time is accepted practice within the NFSv4 community.

Without this patch, the NFSv4.2 server attempts to reply to
a Copy operation within 1second by limiting the copy to
vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at
best, given the large variation in I/O subsystem performance.

This patch adds a kernel only flag COPY_FILE_RANGE_TIMEO1SEC
that the NFSv4.2 can specify, which tells VOP_COPY_FILE_RANGE()
to return after approximately 1 second with a partial result and
implements this in vn_generic_copy_file_range(), used by
vop_stdcopyfilerange().

Modifying the NFSv4.2 server to set this flag will be done in
a separate patch.  Also under consideration is exposing the
COPY_FILE_RANGE_TIMEO1SEC to userland for use on the FreeBSD
copy_file_range(2) syscall.

MFC after:	2 weeks
Reviewed by:	khng
Differential Revision:	https://reviews.freebsd.org/D31829
2021-09-07 17:35:26 -07:00
Mark Johnston
a8aa6f1f78 socket: Avoid clearing SS_ISCONNECTING if soconnect() fails
This behaviour appears to date from the 4.4 BSD import.  It has two
problems:

1. The update to so_state is not protected by the socket lock, so
   concurrent updates to so_state may be lost.
2. Suppose two threads race to call connect(2) on a socket, and one
   succeeds while the other fails.  Then the failing thread may
   incorrectly clear SS_ISCONNECTING, confusing the state machine.

Simply remove the update.  It does not appear to be necessary:
pru_connect implementations which call soisconnecting() only do so after
all failure modes have been handled.  For instance, tcp_connect() and
tcp6_connect() will never return an error after calling soisconnected().
However, we cannot correctly assert that SS_ISCONNECTED is not set after
an error from soconnect() since the socket lock is not held across the
pru_connect call, so a concurrent connect(2) may have set the flag.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31699
2021-09-07 17:12:09 -04:00
Mark Johnston
523d58aad1 socket: Remove unneeded SOLISTENING checks
Now that SOCK_IO_*_LOCK() checks for listening sockets, we can eliminate
some racy SOLISTENING() checks.  No functional change intended.

Reviewed by:	tuexen
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31660
2021-09-07 17:12:09 -04:00
Mark Johnston
bd4a39cc93 socket: Properly interlock when transitioning to a listening socket
Currently, most protocols implement pru_listen with something like the
following:

	SOCK_LOCK(so);
	error = solisten_proto_check(so);
	if (error) {
		SOCK_UNLOCK(so);
		return (error);
	}
	solisten_proto(so);
	SOCK_UNLOCK(so);

solisten_proto_check() fails if the socket is connected or connecting.
However, the socket lock is not used during I/O, so this pattern is
racy.

The change modifies solisten_proto_check() to additionally acquire
socket buffer locks, and the calling thread holds them until
solisten_proto() or solisten_proto_abort() is called.  Now that the
socket buffer locks are preserved across a listen(2), this change allows
socket I/O paths to properly interlock with listen(2).

This fixes a large number of syzbot reports, only one is listed below
and the rest will be dup'ed to it.

Reported by:	syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com
Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31659
2021-09-07 17:11:43 -04:00
Mark Johnston
f94acf52a4 socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK().  These operate on a
socket rather than a socket buffer.  Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket.  Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before.  In
particular, add checks to sendfile() and sorflush().

Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31657
2021-09-07 15:06:48 -04:00
Warner Losh
ecfbb2e302 genoffset.sh: Use 10 X's instead of 5 for pick mkdtemp implementations
Linux fails to build now because the mkdtemp in the bootstrapped
environment wants 6 or more X's. Use 10 out of an abundance of caution.

Sponsored by:		Netflix
Reviewed by:		arichards
Differential Revision:	https://reviews.freebsd.org/D31863
2021-09-07 10:08:51 -06:00
Konstantin Belousov
98168a6e6c kqueue: drain kqueue taskqueue if syscall tickled it
Otherwise return from the syscall and next syscall, which could be
kevent(2) on the kqueue that should be notified, races with the kqueue
taskqueue thread, and potentially misses the wakeup.  This is reliably
visible when kevent(2) only peeks into events using zeroed timeout.

PR:	258310
Reported by:	arichardson, Jan Kokemüller <jan.kokemueller@gmail.com>
Reviewed by:	arichardson, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31858
2021-09-07 02:43:34 +03:00
Colin Percival
bd11e253a9 Add _sleep to TSLOG
Most of the nvme initialization time in my tests is being spent here
(via pause_sbt).
2021-09-05 12:50:15 -07:00
Colin Percival
7347dfce01 Add run_interrupt_driven_config_hooks to TSLOG
The 'intr_config_hooks' SYSINIT is now taking a nontrivial amount of
time in my profiling; run_interrupt_driven_config_hooks is responsible
for most of it, so this adds useful information to the resulting
flamecharts.
2021-09-05 12:45:29 -07:00
Alexander Motin
a264594d4f Unify console output.
Without this change when virtual console enabled depending on buffer
presence and state different parts of output go to different consoles.

MFC after:	1 month
2021-09-03 23:13:42 -04:00
Alexander Motin
bd6085c6ae Re-implement virtual console (constty).
Protect conscallout with tty lock instead of Giant.  In addition to
Giant removal it also closes race on console unset.

Introduce additional lock to protect against concurrent console sets.

Remove consbuf free on console unset as unsafe, making impossible to
change buffer size after first allocation.  Instead increase default
buffer size from 8KB to 64KB and processing rate from 5Hz to 10-15Hz
to make the output more smooth.

MFC after:	1 month
2021-09-03 22:18:51 -04:00
Alexander Motin
4730a8972b callout(9): Allow spin locks use with callout_init_mtx().
Implement lock_spin()/unlock_spin() lock class methods, moving the
assertion to _sleep() instead.  Change assertions in callout(9) to
allow spin locks for both regular and C_DIRECT_EXEC cases. In case of
C_DIRECT_EXEC callouts spin locks are the only locks allowed actually.

As the first use case allow taskqueue_enqueue_timeout() use on fast
task queues.  It actually becomes more efficient due to avoided extra
context switches in callout(9) thanks to C_DIRECT_EXEC.

MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D31778
2021-09-02 21:16:46 -04:00
Konstantin Belousov
5cc82c563e cluster_write(): do not access buffer after it is released
The issue was reported by
Alexander Lochmann <alexander.lochmann@tu-dortmund.de>,
who found the problem by performing lock analysis using LockDoc,
see https://doi.org/10.1145/3302424.3303948.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31780
2021-09-02 21:36:33 +03:00
Mateusz Guzik
6352bbf7be vmem: disable debug.vmem_check by default
It has a prohibitive performance impact when running real workloads.

Note this only affects kernels with DIAGNOSTIC.

Reviewed by:	markj
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31784
2021-09-02 18:28:45 +00:00
Brooks Davis
6bc90e8acf syscalls.master: correct formatting issues
Reviewed by:	kevans, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31351
2021-09-01 21:58:22 +01:00
Brooks Davis
df501bac69 syscalls.master: switch to CAPENABLED flags
Switch the main syscall table to use CAPENABLED flags rather than
capabilities.conf.  This avoid synchronization issues between
syscalls.master and capabilities.conf (e.g. when renaming a syscall
during development).

For now, move capabilities.conf to sys/compat/freebsd32 and use it
there.  Use of sys/compat/freebsd32/syscalls.master should be replaced
by makesyscalls.lua enhancements to allow the main one to be used.

This change results in no changes to generated files after running
`make sysent`.

Reviewed by:	kevans, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31350
2021-09-01 21:58:16 +01:00
Brooks Davis
6945df3fff makesyscalls.lua: add a CAPENABLED flag
The CAPENABLED flag indicates that the syscall can be used in capsicum
capability mode.  It is intended to replace capabilities.conf.

Reviewed by:	kevans, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31349
2021-09-01 21:58:06 +01:00
Mark Johnston
c511383de7 kevent: Fix races between timer detach and kqtimer_proc_continue()
- When detaching a knote, we need to double check the enqueued flag
  after acquiring the process lock, as kqtimer_proc_continue() may have
  toggled it.
- kqtimer_proc_continue() could in principle reschedule a stopped
  callout after filt_timerdetach() drains the callout.  So, we need to
  re-check.

Reported by:	syzbot+4a4cebb3ec07892cb040@syzkaller.appspotmail.com
Reported by:	syzbot+a9c04bc76078a3b7dd8d@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31772
2021-09-01 14:18:58 -04:00
Ka Ho Ng
92bb74fd4f vfs: Use file_cred for VOP_DEALLOCATE in vn_deallocate if non-NULL
This changes vn_deallocate() to match the behavior of vn_rdwr() when
picking which ucred to use. That is, vn_deallocate() uses file_cred for
making VOP call if it is non-NULL, or use active_cred otherwise.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31712
2021-09-01 20:19:08 +08:00