freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	2bd9826995	vfs: Permit unix sockets to be opened with O_PATH As with FIFOs, a path descriptor for a unix socket cannot be used with kevent(). In principle connectat(2) and bindat(2) could be modified to support an AT_EMPTY_PATH-like mode which operates on the socket referenced by an O_PATH fd referencing a unix socket. That would eliminate the path length limit imposed by sockaddr_un. Update O_PATH tests. Reviewed by: kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31970	2021-09-17 14:19:06 -04:00
Mark Johnston	ade1daa5c0	socket: Synchronize soshutdown() with listen(2) and AIO To handle shutdown(SHUT_RD) we flush the receive buffer of the socket. This may involve searching for control messages of type SCM_RIGHTS, since we need to close the file references. Closing arbitrary files with socket buffer locks held is undesirable, mainly due to lock ordering issues, so we instead make a copy of the socket buffer and operate on that without any locks. Fields in the original buffer are cleared. This behaviour clobbered the AIO job queue associated with a receive buffer. It could also cause us to leak a KTLS session reference. Reorder socket buffer fields to address this. An alternate solution would be to remove the hack in sorflush(), but this is not quite feasible (yet). In particular, though sorflush() flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to be queued - the flag just prevents userspace from reading more data. I suspect we should fix this; SBS_CANTRCVMORE represents a terminal state and protocols can likely just drop any data destined for such a buffer. Many of them already do, but in some cases the check is racy, and some KPI churn will be needed to fix everything. This approach is more straightforward for now. Reported by: syzbot+104d8ee3430361cb2795@syzkaller.appspotmail.com Reported by: syzbot+5bd2e7d05f84a59d0d1b@syzkaller.appspotmail.com Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31976	2021-09-17 14:19:06 -04:00
Mark Johnston	883761f0a8	socket: Remove NOFREE from the socket zone This flag was added during the transition away from the legacy zone allocator, commit `c897b81311`. The old zone allocator effectively provided _NOFREE semantics, but it seems that they are not required for sockets. In particular, we use reference counting to keep sockets live. One somewhat dangerous case is sonewconn(), which returns a pointer to a socket with reference count 0. This socket is still effectively owned by the listening socket. Protocols must therefore be careful to synchronize sonewconn() calls with their pru_close implementations, since for listening sockets soclose() will abort the child sockets. For example, TCP holds the listening socket's PCB read locked across the sonewconn() call, which blocks tcp_usr_close(), and sofree() synchronizes with a concurrent soabort() of the nascent socket. However, _NOFREE semantics are not required here. Eliminating _NOFREE has several benefits: it enables use-after-free detection (e.g., by KASAN) and lets the system reclaim memory from the socket zone under memory pressure. No functional change intended. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31975	2021-09-17 14:19:06 -04:00
Mark Johnston	6b288408ca	socket: Add assertions around naked refcount decrements Sockets in a listen queue hold a reference to the parent listening socket. Several code paths release this reference manually when moving a child socket out of the queue. Replace comments about the expected post-decrement refcount value with assertions. Use refcount_load() instead of a plain load. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31974	2021-09-17 14:19:06 -04:00
Mark Johnston	dfcef87714	socket: Fix a use-after-free in soclose() After releasing the fd reference to a socket "so", we should avoid testing SOLISTENING(so) since the socket may have been freed. Instead, directly test whether the list of unaccepted sockets is empty. Fixes: `f4bb1869dd` ("Consistently use the SOLISTENING() macro") Pointy hat: markj MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31973	2021-09-17 14:19:05 -04:00
Mark Johnston	bf25678226	ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so errors are effectively hidden. Fix this by using a separate output parameter. Convert to the new socket buffer locking macros while here. Note that the socket buffer lock is not needed to synchronize the SOLISTENING check here, we can rely on the PCB lock. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31977	2021-09-17 14:19:05 -04:00
Mark Johnston	40fcdb9366	kcov: Disable address and memory sanitizers in get_kinfo() get_kinfo() is only called from the coverage sanitizer callbacks, which are similarly uninstrumented. Sponsored by: The FreeBSD Foundation	2021-09-17 14:19:05 -04:00
Konstantin Belousov	197a4f29f3	buffer pager: allow get_blksize method to return error Reported and reviewed by: asomers Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31998	2021-09-17 20:29:55 +03:00
Konstantin Belousov	796a8e1ad1	procctl(2): Add PROC_WXMAP_CTL/STATUS It allows overriding kern.elf{32,64}.allow_wx on a per-process basis. In particular, it makes it possible to run binaries without PT_GNU_STACK and without an elfctl note while allow_wx = 0. Reviewed by: brooks, emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31779	2021-09-17 15:42:01 +03:00
Konstantin Belousov	f575573ca5	Remove PT_GET_SC_ARGS_ALL Reimplement `bdf0f24bb1` by checking for the caller's ABI in the implementation of PT_GET_SC_ARGS, and copying out everything if it is Linuxolator. Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on a thread reused after another process, it allows reading some of that thread's last syscall arguments. Clear td_sa.args in thread_alloc(). Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D31968	2021-09-16 20:11:27 +03:00
Piotr Pawel Stefaniak	6e8272f317	mount: improve error message for invalid filesystem names For an invalid filesystem name used like this: mount -t asdfs /dev/ada1p5 /usr/obj emit an error message like this: mount: /dev/ada1p5: Invalid fstype: Invalid argument instead of: mount: /dev/ada1p5: Operation not supported by device Differential Revision: https://reviews.freebsd.org/D31540	2021-09-15 16:25:31 +02:00
Edward Tomasz Napierala	bdf0f24bb1	linux: implement PTRACE_GET_SYSCALL_INFO This is one of the pieces required to make modern (ie Focal) strace(1) work. Reviewed By: jhb (earlier version) Sponsored by: EPSRC Differential Revision: https://reviews.freebsd.org/D28212	2021-09-14 20:19:55 +00:00
John Baldwin	c782ea8bb5	Add a switch structure for send tags. Move the type and function pointers for operations on existing send tags (modify, query, next, free) out of 'struct ifnet' and into a new 'struct if_snd_tag_sw'. A pointer to this structure is added to the generic part of send tags and is initialized by m_snd_tag_init() (which now accepts a switch structure as a new argument in place of the type). Previously, device driver ifnet methods switched on the type to call type-specific functions. Now, those type-specific functions are saved in the switch structure and invoked directly. In addition, this more gracefully permits multiple implementations of the same tag within a driver. In particular, NIC TLS for future Chelsio adapters will use a different implementation than the existing NIC TLS support for T6 adapters. Reviewed by: gallatin, hselasky, kib (older version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D31572	2021-09-14 11:43:41 -07:00
Mark Johnston	cf4670fe0b	kcov: Integrate with KMSAN - kern_kcov.c needs to be compiled with -fsanitize=kernel-memory when KMSAN is configured since it calls into various other subsystems. - Disable address and memory sanitizers in kcov(4)'s coverage sanitizer callbacks, as they do not provide useful checking. Moreover, with KMSAN we may otherwise get false positives since the caller (coverage sanitizer runtime) is not instrumented. - Disable KASAN and KMSAN interceptors in subr_coverage.c, as they do not provide any benefit but do introduce overhead when fuzzing. Sponsored by: The FreeBSD Foundation	2021-09-14 14:29:27 -04:00
Mark Johnston	fa0463c384	socket: De-duplicate SBLOCKWAIT() definitions MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-09-14 09:01:32 -04:00
Wojciech Macek	6fa041d7f1	Measure latency of PMC interruptions Add HWPMC events to measure latency. Provide sysctl to choose the number of outstanding events which trigger HWPMC event. Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D31283	2021-09-13 06:08:32 +02:00
Mark Johnston	b864b67a0d	socket: Do not include control messages in FIONREAD return value Some system software expects to be able to read at least the number of bytes returned by FIONREAD. When control messages are counted in this return value, this assumption is violated. Follow Linux and OpenBSD here (as well as our own kevent(EVFILT_READ)) and only return the number of data bytes available. Reported by: avg MFC after: 2 weeks	2021-09-12 16:39:44 -04:00
Mark Johnston	2884918c73	aio: Fix up the opcode in aiocb32_copyin() With lio_listio(2), the opcode is specified by userspace rather than being hard-coded by the system call (e.g., aio_readv() -> LIO_READV). kern_lio_listio() calls aio_aqueue() with an opcode of LIO_NOP, which gets fixed up when the aiocb is copied in. When copying in a job request for vectored I/O, we need to dynamically allocate a uio to wrap an iovec. So aiocb_copyin() needs to get the opcode from the aiocb and then decide whether an allocation is required. We failed to do this in the COMPAT_FREEBSD32 case. Fix it. Reported by: syzbot+27eab6f2c2162f2885ee@syzkaller.appspotmail.com Reviewed by: kib, asomers Fixes: `f30a1ae8d5` ("lio_listio(2): Allow LIO_READV and LIO_WRITEV.") Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31914	2021-09-11 12:58:41 -04:00
Mark Johnston	141fe2dcee	aio: Interlock with listen(2) soo_aio_queue() did not handle the possibility that the provided socket is a listening socket. Up until recently, to fix this one would have to acquire the socket lock first and check, since the socket buffer locks were destroyed by listen(2). Now that the socket buffer locks belong to the socket, simply check SOLISTENING(so) after acquiring them, and make listen(2) return an error if any AIO jobs are enqueued on the socket. Add a couple of simple regression test cases. Note that this fixes things only for the default AIO implementation; cxgbe(4)'s TCP offload has a separate pru_aio_queue implementation which requires its own solution. Reported by: syzbot+c8aa122fa2c6a4e2a28b@syzkaller.appspotmail.com Reported by: syzbot+39af117d43d4f0faf512@syzkaller.appspotmail.com Reported by: syzbot+60cceb9569145a0b993b@syzkaller.appspotmail.com Reported by: syzbot+2d522c5db87710277ca5@syzkaller.appspotmail.com Reviewed by: tuexen, gallatin, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31901	2021-09-10 17:21:11 -04:00
Mark Johnston	b1e6a792d6	net: Enter a net epoch around protocol if_up/down notifications When traversing a list of interface addresses, we need to be in a net epoch section, and protocol ctlinput routines need a stable reference to the address. Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com Reviewed by: kp, melifaro MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31889	2021-09-10 09:07:40 -04:00
Mark Johnston	187afc5879	osd: Fix racy assertions osd_register(9) may reallocate and expand the destructor array for a given object type if no space is available for a new key. This happens with the object lock held. Thus, when verifying that a given slot in the array is occupied, we need to hold the object lock to avoid racing with a reallocation. Reported by: syzbot+69ce54c7d7d813315dd3@syzkaller.appspotmail.com MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-09-09 10:11:02 -04:00
Rick Macklem	c5128c48df	VOP_COPY_FILE_RANGE: Add a COPY_FILE_RANGE_TIMEO1SEC flag Although it is not specified in the RFCs, the concept that the NFSv4 server should reply to an RPC request within a reasonable time is accepted practice within the NFSv4 community. Without this patch, the NFSv4.2 server attempts to reply to a Copy operation within 1 second by limiting the copy to vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at best, given the large variation in I/O subsystem performance. This patch adds a kernel-only flag COPY_FILE_RANGE_TIMEO1SEC that the NFSv4.2 server can specify, which tells VOP_COPY_FILE_RANGE() to return after approximately 1 second with a partial result, and implements this in vn_generic_copy_file_range(), used by vop_stdcopyfilerange(). Modifying the NFSv4.2 server to set this flag will be done in a separate patch. Also under consideration is exposing the COPY_FILE_RANGE_TIMEO1SEC to userland for use on the FreeBSD copy_file_range(2) syscall. MFC after: 2 weeks Reviewed by: khng Differential Revision: https://reviews.freebsd.org/D31829	2021-09-07 17:35:26 -07:00
Mark Johnston	a8aa6f1f78	socket: Avoid clearing SS_ISCONNECTING if soconnect() fails This behaviour appears to date from the 4.4 BSD import. It has two problems: 1. The update to so_state is not protected by the socket lock, so concurrent updates to so_state may be lost. 2. Suppose two threads race to call connect(2) on a socket, and one succeeds while the other fails. Then the failing thread may incorrectly clear SS_ISCONNECTING, confusing the state machine. Simply remove the update. It does not appear to be necessary: pru_connect implementations which call soisconnecting() only do so after all failure modes have been handled. For instance, tcp_connect() and tcp6_connect() will never return an error after calling soisconnected(). However, we cannot correctly assert that SS_ISCONNECTED is not set after an error from soconnect() since the socket lock is not held across the pru_connect call, so a concurrent connect(2) may have set the flag. MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31699	2021-09-07 17:12:09 -04:00
Mark Johnston	523d58aad1	socket: Remove unneeded SOLISTENING checks Now that SOCK_IO_*_LOCK() checks for listening sockets, we can eliminate some racy SOLISTENING() checks. No functional change intended. Reviewed by: tuexen MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31660	2021-09-07 17:12:09 -04:00
Mark Johnston	bd4a39cc93	socket: Properly interlock when transitioning to a listening socket Currently, most protocols implement pru_listen with something like the following: SOCK_LOCK(so); error = solisten_proto_check(so); if (error) { SOCK_UNLOCK(so); return (error); } solisten_proto(so); SOCK_UNLOCK(so); solisten_proto_check() fails if the socket is connected or connecting. However, the socket lock is not used during I/O, so this pattern is racy. The change modifies solisten_proto_check() to additionally acquire socket buffer locks, and the calling thread holds them until solisten_proto() or solisten_proto_abort() is called. Now that the socket buffer locks are preserved across a listen(2), this change allows socket I/O paths to properly interlock with listen(2). This fixes a large number of syzbot reports, only one is listed below and the rest will be dup'ed to it. Reported by: syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31659	2021-09-07 17:11:43 -04:00
Mark Johnston	f94acf52a4	socket: Rename sb(un)lock() and interlock with listen(2) In preparation for moving sockbuf locks into the containing socket, provide alternative macros for the sockbuf I/O locks: SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a socket rather than a socket buffer. Note that these locks are used only to prevent concurrent readers and writers from interleaving I/O. When locking for I/O, return an error if the socket is a listening socket. Currently the check is racy since the sockbuf sx locks are destroyed during the transition to a listening socket, but that will no longer be true after some follow-up changes. Modify a few places to check for errors from sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In particular, add checks to sendfile() and sorflush(). Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31657	2021-09-07 15:06:48 -04:00
Warner Losh	ecfbb2e302	genoffset.sh: Use 10 X's instead of 5 for picky mkdtemp implementations Linux fails to build now because the mkdtemp in the bootstrapped environment wants 6 or more X's. Use 10 out of an abundance of caution. Sponsored by: Netflix Reviewed by: arichards Differential Revision: https://reviews.freebsd.org/D31863	2021-09-07 10:08:51 -06:00
Konstantin Belousov	98168a6e6c	kqueue: drain kqueue taskqueue if syscall tickled it Otherwise return from the syscall and next syscall, which could be kevent(2) on the kqueue that should be notified, races with the kqueue taskqueue thread, and potentially misses the wakeup. This is reliably visible when kevent(2) only peeks into events using zeroed timeout. PR: 258310 Reported by: arichardson, Jan Kokemüller <jan.kokemueller@gmail.com> Reviewed by: arichardson, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31858	2021-09-07 02:43:34 +03:00
Colin Percival	bd11e253a9	Add _sleep to TSLOG Most of the nvme initialization time in my tests is being spent here (via pause_sbt).	2021-09-05 12:50:15 -07:00
Colin Percival	7347dfce01	Add run_interrupt_driven_config_hooks to TSLOG The 'intr_config_hooks' SYSINIT is now taking a nontrivial amount of time in my profiling; run_interrupt_driven_config_hooks is responsible for most of it, so this adds useful information to the resulting flamecharts.	2021-09-05 12:45:29 -07:00
Alexander Motin	a264594d4f	Unify console output. Without this change, when the virtual console is enabled, different parts of the output go to different consoles depending on buffer presence and state. MFC after: 1 month	2021-09-03 23:13:42 -04:00
Alexander Motin	bd6085c6ae	Re-implement virtual console (constty). Protect conscallout with tty lock instead of Giant. In addition to Giant removal it also closes a race on console unset. Introduce an additional lock to protect against concurrent console sets. Remove consbuf free on console unset as unsafe, making it impossible to change the buffer size after the first allocation. Instead increase default buffer size from 8KB to 64KB and processing rate from 5Hz to 10-15Hz to make the output smoother. MFC after: 1 month	2021-09-03 22:18:51 -04:00
Alexander Motin	4730a8972b	callout(9): Allow spin locks use with callout_init_mtx(). Implement lock_spin()/unlock_spin() lock class methods, moving the assertion to _sleep() instead. Change assertions in callout(9) to allow spin locks for both regular and C_DIRECT_EXEC cases. In case of C_DIRECT_EXEC callouts spin locks are the only locks allowed actually. As the first use case allow taskqueue_enqueue_timeout() use on fast task queues. It actually becomes more efficient due to avoided extra context switches in callout(9) thanks to C_DIRECT_EXEC. MFC after: 2 weeks Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D31778	2021-09-02 21:16:46 -04:00
Konstantin Belousov	5cc82c563e	cluster_write(): do not access buffer after it is released The issue was reported by Alexander Lochmann <alexander.lochmann@tu-dortmund.de>, who found the problem by performing lock analysis using LockDoc, see https://doi.org/10.1145/3302424.3303948. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31780	2021-09-02 21:36:33 +03:00
Mateusz Guzik	6352bbf7be	vmem: disable debug.vmem_check by default It has a prohibitive performance impact when running real workloads. Note this only affects kernels with DIAGNOSTIC. Reviewed by: markj Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31784	2021-09-02 18:28:45 +00:00
Brooks Davis	6bc90e8acf	syscalls.master: correct formatting issues Reviewed by: kevans, emaste MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D31351	2021-09-01 21:58:22 +01:00
Brooks Davis	df501bac69	syscalls.master: switch to CAPENABLED flags Switch the main syscall table to use CAPENABLED flags rather than capabilities.conf. This avoids synchronization issues between syscalls.master and capabilities.conf (e.g. when renaming a syscall during development). For now, move capabilities.conf to sys/compat/freebsd32 and use it there. Use of sys/compat/freebsd32/syscalls.master should be replaced by makesyscalls.lua enhancements to allow the main one to be used. This change results in no changes to generated files after running `make sysent`. Reviewed by: kevans, emaste MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D31350	2021-09-01 21:58:16 +01:00
Brooks Davis	6945df3fff	makesyscalls.lua: add a CAPENABLED flag The CAPENABLED flag indicates that the syscall can be used in capsicum capability mode. It is intended to replace capabilities.conf. Reviewed by: kevans, emaste MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D31349	2021-09-01 21:58:06 +01:00
Mark Johnston	c511383de7	kevent: Fix races between timer detach and kqtimer_proc_continue() - When detaching a knote, we need to double check the enqueued flag after acquiring the process lock, as kqtimer_proc_continue() may have toggled it. - kqtimer_proc_continue() could in principle reschedule a stopped callout after filt_timerdetach() drains the callout. So, we need to re-check. Reported by: syzbot+4a4cebb3ec07892cb040@syzkaller.appspotmail.com Reported by: syzbot+a9c04bc76078a3b7dd8d@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31772	2021-09-01 14:18:58 -04:00
Ka Ho Ng	92bb74fd4f	vfs: Use file_cred for VOP_DEALLOCATE in vn_deallocate if non-NULL This changes vn_deallocate() to match the behavior of vn_rdwr() when picking which ucred to use. That is, vn_deallocate() uses file_cred for making VOP call if it is non-NULL, or use active_cred otherwise. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31712	2021-09-01 20:19:08 +08:00
Alexander Motin	706b1a5724	Align taskqueue_enqueue_timeout() to hardclock. It is done for all other KPIs using HZ, but was missed here. MFC after: 2 weeks	2021-08-31 23:50:35 -04:00
Mark Johnston	3138392a46	itimer: Serialize access to the p_itimers array Fix the following race between itimer_proc_continue() and process exit. itimer_proc_continue() may be called via realitexpire(), the real interval timer. Note that exit1() drains this timer _after_ draining and freeing itimers. Moreover, itimers_exit() is called without the process lock held; it only acquires the proc lock when deleting individual itimers, so once they are drained we free p->p_itimers without any synchronization. Thus, itimer_proc_continue() may load a non-NULL p->p_itimers array and iterate over it after it has been freed. Fix the problem by using the process lock when clearing p->p_itimers, to synchronize with itimer_proc_continue(). Formally, accesses to this field should be protected by the process lock anyway, and since the array is allocated lazily this will not incur any overhead in the common case. Reported by: syzbot+c40aa8bf54fe333fc50b@syzkaller.appspotmail.com Reported by: syzbot+929be2f32503bbc3844f@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31759	2021-08-31 16:38:05 -04:00
John Baldwin	470e851c4b	ktls: Support asynchronous dispatch of AEAD ciphers. KTLS OCF support was originally targeted at software backends that used host CPU cycles to encrypt TLS records. As a result, each KTLS worker thread queued a single TLS record at a time and waited for it to be encrypted before processing another TLS record. This works well for software backends but limits throughput on OCF drivers for coprocessors that support asynchronous operation such as qat(4) or ccr(4). This change uses an alternate function (ktls_encrypt_async) when encrypting TLS records via a coprocessor. This function queues TLS records for encryption and returns. It defers the work done after a TLS record has been encrypted (such as marking the mbufs ready) to a callback invoked asynchronously by the coprocessor driver when a record has been encrypted. - Add a struct ktls_ocf_state that holds the per-request state stored on the stack for synchronous requests. Asynchronous requests malloc this structure while synchronous requests continue to allocate this structure on the stack. - Add a ktls_encrypt_async() variant of ktls_encrypt() which does not perform request completion after dispatching a request to OCF. Instead, the ktls_ocf backends invoke ktls_encrypt_cb() when a TLS record request completes for an asynchronous request. - Flag AEAD software TLS sessions as async if the backend driver selected by OCF is an async driver. - Pull code to create and dispatch an OCF request out of ktls_encrypt() into a new ktls_encrypt_one() function used by both ktls_encrypt() and ktls_encrypt_async(). - Pull code to "finish" the VM page shuffling for a file-backed TLS record into a helper function ktls_finish_noanon() used by both ktls_encrypt() and ktls_encrypt_cb(). Reviewed by: markj Tested on: ccr(4) (jhb), qat(4) (markj) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31665	2021-08-30 13:11:52 -07:00
Andrew Turner	b792434150	Create sys/reg.h for the common code previously in machine/reg.h Move the common kernel function signatures from machine/reg.h to a new sys/reg.h. This is in preparation for adding PT_GETREGSET to ptrace(2). Reviewed by: imp, markj Sponsored by: DARPA, AFRL (original work) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19830	2021-08-30 12:50:53 +01:00
Ka Ho Ng	a58e222b3b	vfs: yield in vn_deallocate_impl() loop Yield at the end of each loop iteration if there is remaining work, as indicated by the value of *len updated by VOP_DEALLOCATE. Without this, when calling vop_stddeallocate to zero a large region, the implementation only zerofills a relatively small chunk and returns. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31705	2021-08-29 16:26:00 +08:00
Mark Johnston	7326e8589c	fsetown: Avoid process group lock recursion Restore the pre-1d874ba4f8ba behaviour of disassociating the current SIGIO recipient before looking up the specified process or process group. This avoids a lock recursion in the scenario where a process group is configured to receive SIGIO for an fd when it has already been so configured. Reported by: pho Tested by: pho Reviewed by: kib MFC after: 3 days	2021-08-28 15:50:44 -04:00
Rick Macklem	da779f262c	vfs_default: Change vop_stddeallocate() from static to global A future commit to the NFS client uses vop_stddeallocate() for cases where the NFS server does not support a Deallocate operation. Change vop_stddeallocate() from static to global so that it can be called by the NFS client. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31640	2021-08-27 18:25:44 -07:00
Konstantin Belousov	f19063ab02	vfs_hash_rehash(): require the vnode to be exclusively locked Rehash updates v_hash. Also, rehash moves the vnode to different hash bucket, which should be noticed in vfs_hash_get() after sleeping for the vnode lock. Reviewed by: mckusick, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31464	2021-08-27 18:39:45 +03:00
Konstantin Belousov	7c1e4aab79	vfs_hash_insert: ensure that predicate is true After vnode lock, recheck v_hash. When vfs_hash_insert() is used with a predicate, recheck it after the selected vnode is locked. Since vfs_hash_lock is dropped, vnode could be rehashed during the sleep for the vnode lock, which could go unnoticed there. Reported and tested by: pho Reviewed by: mckusick, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31464	2021-08-27 18:39:45 +03:00
Mark Johnston	091869def9	connect: Use soconnectat() unconditionally in kern_connect() soconnect(...) is equivalent to soconnectat(AT_FDCWD, ...), so rely on this to save a branch. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-08-27 08:32:07 -04:00
Mateusz Guzik	f1e2cc1c66	vfs: drop dedicated sysinit for mountlist_mtx Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-26 20:52:03 +02:00
Mateusz Guzik	0d28d014c8	vfs: refactor kern_unmount Split unmounting by path and id in preparation for other changes. Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-26 13:58:28 +02:00
Mateusz Guzik	7b2561b46b	vfs: stop open-coding vfs_getvfs in kern_unmount Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-26 11:38:31 +00:00
Mark Johnston	a507a40f3b	fsetown: Simplify error handling No functional change intended. Suggested by: kib Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31671	2021-08-25 16:20:07 -04:00
Mark Johnston	1d874ba4f8	fsetown: Fix process lookup bugs - pget()/pfind() will acquire the PID hash bucket locks, which are sleepable sx locks, but this means that the sigio mutex cannot be held while calling these functions. Instead, use pget() to hold the process, after which we lock the sigio and proc locks, respectively. - funsetownlst() assumes that processes cannot be registered for SIGIO once they have P_WEXIT set. However, pfind() will happily return exiting processes, breaking the invariant. Add an explicit check for P_WEXIT in fsetown() to fix this. [1] Fixes: `f52979098d` ("Fix a pair of races in SIGIO registration") Reported by: syzkaller [1] Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31661	2021-08-25 16:18:10 -04:00
Ka Ho Ng	9e202d036d	fspacectl(2): Changes on rmsr.r_offset's minimum value returned rmsr.r_offset now is set to rqsr.r_offset plus the number of bytes zeroed before hitting the end-of-file. After this change rmsr.r_offset no longer contains the EOF when the requested operation range is completely beyond the end-of-file. Instead in such case rmsr.r_offset is equal to rqsr.r_offset. Callers can obtain the number of bytes zeroed by subtracting rqsr.r_offset from rmsr.r_offset. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31677	2021-08-26 00:03:37 +08:00
Ka Ho Ng	5c1428d2c4	uipc_shm: Handle offset on shm_size as if it is beyond shm_size This avoids any unnecessary works in such case. Sponsored by: The FreeBSD Foundation Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D31655	2021-08-24 23:49:18 +08:00
Ka Ho Ng	1eaa36523c	fspacectl(2): Clarifies the return values rmacklem@ spotted two things in the system call: - Upon returning from a successful operation, vop_stddeallocate can update rmsr.r_offset to a value greater than file size. This behavior, although being harmless, can be confusing. - The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is undocumented. This commit has the following changes: - vop_stddeallocate and shm_deallocate to bound the affected area further by the file size. - The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is documented. - The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return len is explicitly documented to be the value 0, and the return offset is restricted to be the smallest of off + len and current file size suggested by kib@. This semantic allows callers to interact better with potential file size growth after the call. Sponsored by: The FreeBSD Foundation Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D31604	2021-08-24 17:08:28 +08:00
Mateusz Guzik	b65ad70195	cache: retire cache_fast_revlookup sysctl Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-23 15:31:44 +02:00
Mateusz Guzik	7fd856ba07	vfs: s/__unused/__diagused in crossmp_* Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-23 15:23:42 +02:00
Mateusz Guzik	614faa3269	vfs: fix cache-related LOR introduced in the previous change Reported by: kib Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-22 16:20:07 +00:00
Thomas Munro	f30a1ae8d5	lio_listio(2): Allow LIO_READV and LIO_WRITEV. Allow multiple vector IOs to be started with one system call. aio_readv() and aio_writev() already used these opcodes under the covers. This commit makes them available to user space. Being non-standard extensions, they're only visible if __BSD_VISIBLE is defined, like the functions. Reviewed by: asomers, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D31627	2021-08-22 23:00:42 +12:00
Jason A. Harmening	e81e71b0e9	Use interruptible wait for blocking recursive unmounts Now that we allow recursive unmount attempts to be abandoned upon exceeding the retry limit, we should avoid leaving an unkillable thread when a synchronous unmount request was issued against the base filesystem. Reviewed by: kib (earlier revision), mkusick Differential Revision: https://reviews.freebsd.org/D31450	2021-08-20 13:21:56 -07:00
Jason A. Harmening	a8c732f4e5	VFS: add retry limit and delay for failed recursive unmounts A forcible unmount attempt may fail due to a transient condition, but it may also fail due to some issue in the filesystem implementation that will indefinitely prevent successful unmount. In such a case, the retry logic in the recursive unmount facility will cause the deferred unmount taskqueue to execute constantly. Avoid this scenario by imposing a retry limit, with a default value of 10, beyond which the recursive unmount facility will emit a log message and give up. Additionally, introduce a grace period, with a default value of 1s, between successive unmount retries on the same mount. Create a new sysctl node, vfs.deferred_unmount, to export the total number of failed recursive unmount attempts since boot, and to allow the retry limit and retry grace period to be tuned. Reviewed by: kib (earlier revision), mkusick Differential Revision: https://reviews.freebsd.org/D31450	2021-08-20 13:20:50 -07:00
Mateusz Guzik	5d75ffdd0c	vfs: remove an unused variable from nameicap_tracker_add Reported by cc --analyze Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-20 17:52:24 +00:00
Mateusz Guzik	dbc689cdef	vfs: use vn_lock_pair to avoid establishing an ordering on mount This fixes some of the LORs seen on mount/unmount. Complete fix will require taking care of unmount as well. Reviewed by: kib Tested by: pho (previous version) Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31611	2021-08-20 17:52:24 +00:00
Kyle Evans	d7e1bdfeba	uipc: avoid circular pr_{slow,fast}timos domain_init() gets reinvoked for each vnet on a system, so we must not alter global state. Practically speaking, we were creating circular lists and tying up a softclock thread into an infinite loop. The breakage here was most easily observed by simply creating a jail in a new vnet and watching the system suddenly become erratic. Reported by: markj Fixes: `e0a17c3f06` ("uipc: create dedicated lists for fast ...") Pointy hat: kevans	2021-08-18 12:46:54 -05:00
Kristof Provost	07edc89c39	witness: remove ifnet_rw This lock no longer exists. It was removed in `a60100fdfc` (if: Remove ifnet_rwlock, 2020-11-25) Reviewed by: mjg Pointed out by: Dheeraj Kandula <dheerajk@netapp.com> Differential Revision: https://reviews.freebsd.org/D31585	2021-08-18 08:51:26 +02:00
Kristof Provost	a051ca72e2	Introduce m_get3() Introduce m_get3() which is similar to m_get2(), but can allocate up to MJUM16BYTES bytes (m_get2() can only allocate up to MJUMPAGESIZE). This simplifies the bpf improvement in `f13da24715`. Suggested by: glebius Differential Revision: https://reviews.freebsd.org/D31455	2021-08-18 08:48:27 +02:00
Mateusz Guzik	e0a17c3f06	uipc: create dedicated lists for fast and slow timeout callbacks This avoids having to walk all possible protocols only to check if they have one (vast majority does not). Original patch by kevans@. Reviewed by: kevans Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-08-17 21:56:05 +02:00
Mark Johnston	c4feb1ab0a	sigtimedwait: Use a unique wait channel for sleeping When a sigtimedwait(2) caller goes to sleep, it uses a wait channel of p->p_sigacts with the proc lock as the interlock. However, p_sigacts can be shared between processes if a child is created with rfork(RFSIGSHARE | RFPROC). Thus we can end up with two threads sleeping on the same wait channel using different locks, which is not permitted. Fix the problem simply by using a process-unique wait channel, following the example of sigsuspend. The actual wait channel value is irrelevant here, sleeping threads are awoken using sleepq_abort(). Reported by: syzbot+8c417afabadb50bb8827@syzkaller.appspotmail.com Reported by: syzbot+1d89fc2a9ef92ef64fa8@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31563	2021-08-16 15:11:15 -04:00
John Baldwin	d16cb228c1	ktls: Fix accounting for TLS 1.0 empty fragments. TLS 1.0 empty fragment mbufs have no payload and thus m_epg_npgs is zero. However, these mbufs need to occupy a "unit" of space for the purposes of M_NOTREADY tracking similar to regular mbufs. Previously this was done for the page count returned from ktls_frame() and passed to ktls_enqueue() as well as the page count passed to pru_ready(). However, sbready() and mb_free_notready() only use m_epg_nrdy to determine the number of "units" of space in an M_EXT mbuf, so when a TLS 1.0 fragment was marked ready it would mark one unit of the next mbuf in the socket buffer as ready as well. To fix, set m_epg_nrdy to 1 for empty fragments. This actually simplifies the code as now only ktls_frame() has to handle TLS 1.0 fragments explicitly and the rest of the KTLS functions can just use m_epg_nrdy. Reviewed by: gallatin MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31536	2021-08-16 10:42:46 -07:00
Konstantin Belousov	81b895a95b	pipe_paircreate(): do not leak pipepair memory on error Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-08-16 17:08:44 +03:00
Kyle Evans	29e400e994	domain: make it safer to add domains post-domainfinalize I can see two concerns for adding domains after domainfinalize: 1.) The slow/fast callouts have already been setup. 2.) Userland could create a socket while we're in the middle of initialization. We can address #1 fairly easily by tracking whether the domain's been initialized for at least the default vnet. There are still some concerns about the callbacks being invoked while a vnet is in the process of being created/destroyed, but this is a pre-existing issue that the callbacks must coordinate anyways. We should also address #2, but technically this has been an issue anyways because we don't assert on post-domainfinalize additions; we don't seem to hit it in practice. Future work can fix that up to make sure we don't find partially constructed domains, but care must be taken to make sure that at least, e.g., the usages of pffindproto in ip_input.c can still find them. Differential Revision: https://reviews.freebsd.org/D25459	2021-08-16 00:59:56 -05:00
Kyle Evans	239aebee61	domain: give domains a chance to probe for availability This gives any given domain a chance to indicate that it's not actually supported on the current system. If dom_probe isn't supplied, we assume the domain is universally applicable as most of them are. Keeping fully-initialized and registered domains around that physically can't work on a large majority of FreeBSD deployments is sub-optimal and leads to errors that aren't consistent with the reality of why the socket can't be created (e.g. ESOCKTNOSUPPORT) because such scenario has to be caught upon pru_attach, at which point kicking back the more-appropriate EAFNOSUPPORT would seem weird. The initial consumer of this will be hvsock, which is only available on HyperV guests. Reviewed by: cem (earlier version), bcr (manpages) Differential Revision: https://reviews.freebsd.org/D25062	2021-08-16 00:59:56 -05:00
Konstantin Belousov	9446d9e88f	fstatat(2): handle non-vnode file descriptors for AT_EMPTY_PATH Set NIRES_EMPTYPATH earlier, to have use of EMPTYPATH recorded even if we are going to return error. When namei_setup() refused to accept dirfd, which is not of the vnode type, and indicated by ENOTDIR error return, fall back to kern_fstat(dirfd). Reported by: dchagin Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31530	2021-08-14 00:17:18 +03:00
Ka Ho Ng	454bc887f2	uipc_shm: Implements fspacectl(2) support This implements fspacectl(2) support on shared memory objects. The semantic of SPACECTL_DEALLOC is equivalent to clearing the backing store and free the pages within the affected range. If the call succeeds, subsequent reads on the affected range return all zero. tests/sys/posixshm/posixshm_tests.c is expanded to include a fspacectl(2) functional test. Sponsored by: The FreeBSD Foundation Reviewed by: kevans, kib Differential Revision: https://reviews.freebsd.org/D31490	2021-08-12 23:04:18 +08:00
Ka Ho Ng	a638dc4ebc	vfs: Add ioflag to VOP_DEALLOCATE(9) The addition of ioflag allows callers passing IO_SYNC/IO_DATASYNC/IO_DIRECT down to the file system implementation. The vop_stddeallocate fallback implementation is updated to pass the ioflag to the file system implementation. vn_deallocate(9) internally is also changed to pass ioflag to the VOP_DEALLOCATE call. Sponsored by: The FreeBSD Foundation Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D31500	2021-08-12 23:03:49 +08:00
Ka Ho Ng	c15384f896	vfs: Add get_write_ioflag helper to calculate ioflag Converted vn_write to use this helper. Sponsored by: The FreeBSD Foundation MFC after: 3 days Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31513	2021-08-12 17:35:34 +08:00
Dmitry Chagin	71854d9b2b	fork: Remove the unnecessary spaces. MFC after: 2 weeks	2021-08-12 11:58:17 +03:00
Dmitry Chagin	de8374df28	fork: Allow ABI to specify fork return values for child. At least Linux x86 ABI's does not use carry bit and expects that the dx register is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork(). Add a short comment about touching dx in x86_set_fork_retval(), for more details see phab comments from kib@ and imp@. Reviewed by: kib Differential revision: https://reviews.freebsd.org/D31472 MFC after: 2 weeks	2021-08-12 11:45:25 +03:00
Eric van Gyzen	13a58148de	netdump: send key before dump, in case dump fails Previously, if an encrypted netdump failed, such as due to a timeout or network failure, the key was not saved, so a partial dump was completely useless. Send the key first, so the partial dump can be decrypted, because even a partial dump can be useful. Reviewed by: bdrewery, markj MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D31453	2021-08-11 10:54:56 -05:00
Mark Johnston	10a8e93da1	kmsan: Export kmsan_mark_mbuf() and kmsan_mark_bio() Sponsored by: The FreeBSD Foundation	2021-08-11 16:33:41 -04:00
Andrew Gallatin	95c51fafa4	ktls: Init reset tag task for cloned sessions When cloning a ktls session (which is needed when we need to switch output NICs for a NIC TLS session), we need to also init the reset task, like we do when creating a new tls session. Reviewed by: jhb Sponsored by: Netflix	2021-08-11 14:06:43 -04:00
Mitchell Horne	4ccaa87f69	kdb: Handle process enumeration before procinit() Make kdb_thr_first() and kdb_thr_next() return sane values if the allproc list and pidhashtbl haven't been initialized yet. This can happen if the debugger is entered very early on, for example with the '-d' boot flag. This allows remote gdb to attach at such a time, and fixes some ddb commands like 'show threads'. Be explicit about the static initialization of these variables. This part has no functional change. Reviewed by: markj, imp (previous version) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31495	2021-08-11 14:44:22 -03:00
Ka Ho Ng	4a9b832a2a	vfs: Rename ioflg to ioflag in vn_deallocate This includes a style fix around ioflag checking as well. Sponsored by: The FreeBSD Foundation Reviewed by: kib, bcr Differential Revision: https://reviews.freebsd.org/D31505	2021-08-11 17:45:47 +08:00
Alexander Motin	67f508db84	Mark some sysctls as CTLFLAG_MPSAFE. MFC after: 2 weeks	2021-08-10 22:18:26 -04:00
Mark Johnston	100949103a	uma: Add KMSAN hooks For now, just hook the allocation path: upon allocation, items are marked as initialized (absent M_ZERO). Some zones are exempted from this when it would otherwise raise false positives. Use kmsan_orig() to update the origin map for UMA and malloc(9) allocations. This allows KMSAN to print the return address when an uninitialized UMA item is implicated in a report. For example: panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:54 -04:00
Mark Johnston	693c9516fa	busdma: Add KMSAN integration Sanitizer instrumentation of course cannot automatically update shadow state when devices write to host memory. KMSAN thus hooks into busdma, both to update shadow state after a device write, and to verify that the kernel does not publish uninitialized bytes to devices. To implement this, when KMSAN is configured, each dmamap embeds a memory descriptor describing the region currently loaded into the map. bus_dmamap_sync() uses the operation flags to determine whether to validate the loaded region or to mark it as initialized in the shadow map. Note that in cases where the amount of data written is less than the buffer size, the entire buffer is marked initialized even when it is not. For example, if a NIC writes a 128B packet into a 2KB buffer, the entire buffer will be marked initialized, but subsequent accesses past the first 128 bytes are likely caused by bugs. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31338	2021-08-10 21:27:54 -04:00
Mark Johnston	b0f71f1bc5	amd64: Add MD bits for KMSAN Interrupt and exception handlers must call kmsan_intr_enter() prior to calling any C code. This is because the KMSAN runtime maintains some TLS in order to track initialization state of function parameters and return values across function calls. Then, to ensure that this state is kept consistent in the face of asynchronous kernel-mode exceptions, the runtime uses a stack of TLS blocks, and kmsan_intr_enter() and kmsan_intr_leave() push and pop that stack, respectively. Use these functions in amd64 interrupt and exception handlers. Note that handlers for user->kernel transitions need not be annotated. Also ensure that trap frames pushed by the CPU and by handlers are marked as initialized before they are used. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31467	2021-08-10 21:27:53 -04:00
Mark Johnston	8978608832	amd64: Populate the KMSAN shadow maps and integrate with the VM - During boot, allocate PDP pages for the shadow maps. The region above KERNBASE is currently not shadowed. - Create a dummy shadow for the vm page array. For now, this array is not protected by the shadow map to help reduce kernel memory usage. - Grow shadows when growing the kernel map. - Increase the default kernel stack size when KMSAN is enabled. As with KASAN, sanitizer instrumentation appears to create stack frames large enough that the default value is not sufficient. - Disable UMA's use of the direct map when KMSAN is configured. KMSAN cannot validate the direct map. - Disable unmapped I/O when KMSAN configured. - Lower the limit on paging buffers when KMSAN is configured. Each buffer has a static MAXPHYS-sized allocation of KVA, which in turn eats 2*MAXPHYS of space in the shadow map. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31295	2021-08-10 21:27:53 -04:00
Mark Johnston	5dda15adbc	kern: Ensure that thread-local KMSAN state is available Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:53 -04:00
Mark Johnston	a422084abb	Add the KMSAN runtime KMSAN enables the use of LLVM's MemorySanitizer in the kernel. This enables precise detection of uses of uninitialized memory. As with KASAN, this feature has substantial runtime overhead and is intended to be used as part of some automated testing regime. The runtime maintains a pair of shadow maps. One is used to track the state of memory in the kernel map at bit-granularity: a bit in the kernel map is initialized when the corresponding shadow bit is clear, and is uninitialized otherwise. The second shadow map stores information about the origin of uninitialized regions of the kernel map, simplifying debugging. KMSAN relies on being able to intercept certain functions which cannot be instrumented by the compiler. KMSAN thus implements interceptors which manually update shadow state and in some cases explicitly check for uninitialized bytes. For instance, all calls to copyout() are subject to such checks. The runtime exports several functions which can be used to verify the shadow map for a given buffer. Helpers provide the same functionality for a few structures commonly used for I/O, such as CAM CCBs, BIOs and mbufs. These are handy when debugging a KMSAN report whose proximate and root causes are far away from each other. Obtained from: NetBSD Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:53 -04:00
Mark Johnston	eca9ac5a32	vfs: Avoid a comparison with an uninitialized field in setutimes() Some filesystems, e.g., devfs, do not populate va_birthtime in their GETATTR implementations. To handle this, make sure that va_birthtime is initialized to the quasi-standard value of { VNOVAL, 0 } before calling VOP_GETATTR. Reported by: KMSAN Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31468	2021-08-09 13:27:20 -04:00
Alexander Motin	696fca3fd4	Optimize res_find(). When the device name is provided, we can simply run strncmp() for each line to quickly skip unrelated ones, that is much faster than sscanf() and only then strcmp(). MFC after: 2 weeks	2021-08-08 21:54:49 -04:00
Ed Maste	9feff969a0	Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights These ones were unambiguous cases where the Foundation was the only listed copyright holder (in the associated license block). Sponsored by: The FreeBSD Foundation	2021-08-08 10:42:24 -04:00
Mateusz Guzik	b30e7cb7fa	cache: add OPENREAD and OPENWRITE to fast path lookup	2021-08-07 13:02:38 +02:00
Rick Macklem	c18c74a87c	namei: Add cn_flags bits for OPENREAD and OPENWRITE VOP_LOOKUP() is called with cn_flags bits ISLASTCN and ISOPEN to indicate that the lookup is for the last component of a pathname when doing open. If the cn_flags also indicates if the open is for Reading, Writing or Both, the NFSv4 client can do an NFSv4 Open operation in the same compound RPC as Lookup, often avoiding the additional Open RPC now done when VOP_OPEN() is called. This patch defines two new cn_flags bits called OPENREAD and OPENWRITE and sets these in open2nameif() based on FREAD, FWRITE flag bits. This will allow a subsequent patch to the NFSv4 client to do the Open operation in the same RPC as Lookup. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31431	2021-08-06 18:41:11 -07:00
Andrew Gallatin	09066b9866	ktls: Use the new PNOLOCK flag Use the new PNOLOCK flag to tsleep() to indicate that we are managing potential races, and don't need to sleep with a lock, or have a backstop timeout. Reviewed by: jhb Sponsored by: Netflix	2021-08-05 17:19:12 -04:00
Andrew Gallatin	1b97a054f3	tsleep: Add a PNOLOCK flag Add a PNOLOCK flag so that, in the race circumstance where wakeup races are externally mitigated, tsleep() can be called with a sleep time of 0 without triggering an assertion. Reviewed by: jhb Sponsored by: Netflix	2021-08-05 17:16:30 -04:00

18640 Commits