freebsd-nq

Author	SHA1	Message	Date
Ka Ho Ng	454bc887f2	uipc_shm: Implements fspacectl(2) support This implements fspacectl(2) support on shared memory objects. The semantic of SPACECTL_DEALLOC is equivalent to clearing the backing store and free the pages within the affected range. If the call succeeds, subsequent reads on the affected range return all zero. tests/sys/posixshm/posixshm_tests.c is expanded to include a fspacectl(2) functional test. Sponsored by: The FreeBSD Foundation Reviewed by: kevans, kib Differential Revision: https://reviews.freebsd.org/D31490	2021-08-12 23:04:18 +08:00
Ka Ho Ng	a638dc4ebc	vfs: Add ioflag to VOP_DEALLOCATE(9) The addition of ioflag allows callers passing IO_SYNC/IO_DATASYNC/IO_DIRECT down to the file system implementation. The vop_stddeallocate fallback implementation is updated to pass the ioflag to the file system implementation. vn_deallocate(9) internally is also changed to pass ioflag to the VOP_DEALLOCATE call. Sponsored by: The FreeBSD Foundation Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D31500	2021-08-12 23:03:49 +08:00
Ka Ho Ng	c15384f896	vfs: Add get_write_ioflag helper to calculate ioflag Converted vn_write to use this helper. Sponsored by: The FreeBSD Foundation MFC after: 3 days Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31513	2021-08-12 17:35:34 +08:00
Dmitry Chagin	71854d9b2b	fork: Remove the unnecessary spaces. MFC after: 2 weeks	2021-08-12 11:58:17 +03:00
Dmitry Chagin	de8374df28	fork: Allow ABI to specify fork return values for child. At least Linux x86 ABI's does not use carry bit and expects that the dx register is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork(). Add a short comment about touching dx in x86_set_fork_retval(), for more details see phab comments from kib@ and imp@. Reviewed by: kib Differential revision: https://reviews.freebsd.org/D31472 MFC after: 2 weeks	2021-08-12 11:45:25 +03:00
Eric van Gyzen	13a58148de	netdump: send key before dump, in case dump fails Previously, if an encrypted netdump failed, such as due to a timeout or network failure, the key was not saved, so a partial dump was completely useless. Send the key first, so the partial dump can be decrypted, because even a partial dump can be useful. Reviewed by: bdrewery, markj MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D31453	2021-08-11 10:54:56 -05:00
Mark Johnston	10a8e93da1	kmsan: Export kmsan_mark_mbuf() and kmsan_mark_bio() Sponsored by: The FreeBSD Foundation	2021-08-11 16:33:41 -04:00
Andrew Gallatin	95c51fafa4	ktls: Init reset tag task for cloned sessions When cloning a ktls session (which is needed when we need to switch output NICs for a NIC TLS session), we need to also init the reset task, like we do when creating a new tls session. Reviewed by: jhb Sponsored by: Netflix	2021-08-11 14:06:43 -04:00
Mitchell Horne	4ccaa87f69	kdb: Handle process enumeration before procinit() Make kdb_thr_first() and kdb_thr_next() return sane values if the allproc list and pidhashtbl haven't been initialized yet. This can happen if the debugger is entered very early on, for example with the '-d' boot flag. This allows remote gdb to attach at such a time, and fixes some ddb commands like 'show threads'. Be explicit about the static initialization of these variables. This part has no functional change. Reviewed by: markj, imp (previous version) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31495	2021-08-11 14:44:22 -03:00
Ka Ho Ng	4a9b832a2a	vfs: Rename ioflg to ioflag in vn_deallocate This includes a style fix around ioflag checking as well. Sponsored by: The FreeBSD Foundation Reviewed by: kib, bcr Differential Revision: https://reviews.freebsd.org/D31505	2021-08-11 17:45:47 +08:00
Alexander Motin	67f508db84	Mark some sysctls as CTLFLAG_MPSAFE. MFC after: 2 weeks	2021-08-10 22:18:26 -04:00
Mark Johnston	100949103a	uma: Add KMSAN hooks For now, just hook the allocation path: upon allocation, items are marked as initialized (absent M_ZERO). Some zones are exempted from this when it would otherwise raise false positives. Use kmsan_orig() to update the origin map for UMA and malloc(9) allocations. This allows KMSAN to print the return address when an uninitialized UMA item is implicated in a report. For example: panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:54 -04:00
Mark Johnston	693c9516fa	busdma: Add KMSAN integration Sanitizer instrumentation of course cannot automatically update shadow state when devices write to host memory. KMSAN thus hooks into busdma, both to update shadow state after a device write, and to verify that the kernel does not publish uninitalized bytes to devices. To implement this, when KMSAN is configured, each dmamap embeds a memory descriptor describing the region currently loaded into the map. bus_dmamap_sync() uses the operation flags to determine whether to validate the loaded region or to mark it as initialized in the shadow map. Note that in cases where the amount of data written is less than the buffer size, the entire buffer is marked initialized even when it is not. For example, if a NIC writes a 128B packet into a 2KB buffer, the entire buffer will be marked initialized, but subsequent accesses past the first 128 bytes are likely caused by bugs. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31338	2021-08-10 21:27:54 -04:00
Mark Johnston	b0f71f1bc5	amd64: Add MD bits for KMSAN Interrupt and exception handlers must call kmsan_intr_enter() prior to calling any C code. This is because the KMSAN runtime maintains some TLS in order to track initialization state of function parameters and return values across function calls. Then, to ensure that this state is kept consistent in the face of asynchronous kernel-mode excpeptions, the runtime uses a stack of TLS blocks, and kmsan_intr_enter() and kmsan_intr_leave() push and pop that stack, respectively. Use these functions in amd64 interrupt and exception handlers. Note that handlers for user->kernel transitions need not be annotated. Also ensure that trap frames pushed by the CPU and by handlers are marked as initialized before they are used. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31467	2021-08-10 21:27:53 -04:00
Mark Johnston	8978608832	amd64: Populate the KMSAN shadow maps and integrate with the VM - During boot, allocate PDP pages for the shadow maps. The region above KERNBASE is currently not shadowed. - Create a dummy shadow for the vm page array. For now, this array is not protected by the shadow map to help reduce kernel memory usage. - Grow shadows when growing the kernel map. - Increase the default kernel stack size when KMSAN is enabled. As with KASAN, sanitizer instrumentation appears to create stack frames large enough that the default value is not sufficient. - Disable UMA's use of the direct map when KMSAN is configured. KMSAN cannot validate the direct map. - Disable unmapped I/O when KMSAN configured. - Lower the limit on paging buffers when KMSAN is configured. Each buffer has a static MAXPHYS-sized allocation of KVA, which in turn eats 2*MAXPHYS of space in the shadow map. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31295	2021-08-10 21:27:53 -04:00
Mark Johnston	5dda15adbc	kern: Ensure that thread-local KMSAN state is available Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:53 -04:00
Mark Johnston	a422084abb	Add the KMSAN runtime KMSAN enables the use of LLVM's MemorySanitizer in the kernel. This enables precise detection of uses of uninitialized memory. As with KASAN, this feature has substantial runtime overhead and is intended to be used as part of some automated testing regime. The runtime maintains a pair of shadow maps. One is used to track the state of memory in the kernel map at bit-granularity: a bit in the kernel map is initialized when the corresponding shadow bit is clear, and is uninitialized otherwise. The second shadow map stores information about the origin of uninitialized regions of the kernel map, simplifying debugging. KMSAN relies on being able to intercept certain functions which cannot be instrumented by the compiler. KMSAN thus implements interceptors which manually update shadow state and in some cases explicitly check for uninitialized bytes. For instance, all calls to copyout() are subject to such checks. The runtime exports several functions which can be used to verify the shadow map for a given buffer. Helpers provide the same functionality for a few structures commonly used for I/O, such as CAM CCBs, BIOs and mbufs. These are handy when debugging a KMSAN report whose proximate and root causes are far away from each other. Obtained from: NetBSD Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:53 -04:00
Mark Johnston	eca9ac5a32	vfs: Avoid a comparison with an uninitialized field in setutimes() Some filesystems, e.g., devfs, do not populate va_birthtime in their GETATTR implementations. To handle this, make sure that va_birthtime is initialized to the quasi-standard value of { VNOVAL, 0 } before calling VOP_GETATTR. Reported by: KMSAN Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31468	2021-08-09 13:27:20 -04:00
Alexander Motin	696fca3fd4	Optimize res_find(). When the device name is provided, we can simply run strncmp() for each line to quickly skip unrelated ones, that is much faster than sscanf() and only then strcmp(). MFC after: 2 weeks	2021-08-08 21:54:49 -04:00
Ed Maste	9feff969a0	Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights These ones were unambiguous cases where the Foundation was the only listed copyright holder (in the associated license block). Sponsored by: The FreeBSD Foundation	2021-08-08 10:42:24 -04:00
Mateusz Guzik	b30e7cb7fa	cache: add OPENREAD and OPENWRITE to fast path lookup	2021-08-07 13:02:38 +02:00
Rick Macklem	c18c74a87c	namei: Add cn_flags bits for OPENREAD and OPENWRITE VOP_LOOKUP() is called with cn_flags bits ISLASTCN and ISOPEN to indicate that the lookup is for the last component of a pathname when doing open. If the cn_flags also indicates if the open is for Reading, Writing or Both, the NFSv4 client can do an NFSv4 Open operation in the same compound RPC as Lookup, often avoiding the additional Open RPC now done when VOP_OPEN() is called. This patch defines two new cn_flags bits called OPENREAD and OPENWRITE and sets these in open2nameif() based on FREAD, FWRITE flag bits. This will allow a subsequent patch to the NFSv4 client to do the Open operation in the same RPC as Lookup. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31431	2021-08-06 18:41:11 -07:00
Andrew Gallatin	09066b9866	ktls: Use the new PNOLOCK flag Use the new PNOLOCK flag to tsleep() to indicate that we are managing potential races, and don't need to sleep with a lock, or have a backstop timeout. Reviewed by: jhb Sponsored by: Netflix	2021-08-05 17:19:12 -04:00
Andrew Gallatin	1b97a054f3	tsleep: Add a PNOLOCK flag Add a PNOLOCK flag so that, in the race circumstance where wakeup races are externally mitigated, tsleep() can be called with a sleep time of 0 without triggering an an assertion. Reviewed by: jhb Sponsored by: Netflix	2021-08-05 17:16:30 -04:00
Andrew Gallatin	2694c869ff	ktls: fix a panic with INVARIANTS `98215005b7` introduced a new thread that uses tsleep(..0) to sleep forever. This hit an assert due to sleeping with a 0 timeout. So spell "forever" using SBT_MAX instead, which does not trigger the assert. Pointy hat to: gallatin Pointed out by: emaste Sponsored by: Netflix	2021-08-05 13:09:06 -04:00
Ka Ho Ng	da9fe3529b	Regen after `0dc332bff2`	2021-08-05 23:22:02 +08:00
Ka Ho Ng	0dc332bff2	Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9). fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use. The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file. fspacectl(2) comprises of cmd and flags parameters to specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0. fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347	2021-08-05 23:20:42 +08:00
Ka Ho Ng	abbb57d5a6	vfs: Introduce vn_bmap_seekhole_locked() vn_bmap_seekhole_locked() is factored out version of vn_bmap_seekhole(). This variant requires shared vnode lock being held around the call. Sponsored by: The FreeBSD Foundation Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D31404	2021-08-05 22:52:26 +08:00
Andrew Gallatin	98215005b7	ktls: start a thread to keep the 16k ktls buffer zone populated Ktls recently received an optimization where we allocate 16k physically contiguous crypto destination buffers. This provides a large (more than 5%) reduction in CPU use in our workload. However, after several days of uptime, the performance benefit disappears because we have frequent allocation failures from the ktls buffer zone. It turns out that when load drops off, the ktls buffer zone is trimmed, and some 16k buffers are freed back to the OS. When load picks back up again, re-allocating those 16k buffers fails after some number of days of uptime because physical memory has become fragmented. This causes allocations to fail, because they are intentionally done without M_NORECLAIM, so as to avoid pausing the ktls crytpo work thread while the VM system defragments memory. To work around this, this change starts one thread per VM domain to allocate ktls buffers with M_NORECLAIM, as we don't care if this thread is paused while memory is defragged. The thread then frees the buffers back into the ktls buffer zone, thus allowing future allocations to succeed. Note that waking up the thread is intentionally racy, but neither of the races really matter. In the worst case, we could have either spurious wakeups or we could have to wait 1 second until the next rate-limited allocation failure to wake up the thread. This patch has been in use at Netflix on a handful of servers, and seems to fix the issue. Differential Revision: https://reviews.freebsd.org/D31260 Reviewed by: jhb, markj, (jtl, rrs, and dhw reviewed earlier version) Sponsored by: Netflix	2021-08-05 10:19:12 -04:00
John Baldwin	c51e4962a3	Document kern.log_wakeups_per_second. PR: 148680 MFC after: 2 weeks	2021-08-04 11:50:34 -07:00
Konstantin Belousov	0ef5eee9d9	Add vn_lktype_write() and remove repetetive code that calculates vnode locking type for write. Reviewed by: khng, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31405	2021-08-04 19:40:13 +03:00
Kyle Evans	04cc0c393c	malloc(9): provide missing malloc_aligned implementation Pointy hat: kevans Fixes: `6162cf885c` ("malloc(9): Document/complete aligned variants")	2021-08-02 21:12:39 -05:00
Eric van Gyzen	428624130a	Fix lockstat:::thread-spin dtrace probe with LOCK_PROFILING The spinning start time is missing from the calculation due to a misplaced #endif. Return the #endif where it's supposed to be. Submitted by: Alexander Alexeev <aalexeev@isilon.com> Reviewed by: bdrewery, mjg MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D31384	2021-08-02 14:44:23 -05:00
Adam Fenn	8ca384eb1d	devclass_alloc_unit: move "at" hint test to after device-in-use test Only perform this expensive operation when the unit number is a potential candidate (i.e. not already in use), thereby reducing device scan time on systems with many devices, unit numbers, and drivers. Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. X-NetApp-PR: #61 Differential Revision: https://reviews.freebsd.org/D31381	2021-08-02 11:27:17 -05:00
Alexander Motin	ca34553b6f	sched_ule(4): Pre-seed sched_random(). I don't think it changes anything, but why not. While there, make cpu_search_highest() use all 8 lower load bits for noise, since it does not use cs_prefer and the code is not shared with cpu_search_lowest() any more. MFC after: 1 month	2021-08-02 10:55:28 -04:00
Alexander Motin	8bb173fb5b	sched_ule(4): Use trylock when stealing load. On some load patterns it is possible for several CPUs to try steal thread from the same CPU despite randomization introduced. It may cause significant lock contention when holding one queue lock idle thread tries to acquire another one. Use of trylock on the remote queue allows both reduce the contention and handle lock ordering easier. If we can't get lock inside tdq_trysteal() we just return, allowing tdq_idled() handle it. If it happens in tdq_idled(), then we repeat search for load skipping this CPU. On 2-socket 80-thread Xeon system I am observing dramatic reduction of the lock spinning time when doing random uncached 4KB reads from 12 ZVOLs, while IOPS increase from 327K to 403K. MFC after: 1 month	2021-08-01 22:42:01 -04:00
Alexander Motin	2668bb2add	sched_ule(4): Reduce duplicate search for load. When sched_highest() called for some CPU group returns nothing, idle thread calls it for the parent CPU group. But the parent CPU group also includes the CPU group we've just searched, and unless there is a race going on, it is unlikely we find anything new this time. Avoid the double search in case of parent group having only two sub- groups (the most prominent case). Instead of escalating to the parent group run the next search over the sibling subgroup and escalate two levels up after if that fail too. In case of more than two siblings the difference is less significant, while searching the parent group can result in better decision if we find several candidate CPUs. On 2-socket 40-core Xeon system I am measuring ~25% reduction of CPU time spent inside cpu_search_highest() in both SMT (2x20x2) and non- SMT (2x20) cases. MFC after: 1 month	2021-08-01 22:07:51 -04:00
Mark Johnston	6f179693c5	Add interceptors for atomic operations on userspace memory Implement them for KASAN. KCSAN interceptors are left unimplemented for now. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-07-29 21:14:36 -04:00
Mark Johnston	a90d053b84	Simplify kernel sanitizer interceptors KASAN and KCSAN implement interceptors for various primitive operations that are not instrumented by the compiler. KMSAN requires them as well. Rather than adding new cases for each sanitizer which requires interceptors, implement the following protocol: - When interceptor definitions are required, define SAN_NEEDS_INTERCEPTORS and SANITIZER_INTERCEPTOR_PREFIX. - In headers that declare functions which need to be intercepted by a sanitizer runtime, use SANITIZER_INTERCEPTOR_PREFIX to provide declarations. - When SAN_RUNTIME is defined, do not redefine the names of intercepted functions. This is typically the case in files which implement sanitizer runtimes but is also needed in, for example, files which define ifunc selectors for intercepted operations. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-07-29 21:13:32 -04:00
Mark Johnston	9e575fadf4	link_elf_obj: Invoke fini callbacks This is required for KASAN: when a module is unloaded, poisoned regions (e.g., pad areas between global variables) are left as such, so if they are reused as KLDs are loaded, false positives can arise. Reported by: pho, Jenkins Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31339	2021-07-29 09:46:25 -04:00
Dmitry Chagin	9e32efa79b	umtx: Split do_unlock_pi on two counterparts. The umtx_pi_frop() will be used by Linux emulation layer. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31238 MFC after: 2 weeks	2021-07-29 12:47:39 +03:00
Dmitry Chagin	09f55e6002	umtx: Expose some of the pi umtx structures and API to the rest of the kernel. Differential Revision: https://reviews.freebsd.org/D31237 MFC after: 2 weeks	2021-07-29 12:46:58 +03:00
Dmitry Chagin	8e4d22c01d	umtx: Add umtxq_requeue Linux emulation layer extension. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31235 MFC after: 2 weeks	2021-07-29 12:43:07 +03:00
Dmitry Chagin	7caa29115b	umtx: Add bitset conditional wakeup functionality. The bitset is a Linux emulation layer extension. This 32-bit mask, in which at least one bit must be set, is used to select which threads should be woken up. The bitset is stored in the umtx_q structure, which is used to enqueue the waiter into the umtx waitqueue. Put the bitset into the hole, that appeared on LP64 due to data alignment, to prevent the growth of the struct umtx_q. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31234 MFC after: 2 weeks	2021-07-29 12:42:49 +03:00
Dmitry Chagin	1fdcc87cfd	umtx: Expose some of the umtx structures and API to the rest of the kernel. Differential Revision: https://reviews.freebsd.org/D31233 MFC after: 2 weeks	2021-07-29 12:42:17 +03:00
Dmitry Chagin	307a3dd35c	umtx: Expose struct abs_timeout to the rest of the kernel. Add umtx_ prefix to all abs_timeout facility and add declaration for it. For consistency with others abs_timeout mark inline abs_timeout_init2. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31249 MFC after: 2 weeks	2021-07-29 12:41:58 +03:00
Dmitry Chagin	af29f39958	umtx: Split umtx.h on two counterparts. To prevent umtx.h polluting by future changes split it on two headers: umtx.h - ABI header for userspace; umtxvar.h - the kernel staff. While here fix umtx_key_match style. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31248 MFC after: 2 weeks	2021-07-29 12:41:29 +03:00
Kyle Evans	e3707726c1	kern: remove deprecated makesyscalls.sh makesyscalls was rewritten in Lua and introduced in `d3276301ab`. In the time since, no objections have risen and a warning was introduced long ago on invocation of makesyscalls.sh that it would be removed before FreeBSD 13. Belatedly follow through on that.	2021-07-28 22:22:23 -05:00
Alexander Motin	aefe0a8c32	Refactor/optimize cpu_search_(). Remove cpu_search_both(), unused for many years. Without it there is less sense for the trick of compiling common cpu_search() into separate cpu_search_lowest() and cpu_search_highest(), so split them completely, making code more readable. While there, split iteration over children groups and CPUs, complicating code for very small deduplication. Stop passing cpuset_t arguments by value and avoid some manipulations. Since MAXCPU bump from 64 to 256, what was a single register turned into 32-byte memory array, requiring memory allocation and accesses. Splitting struct cpu_search into parameter and result parts allows to even more reduce stack usage, since the first can be passed through on recursion. Remove CPU_FFS() from the hot paths, precalculating first and last CPU for each CPU group in advance during initialization. Again, it was not a problem for 64 CPUs before, but for 256 FFS needs much more code. With these changes on 80-thread system doing ~260K uncached ZFS reads per second I observe ~30% reduction of time spent in cpu_search_(). MFC after: 1 month	2021-07-28 22:00:29 -04:00
Warner Losh	824897a3ae	genoffset: simplify and rewrite in sh genoffset used the fully generic ASSYM macro to generate the offsets needed for the thread_lite structure. However, since these are offsets into a structure, they will always be necessarily small and positive. As such, just create a simple character array of the right size and use a naming convention such that we can recover the field name, structure name and type. Use nm -t d and sort -n to sort these into order, then loop over the resutls to generate the thread_lite structure. MFC After: 2 weeks Reviewed by: kib, markj (earlier versions) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31203	2021-07-28 13:50:09 -06:00

1 2 3 4 5 ...

18514 Commits