freebsd-dev

Author	SHA1	Message	Date
Jake Freeland	af93fea710	timerfd: Move implementation from linux compat to sys/kern Move the timerfd implementation from linux compat code to sys/kern. Use it to implement the new system calls for timerfd. Add a hook to kern_tc to allow timerfd to know when the system time has stepped. Add kqueue support to timerfd. Adjust a few names to be less Linux centric. RelNotes: YES Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack) Differential Revision: https://reviews.freebsd.org/D38459	2023-08-24 14:28:56 -06:00
Andrew Turner	676386b556	Support dynamically sized register sets We don't always know the size of the register set at compile time, e.g. on arm64 the size of the SVE registers needs to be queried on boot. To support register sets whose size needs to be calculated at run time, query the correct size when it is zero. Reviewed by: markj, kib (earlier version) Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D41302	2023-08-23 15:32:56 +01:00
Konstantin Belousov	c7df872096	Regen	2023-08-23 03:02:21 +03:00
Konstantin Belousov	4a69fc16a5	Add membarrier(2) This is an attempt at clean-room implementation of the Linux' membarrier(2) syscall. For documentation, you would need to read both membarrier(2) Linux man page, the comments in Linux kernel/sched/membarrier.c implementation and possibly look at actual uses. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32360	2023-08-23 03:02:21 +03:00
Doug Moore	3b7ffacdee	pctrie: change for vm_radix compatibility Restructure parts of pctrie code to make it more compatible with the needs of vm_radix code. 1. End passing function pointers for memory management. By breaking insertion into two functions, the call for allocating memory can happen at the top level and be inlined, rather than happening via a function pointer to a memory allocator. By changing the remove function slightly, freeing of memory, when necessary, can happen at the top level and be inlined. By turning the reclamation code into two functions, one for starting iteration over to-be-freed nodes and the other continuing it, all the freeing can happen at the top level and be inlined. 2. Offer a version of remove that does not panic and returns the freed value (or NULL). 3. Offer a 'replace' operation, to replace one leaf with another that has the same key. These are three of the roadblocks that prevent code sharing between pctrie and vm_radix code. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D41396	2023-08-21 12:28:51 -05:00
Colin Percival	9a7add6d01	init_main: Switch from sysinit array to SLIST This has two effects: 1. We can mergesort the sysinits instead of bubblesorting them, which shaves about 2 ms off the boot time in Firecracker. 2. Adding more sysinits (e.g. from a KLD) can be performed by sorting them and then merging the sorted lists, which is both faster than the previous "append and sort" approach and avoids needing malloc. Reviewed by: jhb (previous version) Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D41075	2023-08-19 22:04:56 -07:00
Mateusz Guzik	64e881f2db	vfs: track how many times vn_alloc blocked on hitting the vnode limit	2023-08-18 23:56:58 +00:00
Konstantin Belousov	04f683b25a	subr_unit.c: another attempt to fix the build Reported by: cy Sponsored by: The FreeBSD Foundation MFC after: 1 week	2023-08-18 19:28:42 +03:00
Konstantin Belousov	1384a0b940	kern/subr_unit.c: fix non-debug build Sponsored by: The FreeBSD Foundation MFC after: 1 week	2023-08-18 16:37:16 +03:00
Kyle Evans	a76629cb03	kern: osd: stop downsizing arrays when the last slot deregisters It was noted in D41404 that these reallocations aren't actually guaranteed to succeed, despite assertions to the contrary. We're talking relatively small allocations, so just free up the individual slot to be reused later as needed. Note that this doesn't track the last active slot as of this moment, but this could be done later if we find it's worth the complexity for what little that would allow to be optimized (osd_call, slightly). While we're here, fix the debug message that indicates which slot we just allocated when we find an unused one; the slot # is actually one higher than the index. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D41409	2023-08-17 23:06:12 -05:00
Dag-Erling Smørgrav	e738085b94	Remove my middle name.	2023-08-17 15:08:30 +02:00
Warner Losh	78d146160d	sys: Remove $FreeBSD$: one-line bare tag Remove /^\s*\$FreeBSD\$$\n/	2023-08-16 11:55:17 -06:00
Warner Losh	031beb4e23	sys: Remove $FreeBSD$: one-line sh pattern Remove /^\s#[#!]?\s\$FreeBSD\$.*$\n/	2023-08-16 11:54:58 -06:00
Warner Losh	685dc743dc	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/	2023-08-16 11:54:36 -06:00
Warner Losh	95ee2897e9	sys: Remove $FreeBSD$: two-line .h pattern Remove /^\s\\n \*\s+\$FreeBSD\$$\n/	2023-08-16 11:54:11 -06:00
John Baldwin	41582f28dd	sys: Add a deprecation warning for 32-bit kernels. Per recent discussions on arch@ and at the BSDCan developer summit, we are considering removing support for 32-bit platforms (in some form) for 15.0 (at the earliest). A final decision on what will ship in 15.0 will be made closer to the release of 15.0. However, we should communicate the potential deprecation in 14.0 to provide notice to users. This commit adds a warning during boot on 32-bit kernels that they are deprecated and may be removed in 15.0. More details will be included in a followup commit to RELNOTES. Reviewed by: brooks, imp, emaste Differential Revision: https://reviews.freebsd.org/D41163	2023-08-16 09:48:51 -07:00
Gleb Smirnoff	d29b95ecc0	sockets: on accept(2) don't copy all of so_options to new socket As uncovered by `e3ba0d6add` we are copying lots of irrelevant options from the listener to an accepted socket, even those that aren't relevant to a non-listener, e.g. SO_REUSE*, SO_ACCEPTFILTER. Stop doing that and provide a fixed opt-in list for options to be inherited. Ideally we shall not inherit anything at all. For compatibility inherit a set of options that are meaningful for a non-listening socket of a protocol that can listen(2). Differential Revision: https://reviews.freebsd.org/D41412 Fixes: `e3ba0d6add`	2023-08-14 12:56:08 -07:00
Dmitry Chagin	f3e11927dc	vm: Allow MAP_32BIT for all architectures Reviewed by: alc, kib, markj Differential revision: https://reviews.freebsd.org/D41435	2023-08-14 20:20:20 +03:00
Kyle Evans	2bd446d7f1	kern: osd: avoid dereferencing freed slots If a slot is freed that isn't the last one, we'll set its destructor to NULL to indicate that it's been freed and leave a hole in the slot map. Check osd_destructors in osd_call() to avoid dereferencing a method that is potentially from a module that's been unloaded. This scenario would most commonly surface when two modules are loaded that osd_register(), then the earlier one deregisters and an osd_call() is made after the fact. In the specific report that triggered the investigation, kldload if_wg -> kldload linux* -> kldunload if_wg -> destroy a jail -> panic. Noted in the review, but left for follow-up work, is that the realloc that may happen in osd_deregister() should likely go away and the assumption that reallocating to a smaller size cannot fail is actually not correct. Reported by: dim Reviewed by: markj, jamie Differential Revision: https://reviews.freebsd.org/D41404	2023-08-10 12:33:26 -05:00
Mateusz Guzik	b8b33f3b3b	vfs: retire NAMEI_DIAGNOSTIC It is too spammy and information-deficient for practical use. Also see https://reviews.freebsd.org/D41207	2023-08-09 10:37:13 +00:00
Doug Moore	15047a6509	rangesets: use PCTRIE_DEFINE subr_rangeset.c is the only source file that calls functions like pctrie_insert and pctrie_remove directly; other users of pctries use the PCTRIE_DEFINE macro to define interfaces to pctrie that let them ignore issues of offsets within structs and uint64_t return values. Change subr_rangeset.c to use PCTRIE_DEFINE too. And change pctrie.h to mark the lookup function as unused, to avoid warnings when compiling files, like subr_rangeset.c, that don't invoke lookup(). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D41391	2023-08-09 02:26:25 -05:00
Konstantin Belousov	28b36ecf99	Revert "exit1(): Revert sparc64 workaround" This reverts commit `96c76d9306`. There are other relatively common reasons why init might get killed during reboot, the workaround was really not sparc64-specific. Discussed with: marius Sponsored by: The FreeBSD Foundation	2023-08-09 09:00:20 +03:00
Konstantin Belousov	821dec4d56	vnode io: request range-locking when pgcache reads are enabled PR: 272678 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41334	2023-08-09 06:54:15 +03:00
Konstantin Belousov	651fdc3d19	Revert "vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()" This reverts commit `5b353925ff`. The reason is the lesser scalability of the vnode's rangelock compared with the vnode lock. Requested by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41334	2023-08-09 06:54:15 +03:00
Marius Strobl	96c76d9306	exit1(): Revert sparc64 workaround If this still is a problem on other architectures, it should be fixed properly. This reverts commit `5486ffc898`.	2023-08-06 22:26:01 +02:00
Ed Maste	1a6238d1ea	sigexit: add a break in default case Suggested by: markj Fixes: `6edbe5616c` ("Provide some more information for...") Sponsored by: The FreeBSD Foundation	2023-08-05 19:21:11 -04:00
Ed Maste	6edbe5616c	Provide some more information for userland core dumps Previously the log message indicated only "(core dumped)" if a core was successfully created, or nothing if it was not. This provides insufficient information to facilitate debugging. Dtrace is no help as coredump() is static and we cannot find the return value via fbt. Expand the log message to include error return value information. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D39942	2023-08-05 19:18:35 -04:00
Igor Ostapenko	452661c967	exit1(): fix a comment typo Signed-off-by: Igor Ostapenko <pm@igoro.pro> Reviewed by: emaste Pull Request: https://github.com/freebsd/freebsd-src/pull/809	2023-08-04 20:42:07 -04:00
Mark Johnston	bd16c274c3	kdb: Permit a NULL thread credential in kdb_backend_permitted() Early during boot, thread0 runs with td->td_ucred == NULL. This is fixed up in proc0_init() at SI_SUB_INTRINSIC. If a panic occurs before then, rather than dereference a NULL pointer, simply allow the thread to enter KDB. Reported by: stevek Reviewed by: mhorne, stevek MFC after: 1 week Fixes: `cab1056105` ("kdb: Modify securelevel policy") Differential Revision: https://reviews.freebsd.org/D41280	2023-08-02 09:15:08 -04:00
John Baldwin	f561c2ec08	memdesc: Add routines for copying data to/from memory descriptors These are modeled on the API used for m_copydata/m_copyback and the crypto buffer APIs. One day it might be nice to reduce the proliferation of these by adding cursors and using memdesc directly for crypto request buffers. Reviewed by: markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D40615	2023-07-31 13:24:44 -07:00
Doug Moore	ac0572e660	radix_tree: compute slot from keybarr The computation of keybarr(), the function that determines when a search has failed at a non-leaf node, can be done in a way that computes the 'slot' value when keybarr() fails, which is exactly when slot() would next be invoked. Computing things this way saves space in search loops. This reduces the amd64 coding of the search loop in vm_radix_lookup from 40 bytes to 28 bytes. Reviewed by: alc Tested by: pho (as part of a larger change) Differential Revision: https://reviews.freebsd.org/D41235	2023-07-30 15:12:06 -05:00
Doug Moore	38f5cb1bfb	radix_tree: redefine the clev field The clev field in the node struct is almost always multiplied by WIDTH; occasionally, it is incremented and then multiplied by WIDTH. Instructions can be saved by storing it always multiplied by WIDTH. For the computation of slot(), this just eliminates a multiplication. For trimkey(), where the caller always adds one to clev before passing it as an argument, this change has the callee, not the caller, do that. Trimkey() handles it not by adding WIDTH to the input parameter, but by shifting COUNT, and not 1. That produces the same result, and it relieves keybarr of the need to test to avoid shifting by more than 63 bits, since level is always <= 63. This takes 3 instructions and 14 bytes out of the basic lookup loop on amd64. Reviewed by: kib Tested by: pho (as part of a larger change) Differential Revision: https://reviews.freebsd.org/D41226	2023-07-30 01:20:07 -05:00
Dmitry Chagin	dbac8474fe	vfs: Deleting a doubled inclusion of sys/capsicum.h Reviewed by: Differential Revision: https://reviews.freebsd.org/D41223 MFC after: 1 week	2023-07-29 11:21:58 +03:00
Doug Moore	2d2bcba7ba	Every path in a radix trie ends with a leaf or a NULL. By replacing NULL (non-leaf) pointers with NULL leaves, there is a NULL test removed from every iteration of an index-based search loop. This speeds up radix trie searches by a few percent. If there are any radix tries that are not initialized with the init() function, but instead depend on zeroing everything being proper initialization, this will break those tries. Reviewed by: alc, kib Tested by: pho (as part of a larger change) Differential Revision: https://reviews.freebsd.org/D41171	2023-07-28 11:39:52 -05:00
Jessica Clarke	8a6ab0f71f	Pre-quote macros passed to .incbin to avoid unwanted substitution Currently for the MFS, firmware and VDSO template assembly files we pass the path to include with .incbin unquoted and use __XSTRING within the assembly file to stringify it. However, __XSTRING doesn't just perform a single level of expansion, it performs the normal full expansion of the macro, and so if the path itself happens to tokenise to something that includes a defined macro in it that will itself be substituted. For example, with #define MACRO 1, a path like /path/containing/MACRO/in/it will expand to /path/containing/1/in/it and then, when stringified, end up as "/path/containing/1/in/it", not the intended string. Normally, macros have names that start or end with underscores and are unlikely to appear in a tokenised path (even if technically they could), but now that we've switched to GNU C as of commit `ec41a96daa` ("sys: Switch the kernel's C standard from C99 to GNU99.") there are a few new macros defined which don't start or end with underscores: unix, which is always defined to 1, and i386, which is defined to 1 on i386. The former probably doesn't appear in user paths in practice, but the latter has been seen to and is likely quite common in the wild. Fix this by defining the macro pre-quoted instead of using __XSTRING. Note that technically we don't need to do this for vdso_wrap.S today as all the paths passed to it are safe file names with no user-controlled prefix but we should do it anyway for consistency and robustness against future changes. This allows make tinderbox to pass when built with source and object directories inside ~/path-with-unix, which would otherwise expand to ~/path-with-1 and break. PR: 272744 Fixes: `ec41a96daa` ("sys: Switch the kernel's C standard from C99 to GNU99.")	2023-07-28 05:08:43 +01:00
Mark Johnston	ca6cd604c8	kmsan: Use the correct origin bytes in kmsan_check_arg() Upon discovering a violation kmsan_check_arg() passes a pointer to function parameter shadow state to kmsan_report_hook(). kmsan_report_hook() uses that address to find the origin cells, assuming that the passed address belongs to the kernel map. This has two problems: 1) Function parameter origin state is also located in TLS, not in the origin map, but kmsan_report_hook() doesn't know this. 2) KMSAN TLS for thread0 is statically allocated and thus isn't shadowed (because the kernel itself is not shadowed). These bugs could result in inaccuracies in KMSAN reports, or a page fault when trying to report a KMSAN violation (which by default panics the kernel anyway). Fix the problem by making callers of kmsan_report_hook() provide a pointer to origin cells. Sponsored by: The FreeBSD Foundation	2023-07-27 16:02:03 -04:00
Konstantin Belousov	474708c334	fork1(): properly track the state of the pg_killsx lock Reported by: dchagin Fixes: `232b922cb3` Sponsored by: The FreeBSD Foundation MFC after: 1 week	2023-07-27 02:33:58 +03:00
Konstantin Belousov	232b922cb3	killpg(): close a race with fork(), part 2 When we are sending terminating signal to the group, killpg() needs to guarantee that all group members are to be terminated (it does not need to ensure that they are terminated on return from killpg()). The pg_killsx change eliminates the largest window there, but still, if a multithreaded process is signalled, the following could happen: - thread 1 is selected for the signal delivery and gets descheduled - thread 2 waits for pg_killsx lock, obtains it and forks - thread 1 continues executing and terminates the process This scenario allows the child to escape still. Fix it by single-threading forking parent if a conflict with pg_killsx is noted. We try to lock pg_killsx without sleeping, and failure to acquire it means that a parallel killpg(2) is executed. Then, stop other threads from running and, in particular, from receiving signals, to avoid the situation explained above. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41128	2023-07-26 18:13:02 +03:00
Konstantin Belousov	dfe172484d	sigtd(): prefer non-stopped thread as a target for signal queue This should improve signal delivery latency and better expose the process state to the executing threads. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41128	2023-07-26 18:12:55 +03:00
Konstantin Belousov	aaa924138a	Revert "killpg(): close a race with fork(), part 2" This reverts commits `81a37995c7` and `565a343ae3`. There is still a leakage of the p_killpg_cnt, some but not all sources of which were identified. Second, and more important, is that there is a fundamental issue with blocked signals having KSI_KILLPG flag set. Queueing of such signal increments p_killpg_cnt, but it cannot be decremented until the signal is delivered. If, for instance, a single-threaded process with blocked signal receives killpg-kill and executes fork(2), the fork enter check returns with ERESTART. And since signal is blocked, the condition cannot be cleared. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41128	2023-07-26 18:12:55 +03:00
Brooks Davis	437e1e37df	kern_sig.c: include sys/jail.h per style(9) Fixes: `e722820434` Sponsored by: DARPA	2023-07-25 18:13:17 +01:00
Konstantin Belousov	5b353925ff	vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault() To ensure atomicity of reads against parallel writes and truncates, vnode lock was not enough at least since introduction of vn_io_fault(). That code only takes rangelock when it was possible that vn_read() and vn_write() could drop the vnode lock. At least since the introduction of VOP_READ_PGCACHE() which generally does not lock the vnode at all, rangelocks become required even for filesystems that do not need vn_io_fault() workaround. For instance, tmpfs. PR: 272678 Analyzed and reviewed by: Andrew Gierth <andrew@tao11.riddles.org.uk> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41158	2023-07-25 01:02:59 +03:00
Marius Strobl	0b416346e1	bus_dma: Trim CAM includes from subr_bus_dma.c These are no longer needed after commit `c5312bd79e`. This did require adding an include of <sys/limits.h> instead for SIZE_T_MAX which previously was dragged in via header pollution.	2023-07-24 10:26:06 -07:00
Mitchell Horne	a4e4ea738b	sys_getrandom: fix a function reference in a comment MFC after: 3 days Sponsored by: FreeBSD Foundation	2023-07-24 10:50:04 -03:00
Mateusz Guzik	176d83eafc	vfs: fix up NDFREE_PNBUF usage in vfs_mountroot_shuffle Noted by: karels	2023-07-23 13:44:15 +00:00
Dmitry Chagin	6453d4240f	vfs: Export exattr methods to reuse by Linuxulator Reviewed by: Differential revision: https://reviews.freebsd.org/D35543 MFC after: 1 month	2023-07-22 14:03:33 +03:00
Navdeep Parhar	c721694a1c	ktls_alloc_rcv_tag: Fix capability checks for RXTLS4/6. IFCAP2_* has the bit position and not the shifted value. Reviewed by: kib@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D41100	2023-07-19 11:12:14 -07:00
Doug Moore	6f251ef228	radix_trie: simplify ge, le lookups Replace the implementations of lookup_le and lookup_ge with ones that do not use a stack or climb back up the tree, and instead exploit the popmap field to quickly identify the place to resume searching if the straightforward indexed search fails. The code size of the original functions shrinks by a combined 160 bytes on amd64, and the cumulative cycle count per invocation of the two functions together is reduced 20% in a buildworld test. Reviewed by: alc, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D40936	2023-07-19 09:43:31 -05:00
Dmitry Chagin	e38c634b77	vfs: Add parentheses to vn_lock_pair() asserts to silence gcc Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D41070	2023-07-19 16:51:07 +03:00
John Baldwin	c5312bd79e	cam: Move bus_dmamap_load_ccb into cam.c. This routine is specific to CAM and no longer assumes any internal bus_dma knowledge as it is a simple wrapper around bus_dmamap_load_mem. Fixes: `60381fd1ee` memdesc: Retire MEMDESC_CCB. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D41058	2023-07-18 18:19:27 -07:00

19728 Commits