freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	1b0a4974c5	thread_create(): call cpu_copy_thread() after td_pflags is zeroed By calling the function too early we might still have the td_pflags value cached from the previous struct thread use. cpu_copy_thread() depends on correct value for TDP_KTHREAD at least on x86. Reported, bisected, and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36069	2022-08-08 19:44:17 +03:00
Gordon Bergling	fa1ac9693a	vnode(9): Fix a typo in a source code comment - s/paramater/parameter/ MFC after: 3 days	2022-08-07 16:08:43 +02:00
Ed Maste	f0687f3e0e	Clarify code comments on ASLR default settings Sponsored by: The FreeBSD Foundation	2022-08-05 10:01:16 -04:00
Mark Johnston	d07675a935	file: Move code to share fdtol structs into kern_descrip.c This ensures the filedesc-to-leader code is consistently encapsulated in kern_descrip.c. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35988	2022-08-04 09:39:25 -04:00
Konstantin Belousov	c53fec7603	sig_suspend_threads(): remove 'sending' arg The TDA_AST flag is set on td2 unconditionally (as TDF_ASTPENDING was before the AST rework), so the argument has effectively been unused for some time. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36033	2022-08-03 16:56:23 +03:00
Konstantin Belousov	f2fd7d8bfc	ast_sig(): add missed TDAI() The mask checked was completely wrong. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D36033	2022-08-03 16:56:23 +03:00
Mark Johnston	852695416c	domain: Use designated constants for timeout periods No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-08-02 20:31:29 -04:00
Konstantin Belousov	4a662c9064	ktrace: change AST handler to require AST flag set When it was inline it made sense to depend on the existing nested check in KTRUSERRET() rather than adding a new td_flags flag. However, since we now have a TDA_KTRACE flag anyway, we might as well check it and avoid the call. Suggested by: jhb Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888	2022-08-02 21:11:10 +03:00
Konstantin Belousov	c46771a7b7	kern/subr_trap.c: cleanup no longer needed headers Also bump the Foundation's copyright year Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888	2022-08-02 21:11:10 +03:00
Konstantin Belousov	cc1ec77231	Adjust g_waitidle() visibility and definition Explicitly pass the struct thread argument. Move the function prototype from sys/systm.h to geom/geom.h. Almost no kernel source needs to see the prototype: outside geom/geom_event.c, where the function is defined, it is now used only by kern/vfs_mountroot.c. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888	2022-08-02 21:11:10 +03:00
Konstantin Belousov	4fced8642f	sigfastblock_setpend() and fastblock_mask can be static now Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888	2022-08-02 21:11:10 +03:00
Konstantin Belousov	c6d31b8306	AST: rework Make most AST handlers dynamically registered. This allows subsystem-specific handler source to be located in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c. Also, it allows some handlers to be designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and the NFS server no longer need to be aware of UFS softdep processing. The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code does not make any effort to ensure that the module is not unloaded before all threads have been processed through the AST handler in it. In fact, this is already the existing behavior for hwpmc.ko and ufs.ko. I do not think it is worth the effort and the runtime overhead to try to fix it. Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888	2022-08-02 21:11:09 +03:00
Alexander V. Chernikov	be1f485d7d	sockets: add MSG_TRUNC flag handling for recvfrom()/recvmsg(). Implement the Linux variant of the MSG_TRUNC input flag used in recv(), recvfrom() and recvmsg(). POSIX defines MSG_TRUNC as an output flag, indicating packet/datagram truncation. Linux extended it a while (~15+ years) ago to act as an input flag, resulting in returning the full packet size regardless of the input buffer size. It's a (relatively) popular pattern to do recvmsg(MSG_PEEK | MSG_TRUNC) to get the packet size, allocate the buffer and issue another call to fetch the packet. In particular, it's popular in userland netlink code, which is the primary driving factor of this change. This commit implements the MSG_TRUNC support for SOCK_DGRAM sockets (udp, unix and all soreceive_generic() users). PR: kern/176322 Reviewed by: pauamma(doc) Differential Revision: https://reviews.freebsd.org/D35909 MFC after: 1 month	2022-07-30 18:21:51 +00:00
John Baldwin	ea8f128c7c	pmap_mapdev: Consistently use vm_paddr_t for the first argument. The devmap variants used vm_offset_t for some reason, and a few places explicitly cast bus addresses to vm_offset_t. (Probably those casts along with similar casts for vm_size_t should just be removed and instead permit the compiler to DTRT.) Reviewed by: markj Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D35961	2022-07-28 15:55:10 -07:00
Dimitry Andric	a387bd1b6a	Adjust function definition in vfs_bio.c to avoid clang 15 warnings With clang 15, the following -Werror warning is produced: sys/kern/vfs_bio.c:3430:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] buf_daemon() ^ void This is because buf_daemon() is declared with a (void) argument list, but defined with an empty argument list. Make the definition match the declaration. MFC after: 3 days	2022-07-26 19:59:57 +02:00
Dimitry Andric	78cfed2de7	Adjust function definitions in sysv_msg.c to avoid clang 15 warnings With clang 15, the following -Werror warnings are produced: sys/kern/sysv_msg.c:213:8: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] msginit() ^ void sys/kern/sysv_msg.c:316:10: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] msgunload() ^ void This is because msginit() and msgunload() are declared with (void) argument lists, but defined with empty argument lists. Make the definitions match the declarations. MFC after: 3 days	2022-07-26 19:59:57 +02:00
Dimitry Andric	b54e962aca	Adjust function definition in subr_bus.c to avoid clang 15 warnings With clang 15, the following -Werror warning is produced: sys/kern/subr_bus.c:871:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] bus_topo_assert() ^ void This is because bus_topo_assert() is declared with a (void) argument list, but defined with an empty argument list. Make the definition match the declaration. MFC after: 3 days	2022-07-26 19:59:57 +02:00
Dimitry Andric	3c8f0790dd	Adjust function definition in subr_autoconf.c to avoid clang 15 warnings With clang 15, the following -Werror warning is produced: sys/kern/subr_autoconf.c:119:34: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] run_interrupt_driven_config_hooks() ^ void This is because run_interrupt_driven_config_hooks() is declared with a (void) argument list, but defined with an empty argument list. Make the definition match the declaration. MFC after: 3 days	2022-07-26 19:59:57 +02:00
Dimitry Andric	f2eb09b089	Adjust function definitions in kern_resource.c to avoid clang 15 warnings With clang 15, the following -Werror warnings are produced: sys/kern/kern_resource.c:1212:10: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] lim_alloc() ^ void sys/kern/kern_resource.c:1365:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] uihashinit() ^ void This is because lim_alloc() and uihashinit() are declared with (void) argument lists, but defined with empty argument lists. Make the definitions match the declarations. MFC after: 3 days	2022-07-26 19:59:57 +02:00
Dimitry Andric	db8ea61ae2	Adjust function definitions in kern_dtrace.c to avoid clang 15 warnings With clang 15, the following -Werror warnings are produced: sys/kern/kern_dtrace.c:64:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] kdtrace_proc_size() ^ void sys/kern/kern_dtrace.c:87:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] kdtrace_thread_size() ^ void This is because kdtrace_proc_size() and kdtrace_thread_size() are declared with (void) argument lists, but defined with empty argument lists. Make the definitions match the declarations. MFC after: 3 days	2022-07-26 19:59:57 +02:00
Dimitry Andric	9806e82a23	Adjust function definitions in kern_cons.c to avoid clang 15 warnings With clang 15, the following -Werror warnings are produced: sys/kern/kern_cons.c:201:14: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] cninit_finish() ^ void sys/kern/kern_cons.c:376:7: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] cngrab() ^ void sys/kern/kern_cons.c:389:9: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] cnungrab() ^ void sys/kern/kern_cons.c:402:9: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] cnresume() ^ void This is because cninit_finish(), cngrab(), cnungrab(), and cnresume() are declared with (void) argument lists, but defined with empty argument lists. Make the definitions match the declarations. MFC after: 3 days	2022-07-26 19:59:56 +02:00
Ka Ho Ng	8c9aa94b42	Convert runtime param checks to KASSERTs for fo_fspacectl Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D35880	2022-07-23 15:16:23 -04:00
Colin Percival	84ec7df0d7	Add kern.reboot_wait_time sysctl Historic FreeBSD behaviour (dating back to 1994-04-02) when rebooting is to print "Rebooting..." and then /* wait 1 sec for printf's to complete and be read */ Prior to April 1994, there was a 100 ms delay (added 1993-11-12). Since (a) most users will already be aware that the system is rebooting and do not need to take time to read an additional message to that effect, and (b) most FreeBSD systems don't have anyone actively looking at the console anyway, this delay no longer serves much purpose. This commit adds a kern.reboot_wait_time sysctl which defaults to 0; historic behaviour can be regained by setting it to 1. Reviewed by: imp Relnotes: FreeBSD now reboots faster; to restore the traditional wait after printing "Rebooting..." to the console, set kern.reboot_wait_time=1 (or more). Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D35796	2022-07-18 17:23:25 -07:00
Mitchell Horne	2449b9e5fe	mac: kdb/ddb framework hooks Add three simple hooks to the debugger allowing for a loaded MAC policy to intervene if desired: 1. Before invoking the kdb backend 2. Before ddb command registration 3. Before ddb command execution We extend struct db_command with a private pointer and two flag bits reserved for policy use. Reviewed by: markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35370	2022-07-18 22:06:13 +00:00
Mitchell Horne	c84c5e00ac	ddb: annotate some commands with DB_CMD_MEMSAFE This is not completely exhaustive, but covers a large majority of commands in the tree. Reviewed by: markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35583	2022-07-18 22:06:09 +00:00
Mark Johnston	bd980ca847	sched_ule: Ensure we hold the thread lock when modifying td_flags The load balancer may force a running thread to reschedule and pick a new CPU. To do this it sets some flags in the thread running on a loaded CPU. But the code assumed that a running thread's lock is the same as that of the corresponding runqueue, and there are small windows where this is not true. In this case, we can end up with non-atomic modifications to td_flags. Since this load balancing is best-effort, simply give up if the thread's lock doesn't match; in this case the thread is about to enter the scheduler anyway. Reviewed by: kib Reported by: glebius Fixes: `e745d729be` ("sched_ule(4): Improve long-term load balancer.") MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35821	2022-07-18 15:52:27 -04:00
Kornel Dulęba	939f0b6323	Implement shared page address randomization It used to be mapped at the top of the UVA. If the randomization is enabled, any address above the .data section will be randomly chosen and a guard page will be inserted in the shared page default location. The shared page is now mapped in exec_map_stack, instead of exec_new_vmspace. The latter function is called before the image activator has a chance to parse ASLR related flags. The KERN_PROC_VM_LAYOUT sysctl was extended to provide the shared page address. The feature is enabled by default for 64 bit applications on all architectures. It can be toggled with the kern.elf64.aslr.shared_page sysctl. Approved by: mw(mentor) Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35349	2022-07-18 16:27:37 +02:00
Kornel Dulęba	361971fbca	Rework how shared page related data is stored Store the shared page address in struct vmspace. Also instead of storing absolute addresses of various shared page segments save their offsets with respect to the shared page address. This will be more useful when the shared page address is randomized. Approved by: mw(mentor) Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35393	2022-07-18 16:27:32 +02:00
Kornel Dulęba	f6ac79fb12	Introduce the PROC_SIGCODE() macro Use a getter macro instead of fetching the sigcode address directly from a sysent of a given process. It assumes that the sigcode is stored in the shared page, which is true in all cases, except for a.out binaries. This will be later useful when the shared page address randomization is introduced. No functional change intended. Approved by: mw(mentor) Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35392	2022-07-18 16:27:26 +02:00
Mark Johnston	46eab86035	callout: Simplify the inner loop in callout_process() a bit - Use LIST_FOREACH_SAFE. - Simplify control flow. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-17 13:58:19 -04:00
Mark Johnston	aac7c7ac54	callout: Remove a redundant parameter to callout_cc_add() The passed cpuid is always equal to the one stored in the callout structure. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-17 13:58:19 -04:00
Mateusz Guzik	6eeba7dbd6	ule: unbreak UP builds Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-07-16 12:45:09 +00:00
Dmitry Chagin	fc90f3a281	ktrace: Increase precision of timestamps. Replace struct timeval in header with struct timespec. To differentiate header formats, add a new KTR_VERSIONED flag set in the header type field similar to the existing KTRDROP flag. To make it easier to extend ktrace headers in the future, extend the existing header with a version field (version 0 is reserved for older records without KTR_VERSIONED) as well as new fields holding the thread ID and CPU ID. Reviewed by: jhb, pauamma Differential Revision: https://reviews.freebsd.org/D35774 MFC after: 2 weeks	2022-07-16 12:46:12 +03:00
John Baldwin	2cf7870864	Collapse interrupt thread priorities. Allow high priority hardware interrupts to run at PI_REALTIME via INTR_TYPE_CLK, but collapse all other hardware interrupt threads to the next priority level (PI_INTR). Collapse all SWI priorities to the same priority level (PI_SOFT) just below PI_INTR. Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35646	2022-07-14 13:14:33 -07:00
John Baldwin	40efe74352	4bsd: Simplistic time-sharing for interrupt threads. If an interrupt thread runs for a full quantum without yielding the CPU, demote its priority and schedule a preemption to give other ithreads a turn. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35645	2022-07-14 13:14:17 -07:00
John Baldwin	954cffe95d	ule: Simplistic time-sharing for interrupt threads. If an interrupt thread runs for a full quantum without yielding the CPU, demote its priority and schedule a preemption to give other ithreads a turn. Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35644	2022-07-14 13:13:57 -07:00
John Baldwin	ed998d1c24	ithreads: Support priority adjustment by schedulers. Use sched_wakeup instead of sched_add when marking an ithread runnable. This allows schedulers to reset their internal time slice tracking state and restore the base ithread priority when an ithread resumes from idle. Reviewed by: markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35643	2022-07-14 13:13:35 -07:00
John Baldwin	fea89a2804	Add sched_ithread_prio to set the base priority of an interrupt thread. Use it instead of sched_prio when setting the priority of an interrupt thread. Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35642	2022-07-14 13:13:10 -07:00
Mark Johnston	6cbc4ceb7a	sched_ule: Use the correct atomic_load variant for tdq_lowpri Reported by: tuexen Fixes: `11484ad8a2` ("sched_ule: Use explicit atomic accesses for tdq fields")	2022-07-14 15:34:02 -04:00
Mark Johnston	11484ad8a2	sched_ule: Use explicit atomic accesses for tdq fields Different fields in the tdq have different synchronization protocols. Some are constant, some are accessed only while holding the tdq lock, some are modified with the lock held but accessed without the lock, some are accessed only on the tdq's CPU, and some are not synchronized by the lock at all. Convert ULE to stop using volatile and instead use atomic_load_* and atomic_store_* to provide the desired semantics for lockless accesses. This makes the intent of the code more explicit, gives more freedom to the compiler when accesses do not need to be qualified, and lets KCSAN intercept unlocked accesses. Thus: - Introduce macros to provide unlocked accessors for certain fields. - Use atomic_load/store for all accesses of tdq_cpu_idle, which is not synchronized by the mutex. - Use atomic_load/store for accesses of the switch count, which is updated by sched_clock(). - Add some comments to fields of struct tdq describing how accesses are synchronized. No functional change intended. Reviewed by: mav, kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35737	2022-07-14 10:45:33 -04:00
Mark Johnston	0927ff7814	sched_ule: Enable preemption of curthread in the load balancer The load balancer executes from statclock and periodically tries to move threads among CPUs in order to balance load. It may move a thread to the current CPU (the load balancer always runs on CPU 0). When it does so, it may need to schedule preemption of the interrupted thread. Use sched_setpreempt() to do so, same as sched_add(). PR: 264867 Reviewed by: mav, kib, jhb MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35744	2022-07-14 10:27:58 -04:00
Mark Johnston	6d3f74a14a	sched_ule: Fix racy loads of pc_curthread Thread switching used to be atomic with respect to the current CPU's tdq lock. Since commit `686bcb5c14` that is no longer the case. Now sched_switch() does this: 1. lock tdq (might already be locked) 2. maybe put the current thread in the tdq, choose a new thread to run 2a. update tdq_lowpri 3. unlock tdq 4. switch CPU context, update curthread Some code paths in ULE will load pc_curthread from a remote CPU with that CPU's tdq lock held, usually to inspect its priority. But, as of the aforementioned commit this is racy. The problem I noticed is in tdq_notify(), which optionally sends an IPI to a remote CPU when a new thread is added to its runqueue. If the new thread's priority is higher (lower) than the currently running thread's priority, then we deliver an IPI. But inspecting pc_curthread->td_priority doesn't work, since pc_curthread might be between steps 3 and 4 above. If pc_curthread's priority is higher than that of the newly added thread, but pc_curthread is switching to a lower-priority thread, then tdq_notify() might fail to deliver an IPI, leaving a high priority thread stuck on the runqueue for longer than it should. This can cause multi-millisecond stalls in interactive/ithread/realtime threads. Fix this problem by modifying tdq_add() and tdq_move() to return the value of tdq_lowpri before the addition of the new thread. This ensures that tdq_notify() has the correct priority value to compare against. The other two uses of pc_curthread are susceptible to the same race. To fix the one in sched_rem()->tdq_setlowpri() we need to have an exact value for curthread. Thus, introduce a new tdq_curthread field to the tdq which gets updated any time a new thread is selected to run on the CPU. Because this field is synchronized by the thread lock, its priority reflects the correct lowpri value for the tdq. PR: 264867 Fixes: `686bcb5c14` ("schedlock 4/4") Reviewed by: mav, kib, jhb MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35736	2022-07-14 10:27:51 -04:00
Mark Johnston	ef221ff645	time: Make realitexpire() local to kern_time.c MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-13 09:57:28 -04:00
Mark Johnston	38e1d32dab	callout: Simplify cpuid validation in callout_reset_sbt_on() - Remove a flag variable. - Convert a runtime check of the passed cpuid to a KASSERT. - Remove the cc_inited flag. An attempt to schedule a callout before SI_SUB_CPU will crash anyway since the per-CPU mutexes won't have been initialized, and that flag was only checked in the case where a cpuid was explicitly specified by the caller. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-07-13 09:47:33 -04:00
Mark Johnston	ece453d5fa	eventtimer: Simplify KTR traces Stop including the current CPU in all event messages, since it's already saved in KTR log entries and thus is redundant. All eventtimer traces occur in a context where CPU migration is not possible. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-11 15:58:43 -04:00
Mark Johnston	a889a65ba3	eventtimer: Fix several races in the timer reload code In handleevents(), lock the timer state before fetching the time for the next event. A concurrent callout_cc_add() call might be changing the next event time, and the race can cause handleevents() to program an out-of-date time, causing the callout to run later (by an unbounded period, up to the idle hardclock period of 1s) than requested. In cpu_idleclock(), call getnextcpuevent() with the timer state mutex held, for similar reasons. In particular, cpu_idleclock() runs with interrupts enabled, so an untimely timer interrupt can result in a stale next event time being programmed. Further, an interrupt can cause cpu_idleclock() to use a stale value for "now". In cpu_activeclock(), disable interrupts before loading "now", so as to avoid going backwards in time when calling handleevents(). It's ok to leave interrupts enabled when checking "state->idle", since the race at worst will cause handleevents() to be called unnecessarily. But use an atomic load to indicate that the test is racy. PR: 264867 Reviewed by: mav, jhb, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35735	2022-07-11 15:58:43 -04:00
Mark Johnston	ebb3cb6195	eventtimer: Pass a pcpu state pointer to getnext(cpu)event() Callers have already loaded the pointer, so these functions don't need to fetch it again. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-11 15:58:43 -04:00
Mark Johnston	ba71333f60	sched_ule: Fix a typo in a comment PR: 226107 MFC after: 1 week	2022-07-11 15:58:43 -04:00
Mark Johnston	ef80894c9d	sched_ule: Purge an obsolete comment The referenced bitmask was removed in commit `62fa74d95a`. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-11 15:58:43 -04:00
Mark Johnston	35dd6d6cb5	sched_ule: Eliminate a superfluous local variable in tdq_move() No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-11 15:58:43 -04:00
Gleb Smirnoff	c261510ef5	sockets: fix setsockopt(SO_RCVTIMEO) on a listening socket MFC after: 3 weeks	2022-07-08 11:33:24 -07:00
Mitchell Horne	258958b3c7	ddb: use _FLAGS command macros where appropriate Some command definitions were forced to use DB_FUNC in order to specify their required flags, CS_OWN or CS_MORE. Use the new macros to simplify these. Reviewed by: markj, jhb MFC after: 3 days Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35582	2022-07-05 11:56:55 -03:00
Gleb Smirnoff	d8596171c5	sockets: use only soref()/sorele() as socket reference count o Retire SS_FDREF as it is basically a debug flag on top of already existing soref()/sorele(). o Convert SS_PROTOREF into soref()/sorele(). o Change reference model for the listen queues, see below. o Make sofree() private. The correct KPI to use is only sorele(). o Make soabort() respect the model and sorele() instead of sofree(). Note on listening queues. Until now the sockets on a queue had a zero reference count, and a reference was given only upon accept(2). The assumption was that there is no way to see the queued socket from anywhere except its head. This is not true, since queued sockets already have pcbs, which are linked at least into the global pcb lists. With this change we take the reference right in sonewconn(), and on the accept(2) path we just hand the existing reference to the file descriptor. Differential revision: https://reviews.freebsd.org/D35679	2022-07-04 12:40:51 -07:00
Gleb Smirnoff	bc7605647c	sockets: use positive flag for file descriptor socket reference Rename SS_NOFDREF to SS_FDREF and flip all bitwise operations. Mark sockets created by socreate() with SS_FDREF. This change is mostly illustrative. With it we see that SS_FDREF is a debugging flag, since: * socreate() takes a reference with soref(). * on accept path solisten_dequeue() takes a reference with soref() and then soaccept() sets SS_FDREF. * soclose() checks SS_FDREF, removes it and does sorele(). Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D35678	2022-07-04 12:40:51 -07:00
Warner Losh	b69996d1d5	tty: Default to printing kernel stack traceback only on INVARIANT kernels Change the default from printing brief kernel thread stack information back to omitting it for non-invariant kernels in response to SIGINFO/^T. Full and brief stack support can be selected with the kern.tty_info_kstacks sysctl. MFC After: 2 weeks Sponsored by: Netflix Reviewed by: grembo, jhb Differential Revision: https://reviews.freebsd.org/D35576	2022-07-02 08:02:12 -06:00
John Baldwin	0bd73da206	busdma_bounce: Use PRI_ITHD scheduling class for worker thread. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35641	2022-06-30 10:06:04 -07:00
John Baldwin	0288d4277f	Add register sets for NT_THRMISC and NT_PTLWPINFO. For the kernel this is mostly a non-functional change. However, this will be useful for simplifying gcore(1). Reviewed by: markj MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D35666	2022-06-30 10:04:56 -07:00
Gleb Smirnoff	66c8e3fccf	socket: fix listen(2) on an already listening socket Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35669 Fixes: `141fe2dcee`	2022-06-30 07:50:29 -07:00
Konstantin Belousov	ad175a107b	vfs_mount.c: convert explicit panics and KASSERTs to MPASSERT/MPPASS Reviewed by: imp, mjg Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35652	2022-06-29 21:31:47 +03:00
Konstantin Belousov	1e54362824	vfs_op_exit(): assert that mnt_vfs_ops stays non-zero for unmount or suspend Reviewed by: mjg Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35639	2022-06-29 21:31:47 +03:00
Jamie Gritton	7060da62ff	jail: Remove a prison's shared memory when it dies Add shm_remove_prison(), that removes all POSIX shared memory segments belonging to a prison. Call it from prison_cleanup() so a prison won't be stuck in a dying state due to the resources still held. PR: 257555 Reported by: grembo	2022-06-29 10:47:39 -07:00
Jamie Gritton	a9f7455c38	jail: add prison_cleanup() to release resources held by a dying jail Currently, when a jail starts dying, either by losing its last user reference or by being explicitly killed, osd_jail_call(...PR_METHOD_REMOVE...) is called. Encapsulate this into a function prison_cleanup() that can then do other cleanup.	2022-06-29 10:33:05 -07:00
Gleb Smirnoff	48a55bbfe9	unix: change error code for recvmsg() failed due to RLIMIT_NOFILE Instead of returning EMSGSIZE pass the error code from fdallocn() directly to userland. That would be EMFILE, which makes much more sense. This error code is not listed in the specification[1], but the specification doesn't cover such an edge case at all. Meanwhile the specification lists EMSGSIZE as the error code for an invalid value of msg_iovlen, and FreeBSD follows that, see sys_recvmsg(). Differentiating these two cases will make a developer's or admin's life much easier when debugging. [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/recvmsg.html Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35640	2022-06-29 09:42:58 -07:00
Kristof Provost	ab91feabcc	ovpn: Introduce OpenVPN DCO support OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing (i.e. tunneling and cryptography) into the kernel, rather than using tap devices. This avoids significant copying and context switching overhead between kernel and user space and improves OpenVPN throughput. In my test setup throughput improved from around 660Mbit/s to around 2Gbit/s. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34340	2022-06-28 11:33:10 +02:00
Mateusz Guzik	7388fb714a	cache: drop the vfs.cache_rename_add tunable The functionality has been in use since Jan 2021 -- long enough(tm).	2022-06-27 09:56:20 +02:00
Gleb Smirnoff	458f475df8	unix/dgram: smart socket buffers for one-to-many sockets A one-to-many unix/dgram socket is a socket that has been bound with bind(2) and can get multiple connections. A typical example is /var/run/log bound by syslogd(8) and receiving multiple connections from libc syslog(3) API. Until now all of these connections shared the same receive socket buffer of the bound socket. This made the socket vulnerable to overflow attack. See `240d5a9b1c` for a historical attempt to workaround the problem. This commit creates a per-connection socket buffer for every single connected socket and eliminates the problem. The new behavior will optimize seldom writers over frequent writers. See added test case scenarios and code comments for more detailed description of the new behavior. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35303	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	1093f16487	unix/dgram: reduce mbuf chain traversals in send(2) and recv(2) o Use m_pkthdr.memlen from m_uiotombuf() o Modify unp_internalize() to keep track of allocated space and memory as well as pointer to the last buffer. o Modify unp_addsockcred() to keep track of allocated space and memory as well as pointer to the last buffer. o Record the datagram len/memlen/ctllen in the first (from) mbuf of the chain in uipc_sosend_dgram() and reuse it in uipc_soreceive_dgram(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35302	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	9b841b0e23	m_uiotombuf: write total memory length of the allocated chain in pkthdr Data allocated by m_uiotombuf() usually goes into a socket buffer. We are interested in the length of useful data to be added to sb_acc, as well as total memory used by mbufs. The latter would be added to sb_mbcnt. Calculating this value at allocation time saves an extra traversal of the mbuf chain. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35301	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	a7444f807e	unix/dgram: use minimal possible socket buffer for PF_UNIX/SOCK_DGRAM This change fully splits away PF_UNIX/SOCK_DGRAM from other socket buffer implementations, without any behavior changes. Generic socket implementation is reduced down to one STAILQ and very little code. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35300	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	a4fc41423f	sockets: enable protocol specific socket buffers Split struct sockbuf into common shared fields and protocol specific union, where protocols are free to implement whatever buffer they want. Such protocols should mark themselves with PR_SOCKBUF and are expected to initialize their buffers in their pr_attach and tear them down in pr_detach. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35299	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	315167c0de	unix: provide an option to return locked from unp_connectat() Use this new version in unix/dgram socket when sending to a target address. This removes extra lock release/acquisition and possible counter-intuitive ENOTCONN. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35298	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	5dc8dd5f3a	unix/dgram: inline sbappendaddr_locked() into uipc_sosend_dgram() This allows to remove one M_NOWAIT allocation and also makes it more clear what's going on. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35297	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	e3fbbf965e	unix/dgram: add a specific receive method - uipc_soreceive_dgram With this second step PF_UNIX/SOCK_DGRAM has a protocol specific implementation. This opens up some possibilities for performance optimizations. However, it still operates on the same struct socket as all other sockets do. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35296	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	f384a97c83	unix/dgram: cleanup uipc_send of PF_UNIX/SOCK_DGRAM, step 2 Just remove one level of indentation as the case clause always matches. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35295	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	7e5b6b391e	unix/dgram: cleanup uipc_send of PF_UNIX/SOCK_DGRAM, step 1 Remove the dead code. The new uipc_sosend_dgram() handles send() on PF_UNIX/SOCK_DGRAM in full. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35294	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	3464958246	unix/dgram: add a specific send method - uipc_sosend_dgram() This is the first step towards splitting classic BSD socket implementation into separate classes. The first to be split is PF_UNIX/SOCK_DGRAM as it has the most differences to SOCK_STREAM sockets and to PF_INET sockets. Historically a protocol shall provide two methods for sendmsg(2): pru_sosend and pru_send. The former is a generic send method, e.g. sosend_generic() which would internally call the latter, uipc_send() in our case. There is one important exception, though, the sendfile(2) code will call pru_send directly. But sendfile doesn't work on SOCK_DGRAM, so we can do the trick. We will create socket class specific uipc_sosend_dgram() which will carry only important bits from sosend_generic() and uipc_send(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35293	2022-06-24 09:09:10 -07:00
Mitchell Horne	29afffb942	subr_bus: restore bus_null_rescan() Partially revert the previous change; we need to keep this method as a specific override for pci_driver subclasses which should not use pci_rescan_method() -- cardbus and ofw_pcibus. However, change the return value to ENODEV for the same reasoning given in the original commit, and use this as the default rescan method in bus_if.m. Reported by: jhb Fixes: `36a8572ee8` ("bus_if: provide a default null rescan method") MFC with: `36a8572ee8`	2022-06-23 16:07:00 -03:00
Mitchell Horne	8701571df9	set_cputicker: use a bool The third argument to this function indicates whether the supplied ticker is fixed or variable, i.e. requiring calibration. Give this argument a type and name that better conveys this purpose. Reviewed by: kib, markj MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35459	2022-06-23 15:15:11 -03:00
Mitchell Horne	36a8572ee8	bus_if: provide a default null rescan method There is an existing helper method in subr_bus.c, but almost no drivers know to use it. It also returns the same error as an empty method, making it not very useful. Move this to bus_if.m and return a more sensible error code. This gives a slightly more meaningful error message when attempting 'devctl rescan' on buses and devices alike: "Device not configured" --> "Operation not supported by device" Reviewed by: imp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35501	2022-06-23 15:15:10 -03:00
Chuck Silvers	5bd21cbbd1	vfs: fix vfs_bio_clrbuf() for PAGE_SIZE > block size Calculate the desired page valid mask using math that will not overflow the types used. Sponsored by: Netflix Reviewed by: mckusick, kib, markj Differential Revision: https://reviews.freebsd.org/D34837	2022-06-21 17:58:52 -07:00
Mark Johnston	9553bc89db	aio: Improve UMA usage - Remove the AIO proc zone. This zone gets one allocation per AIO daemon process, which isn't enough to warrant a dedicated zone. Plus, unlike other AIO structures, aiops are small (32 bytes with LP64), so UMA doesn't provide better space efficiency than malloc(9). Change one of the malloc types in vfs_aio.c to make it more general. - Don't set the NOFREE flag on the other AIO zones. This flag means that memory allocated to the AIO subsystem is never freed back to the VM, so it's always preferable to avoid using it when possible. NOFREE was set without explanation when AIO was converted to use UMA 20 years ago, but it does not appear to be required; all of the structures allocated from UMA (per-process kaioinfo, kaiocb, and aioliojob) keep track of references and get freed only when none exist. Plus, these structures will contain dangling pointer after they're freed (e.g., the "cred", "fd_file" and "uiop" fields of struct kaiocb), so use-after-frees are dangerous even when the structures themselves are type-stable. Reviewed by: asomers MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35493	2022-06-20 12:48:13 -04:00
Damjan Jovanovic	8c309d48aa	struct kinfo_file changes needed for lsof to work using only usermode APIs Add kf_pipe_buffer_[in/out/size] fields to kf_pipe, and populate them. Add a kf_kqueue struct to the kf_un union, to allow querying kqueue state, and populate it. Populate the kf_sock_rcv_sb_state and kf_sock_snd_sb_state fields in kf_sock for INET/INET6 sockets, and populate all other fields for all transport layer protocols, not just TCP. Bump __FreeBSD_version. Differential revision: https://reviews.freebsd.org/D34184 Reviewed by: jhb, kib, se MFC after: 1 week	2022-06-18 12:34:25 +03:00
Damjan Jovanovic	8ae7694913	KERN_LOCKF: report kl_file_fsid consistently with stat(2) PR: 264723 Reviewed by: kib Discussed with: markj MFC after: 1 week	2022-06-18 12:34:17 +03:00
Mark Johnston	f6379f7fde	socket: Fix a race between kevent(2) and listen(2) When locking the knote list for a socket, we check whether the socket is a listening socket in order to select the appropriate mutex; a listening socket uses the socket lock, while data sockets use socket buffer mutexes. If SOLISTENING(so) is false and the knote lock routine locks a socket buffer, then it must re-check whether the socket is a listening socket since solisten_proto() could have changed the socket's identity while we were blocked on the socket buffer lock. Reported by: syzkaller Reviewed by: glebius MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35492	2022-06-16 10:20:04 -04:00
Mark Johnston	756bc3adc5	kasan: Create a shadow for the bootstack prior to hammer_time() When the kernel is compiled with -asan-stack=true, the address sanitizer will emit inline accesses to the shadow map. In other words, some shadow map accesses are not intercepted by the KASAN runtime, so they cannot be disabled even if the runtime is not yet initialized by kasan_init() at the end of hammer_time(). This went unnoticed because the loader will initialize all PML4 entries of the bootstrap page table to point to the same PDP page, so early shadow map accesses do not raise a page fault, though they are silently corrupting memory. In fact, when the loader does not copy the staging area, we do get a page fault since in that case only the first and last PML4Es are populated by the loader. But due to another bug, the loader always treated KASAN kernels as non-relocatable and thus always copied the staging area. It is not really practical to annotate hammer_time() and all callees with __nosanitizeaddress, so instead add some early initialization which creates a shadow for the boot stack used by hammer_time(). This is only needed by KASAN, not by KMSAN, but the shared pmap code handles both. Reported by: mhorne Reviewed by: kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35449	2022-06-15 11:39:10 -04:00
Doug Ambrisko	ce00b11940	mount: revert the active vnode reporting feature Revert the computing of active vnode reporting since statfs is used by a lot of tools. Only report the vnodes used. Reported by: mjg	2022-06-15 07:24:55 -07:00
Mark Johnston	7565431f30	mount: Fix an incorrect assertion in kernel_mount() The pointer to the mount values may be null if an error occurred while copying them in, so fix the assertion condition to reflect that possibility. While here, move some initialization code into the error == 0 block. No functional change intended. Reported by: syzkaller MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-06-14 12:00:59 -04:00
Mark Johnston	630f633f2a	vm_object: Use the vm_object_(set|clear)_flag() helpers ... rather than setting and clearing flags inline. No functional change intended. Reviewed by: alc, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35469	2022-06-14 12:00:59 -04:00
Mark Johnston	e8955bd643	pipe: Use a distinct wait channel for I/O serialization Suppose a thread tries to read from an empty pipe. pipe_read() does the following: 1. pipelock(), possibly sleeping 2. check for buffered data 3. pipeunlock() 4. set PIPE_WANTR and sleep 5. goto 1 pipelock() is an open-coded mutex; if a thread blocks in pipelock(), it sleeps until the lock holder calls pipeunlock(). Both sleeps use the same wait channel. So if there are multiple threads in pipe_read(), a thread T1 in step 3 can wake up a thread T2 sleeping in step 4. Then T1 goes to sleep in step 4, and T2 acquires and releases the pipelock, waking up T1 again. This can go on indefinitely, livelocking the process (and potentially starving a would-be writer). Fix the problem by using a separate wait channel for pipelock(). Reported by: Paul Floyd <paulf2718@gmail.com> Reviewed by: mjg, kib PR: 264441 MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35415	2022-06-14 12:00:59 -04:00
Cy Schubert	d781401512	kern_thread.c: Fix i386 build Chase `4493a13e3b` by updating static assertions of struct proc.	2022-06-13 19:35:33 -07:00
Konstantin Belousov	1575804961	reap_kill_proc(): avoid singlethreading any other process if we are exiting This is racy because curproc process lock is not used, but allows the process to exit faster. It is a userspace issue to create such race anyway, and not fulfilling the guarantee that all reaper descendants are signalled should be fine. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	e0343eacf3	reap_kill_subtree(): hold the reaper when entering it into the queue to handle later We drop proctree_lock, which allows the process to exit while memoized in the list to proceed. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	1d4abf2cfa	reap_kill_subtree_once(): handle proctree_lock unlock in reap_kill_proc() Recorded reaper might lose its reaper status, so we should not assert it, but check and avoid signalling if this happens. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 week Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	addf103ce6	reap_kill_proc: do not retry on thread_single() failure The failure means that the process does single-threading itself, which makes our action not needed. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	008b2e6544	Make stop_all_proc_block interruptible to avoid deadlock with parallel suspension If we try to single-thread a process which thread entered procctl(REAP_KILL_SUBTREE), and sleeping waiting for us unlocking stop_all_proc_blocker, we must be able to finish single-threading. This requires the sleep to be interruptible. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Mark Johnston	2d5ef216b6	thread_single_end(): consistently maintain p_boundary_count for ALLPROC mode Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 week Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	1b4701fe1e	thread_unsuspend(): do not unsuspend the suspended leader thread doing SINGLE_ALLPROC markj wrote: tdsendsignal() may unsuspend a target thread. I think there is at least one bug there: suppose thread T is suspended in thread_single(SINGLE_ALLPROC) when trying to kill another process with REAP_KILL. Suppose a different thread sends SIGKILL to T->td_proc. Then, tdsendsignal() calls thread_unsuspend(T, T->td_proc). thread_unsuspend() incorrectly decrements T->td_proc->p_suspcount to -1. Later, when T->td_proc exits, it will wait forever in thread_single(SINGLE_EXIT) since T->td_proc->p_suspcount never reaches 1. Since the thread suspension is bounded by time needed to do thread_single(), skipping the thread_unsuspend_one() call there should not affect signal delivery if this thread is selected as target. Reported by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	b9009b1789	thread_single(): remove already checked conditional expression Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	4493a13e3b	Do not single-thread itself when the process single-threaded some another process Since both self single-threading and remote single-threading rely on suspending the thread doing thread_single(), it cannot be mixed: thread doing thread_suspend_switch() might be subject to thread_suspend_one() and vice versa. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	dd883e9a7e	weed_inhib(): correct the condition to re-suspend a thread suspended for SINGLE_ALLPROC mode. There is no need to check for boundary state. It is only required to see that the suspension comes from the ALLPROC mode. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:03 +03:00
Konstantin Belousov	b9893b3533	weed_inhib(): do not double-suspend already suspended thread if the loop reiterates In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:02 +03:00
Konstantin Belousov	d7a9e6e740	thread_single: wait for P_STOPPED_SINGLE to pass to avoid ALLPROC mode to try to race with any other single-threading mode. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:02 +03:00
Konstantin Belousov	02a2aacbe2	issignal(): ignore signals when process is single-threading for exit Places that will wait for curproc->p_singlethr to become zero (in the next commit, the counter of number of external single-threading is to be introduced), must wait for it interruptible, otherwise we deadlock. On the other hand, a signal delivered during this window, if directed to the waiting thread, would cause the wait loop to become a busy loop. Since we are exiting, it is safe to ignore the signals. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:02 +03:00
Konstantin Belousov	d3000939c7	P2_WEXIT: avoid thread_single() for exiting process earlier before the process itself does thread_single(SINGLE_EXIT). We cannot single-thread such process in ALLPROC (external) mode, and properly detect and report the failure to do so due to the process becoming zombie is easier to prevent than handle. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D35310	2022-06-13 22:30:02 +03:00
Doug Ambrisko	6468cd8e0e	mount: add vnode usage per file system with mount -v This avoids the need to drop into the ddb to figure out vnode usage per file system. It helps to see if they are or are not being freed. Suggestion to report active vnode count was from kib@ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35436	2022-06-13 07:56:38 -07:00
Hans Petter Selasky	b8394039dc	mbuf(9): Fix size of mbuf for all 32-bit platforms (i386, ARM, PowerPC and RISCV) Do this by reducing the size of the MBUF_PEXT_MAX_PGS, causing "struct mbuf" to be bigger than M_SIZE, and also add a missing padding field to ensure 64-bit alignment. Reviewed by: gallatin@ Reported by: Elliott Mitchell Differential revision: https://reviews.freebsd.org/D35339 MFC after: 1 week Sponsored by: NVIDIA Networking	2022-06-07 22:09:10 +02:00
Hans Petter Selasky	fe8c78f0d2	ktls: Add full support for TLS RX offloading via network interface. Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet header to figure out if an incoming mbuf has been fully offloaded or not. This information follows the packet stream via the LRO engine, IP stack and finally to the TCP stack. The TCP stack preserves the mbuf packet header also when re-assembling packets after packet loss. When the mbuf goes into the socket buffer the packet header is demoted and the offload information is transferred to "m_flags" . Later on a worker thread will analyze the mbuf flags and decide if the mbufs making up a TLS record indicate a fully-, partially- or not decrypted TLS record. Based on these three cases the worker thread will either pass the packet on as-is or recrypt the decrypted bits, if any, or decrypt the packet as usual. During packet loss the kernel TLS code will call back into the network driver using the send tag, informing about the TCP starting sequence number of every TLS record that is not fully decrypted by the network interface. The network interface then stores this information in a compressed table and starts asking the hardware if it has found a valid TLS header in the TCP data payload. If the hardware has found a valid TLS header and the referred TLS header is at a valid TCP sequence number according to the TCP sequence numbers provided by the kernel TLS code, the network driver then informs the hardware that it can resume decryption. Care has been taken to not merge encrypted and decrypted mbuf chains, in the LRO engine and when appending mbufs to the socket buffer. The mbuf's leaf network interface pointer is used to figure out from which network interface the offloading rule should be allocated. Also this pointer is used to track route changes. 
Currently mbuf send tags are used in both transmit and receive direction, due to convenience, but may get a new name in the future to better reflect their usage. Reviewed by: jhb@ and gallatin@ Differential revision: https://reviews.freebsd.org/D32356 Sponsored by: NVIDIA Networking	2022-06-07 12:58:09 +02:00
Hans Petter Selasky	f0fca64618	ktls: Refer send tag pointer once. So that the asserts and the actual code see the same values. Differential revision: https://reviews.freebsd.org/D32356 MFC after: 1 week Sponsored by: NVIDIA Networking	2022-06-07 12:57:03 +02:00
Hans Petter Selasky	4d88d81c31	mbuf(9): Implement a leaf network interface field in the mbuf packet header. When packets are received they may traverse several network interfaces like vlan(4) and lagg(9). When doing receive side offloads it is important to know the first network interface entry point, because that is where all offloading is taking place. This makes it possible to track receive side route changes for multiport setups, for example when lagg(9) receives traffic from more than one port. This avoids having to install multiple offloading rules for the same stream. This field works similar to the existing "rcvif" mbuf packet header field. Submitted by: jhb@ Reviewed by: gallatin@ and gnn@ Differential revision: https://reviews.freebsd.org/D35339 Sponsored by: NVIDIA Networking Sponsored by: Netflix	2022-06-07 12:54:42 +02:00
Gleb Smirnoff	d97922c6c6	unix/*: rewrite unp_internalize() cmsg parsing cycle Make it a complex, but a single for(;;) statement. The previous cycle with some loop logic in the beginning and some loop logic at the end was confusing. Both me and markj@ were misled into concluding that some checks are unnecessary, while they actually were necessary. While here, handle an edge case found by Mark, when on 64-bit platform an incorrect message from userland would underflow length counter, but return without any error. Provide a test case for such message. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35375	2022-06-06 10:05:28 -07:00
Yuichiro NAITO	8d95f50052	smp: Use local copies of the setup function pointer and argument No functional change intended. PR: 264383 Reviewed by: jhb, markj MFC after: 1 week	2022-06-06 11:29:51 -04:00
Gleb Smirnoff	2573e6ced9	unix/dgram: rename unpdg_sendspace to unpdg_maxdgram Matches the meaning of the variable and sysctl node name.	2022-06-03 12:55:44 -07:00
Gleb Smirnoff	a8e286bb5d	sockets: use socket buffer mutexes in struct socket directly Convert more generic socket code to not use sockbuf compat pointer. Continuation of `4328318445`.	2022-06-03 12:55:44 -07:00
Mitchell Horne	35eb9b10c2	Use KERNEL_PANICKED() in more places This is slightly more optimized than checking panicstr directly. For most of these instances performance doesn't matter, but let's make KERNEL_PANICKED() the common idiom. Reviewed by: mjg MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D35373	2022-06-02 10:15:43 -03:00
Gleb Smirnoff	f083739350	soo_aio_*: use socket buffer mutexes in struct socket directly A miss from commit `4328318445`.	2022-05-30 20:46:38 -07:00
Dmitry Chagin	d46174cd88	Finish cpuset_getaffinity() after `f35093f8` Split cpuset_getaffinity() into two counterparts, where the user_cpuset_getaffinity() is intended to operate on the cpuset_t from user va, while kern_cpuset_getaffinity() expects the cpuset from kernel va. Accordingly, the code that clears the high bits is moved to the user_cpuset_getaffinity(). Linux sched_getaffinity() syscall returns the size of set copied to the user-space and then glibc wrapper clears the high bits. MFC after: 2 weeks	2022-05-28 20:53:08 +03:00
Dmitry Chagin	31d1b816fe	sysent: Get rid of bogus sys/sysent.h include. Where appropriate hide sysent.h under proper condition. MFC after: 2 weeks	2022-05-28 20:52:17 +03:00
Gleb Smirnoff	d64f2f42c1	unix: unp_externalize() can M_WAITOK Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35318	2022-05-27 20:48:38 -07:00
Gleb Smirnoff	d59bc188d6	sockbuf: remove unused mbuf counter and cluster counter With M_EXTPG mbufs these two counters already do not represent the reality. As we are moving towards protocol independent socket buffers, which may not even use mbufs at all, the counters become less and less relevant. The only userland seeing them was 'netstat -x'. PR: 264181 (exp-run) Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35334	2022-05-27 08:20:17 -07:00
Gleb Smirnoff	75e7e3ce34	unix: fix incorrect assertion in `4682ac697c` Pointy hat to: glebius Fixes: `4682ac697c`	2022-05-26 11:35:05 -07:00
Gleb Smirnoff	4682ac697c	unix: turn check in unp_externalize() into assertion In this function we always work with mbufs that we previously created ourselves in unp_internalize(). They must be valid. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35319	2022-05-25 13:29:20 -07:00
Gleb Smirnoff	579b45e203	unix/*: check new control size in unp_internalize() Now that we call sbcreatecontrol() with M_WAITOK, we are expected to pass a valid size. Return same error code, we are returning for an oversized control from sockargs(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35317	2022-05-25 13:29:13 -07:00
Gleb Smirnoff	d60ea9a10a	sockets: return EMSGSIZE if control part of message is too large Specification doesn't list an explicit error code for the control size specified by msg_control being too large. But it does list EMSGSIZE as error code for "message is too large to be sent all at once (as the socket requires)". It also lists EINVAL as code for the "The sum of the iov_len values overflows an ssize_t." Given how generic and uninformative EINVAL is, the EMSGSIZE is more appropriate. https://pubs.opengroup.org/onlinepubs/9699919799/functions/sendmsg.html Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35316	2022-05-25 13:29:04 -07:00
Gleb Smirnoff	ad51c47fb4	sockbuf: fix assertion in sbcreatecontrol() Fixes: `6890b58814`	2022-05-25 00:19:41 -07:00
Mark Johnston	524dadf7a8	kevent: Fix an off-by-one in filt_timerexpire_l() Suppose a periodic kevent timer fires close to its deadline, so that now - kc->next is small. Then delta ends up being 1, and the next timer deadline is set to (delta + 1) * kc->to, where kc->to is the timer period. This means that the timer fires at half of the requested rate, and the value returned in kn_data is similarly inaccurate. PR: 264131 Fixes: `7cb40543e9` ("filt_timerexpire: do not iterate over the interval") Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35313	2022-05-24 20:14:33 -04:00
Mateusz Guzik	cdb337b097	vfs: fix copy-pasto in previous Reported by: dchagin	2022-05-20 20:58:11 +00:00
Mateusz Guzik	ec3c225711	vfs: call vn_truncate_locked from kern_truncate This fixes a bug where the syscall would not bump writecount. PR: 263999	2022-05-20 17:25:51 +00:00
Mateusz Guzik	6b715687bd	vfs: make sure truncate always calls NDFREE_* While here convert it to NDFREE_NOTHING.	2022-05-20 17:25:51 +00:00
Mark Johnston	4a3e51335e	cpuset: Fix the KASAN and KMSAN builds Rename the "copyin" and "copyout" fields of struct cpuset_copy_cb to something less generic, since sanitizers define interceptors for copyin() and copyout() using #define. Reported by: syzbot+2db5d644097fc698fb6f@syzkaller.appspotmail.com Fixes: `47a57144af` ("cpuset: Byte swap cpuset for compat32 on big endian architectures") Sponsored by: The FreeBSD Foundation	2022-05-20 10:34:25 -04:00
Dmitry Chagin	eca368ecb6	Retire sv_transtrap Call translate_traps directly from sendsig(). MFC after: 2 weeks	2022-05-20 14:54:03 +03:00
Dmitry Chagin	2479e381cd	kqueue: Trim trailing whitespace MFC after: 1 week	2022-05-19 19:52:02 +03:00
Justin Hibbits	47a57144af	cpuset: Byte swap cpuset for compat32 on big endian architectures Summary: BITSET uses long as its basic underlying type, which is dependent on the compile type, meaning on 32-bit builds the basic type is 32 bits, but on 64-bit builds it's 64 bits. On little endian architectures this doesn't matter, because the LSB is always at the low bit, so the words get effectively concatenated moving between 32-bit and 64-bit, but on big-endian architectures it throws a wrench in, as setting bit 0 in 32-bit mode is equivalent to setting bit 32 in 64-bit mode. To demonstrate: 32-bit mode: BIT_SET(foo, 0): 0x00000001 64-bit sees: 0x0000000100000000 cpuset is the only system interface that uses bitsets, so solve this by swapping the integer sub-components at the copyin/copyout points. Reviewed by: kib MFC after: 3 days Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D35225	2022-05-19 10:49:55 -05:00
Andrew Turner	11a6ecd425	Handle cas failure when the compare succeeds When locking a priority inherit mutex we perform a compare and swap operation to try and acquire the mutex. This may fail even when the compare succeeds. Check and handle this case. PR: 263825 Reviewed by: kib, markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35150	2022-05-19 11:30:21 +01:00
Gleb Smirnoff	6890b58814	sockbuf: improve sbcreatecontrol() o Constify memory pointer. Make length unsigned. o Make it never fail with M_WAITOK and assert that length is sane.	2022-05-17 10:10:42 -07:00
Gleb Smirnoff	b46667c63e	sockbuf: merge two versions of sbcreatecontrol() into one No functional change.	2022-05-17 10:10:42 -07:00
Gleb Smirnoff	eac7f0798b	unix: garbage collect unp_dispose_mbuf() for brevity	2022-05-17 10:10:41 -07:00
Gleb Smirnoff	2e5bf7c49f	unix: fix mbuf leak on close of socket with data Fixes: `1f32cef471`	2022-05-17 10:10:41 -07:00
Vladimir Kondratyev	b6f87b78b5	LinuxKPI: Implement kthread_worker related functions Kthread worker is a single thread workqueue which can be used in cases where specific kthread association is necessary, for example, when it should have RT priority or be assigned to certain cgroup. This change implements Linux v4.9 interface which mostly hides kthread internals from users thus allowing to use ordinary taskqueue(9) KPI. As kthread worker prohibits enqueueing of already pending or canceling tasks some minimal changes to taskqueue(9) were done. taskqueue_enqueue_flags() was added to taskqueue KPI which accepts extra flags parameter. It contains one or more of the following flags: TASKQUEUE_FAIL_IF_PENDING - taskqueue_enqueue_flags() fails if the task is already scheduled to execution. EEXIST is returned and the ta_pending counter value remains unchanged. TASKQUEUE_FAIL_IF_CANCELING - taskqueue_enqueue_flags() fails if the task is in the canceling state and ECANCELED is returned. Required by: drm-kmod 5.10 MFC after: 1 week Reviewed by: hselasky, Pau Amma (docs) Differential Revision: https://reviews.freebsd.org/D35051	2022-05-17 15:10:20 +03:00
Rick Macklem	373511338d	uipc_socket.c: Modify MSG_TLSAPPDATA to only do Alert Records Without this patch, the MSG_TLSAPPDATA flag would cause soreceive_generic() to return ENXIO for any non-application data record in a TLS receive stream. This works ok for TLS1.2, since Alert records appear to be the only non-application data records received. However, for TLS1.3, there can be post-handshake handshake records, such as NewSessionKey sent to the client from the server. These handshake records cannot be handled by the upcall which does an SSL_read() with length == 0. It appears that the client can simply throw away these NewSessionKey records, but to do so, it needs to receive them within the kernel. This patch modifies the semantics of MSG_TLSAPPDATA slightly, so that it only applies to Alert records and not Handshake records. It is needed to allow the krpc to work with KTLS1.3. Reviewed by: hselasky MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D35170	2022-05-14 12:56:50 -07:00
Dmitry Chagin	cb2ae61631	sysvsem: Fix a typo Per jamie@ rpr can be NULL if the jail is created with sysvsem=disable. But at least it doesn't appear to be fatal, since rpr is never dereferenced but is only compared to other prison pointers. Reviewed by: jamie Differential revision: https://reviews.freebsd.org/D35198 MFC after: 2 weeks	2022-05-14 14:07:20 +03:00
Dmitry Chagin	b6c8f461f0	sysvsem: Style(9) MFC after: 2 weeks	2022-05-14 14:06:58 +03:00
Dmitry Chagin	f0b0fdf15e	sysvsem: Trim trailing whitespace MFC after: 2 weeks	2022-05-14 14:06:40 +03:00
Mitchell Horne	db71383b88	kerneldump: remove physical from dump routines It is unused, especially now that the underlying d_dumper methods do not accept the argument. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D35174	2022-05-13 10:43:19 -03:00
Mitchell Horne	489ba22236	kerneldump: remove physical argument from d_dumper The physical address argument is essentially ignored by every dumper method. In addition, the dump routines don't actually pass a real address; every call to dump_append() passes a value of zero for physical. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D35173	2022-05-13 10:42:48 -03:00
Mitchell Horne	0f50da2e09	Drop d_dump from struct cdevsw It appears to be unused. These days struct disk has a d_dump member, which is what gets passed to the kernel dump framework. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D35172	2022-05-13 10:42:17 -03:00
Gleb Smirnoff	bb35a4e11d	unix: microoptimize unp_connectat() - one less lock on success This change is also a preparation for further optimization to allow locked return on success. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35182	2022-05-12 13:22:39 -07:00
Gleb Smirnoff	08f17d1432	unix: make unp_connect2() void Assert that sockets are of the same type. unp_connectat() already did this check. Add the check to uipc_connect2(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35181	2022-05-12 13:22:39 -07:00
Gleb Smirnoff	4328318445	sockets: use socket buffer mutexes in struct socket directly Since `c67f3b8b78` the sockbuf mutexes belong to the containing socket, and socket buffers just point to it. In `74a68313b5` macros that access this mutex directly were added. Go over the core socket code and eliminate code that reaches the mutex by dereferencing the sockbuf compatibility pointer. This change requires a KPI change, as some functions were given the sockbuf pointer only without any hint if it is a receive or send buffer. This change doesn't cover the whole kernel, many protocols still use compatibility pointers internally. However, it allows operation of a protocol that doesn't use them. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35152	2022-05-12 13:22:12 -07:00
Gleb Smirnoff	01235012e5	unix/dgram: uipc_listen() is specific for SOCK_STREAM and SOCK_SEQPACKET Rely on pr_usrreqs_init() to init SOCK_DGRAM to pru_listen_notsupp().	2022-05-12 11:04:40 -07:00
Gleb Smirnoff	3c87ba3c3b	unix/dgram: pru_rcvd never called since PR_WANTRCVD not set	2022-05-12 11:04:40 -07:00
Gleb Smirnoff	2e4e5ee23f	sockets: delete stale comment from sofree() First paragraph refers to old past "we used to" and is no longer important today. Second paragraph has just a wrong statement that socket buffer is destroyed before pru_detach.	2022-05-12 11:02:50 -07:00
Gleb Smirnoff	1f32cef471	unix: don't call sbrelease() in uipc_detach() Since `a982ce0442` the socket buffer is already cleared and released in unp_dispose() that is called just before uipc_detach().	2022-05-12 11:02:50 -07:00
Dmitry Chagin	586ed32106	kdump: Decode cpuset_t. Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D34982 MFC after: 2 weeks	2022-05-11 10:40:39 +03:00
Dmitry Chagin	f35093f8d6	Use Linux semantics for the thread affinity syscalls. Linux has more tolerant checks of the user supplied cpuset_t's. Minimum cpuset_t size that the Linux kernel permits in case of getaffinity() is the maximum CPU id, present in the system / NBBY, the maximum size is not limited. For setaffinity(), Linux does not limit the size of the user-provided cpuset_t, internally using only the meaningful part of the set, where the upper bound is the maximum CPU id, present in the system, no larger than the size of the kernel cpuset_t. Unlike FreeBSD, Linux ignores high bits if set in the setaffinity(), so clear it in the sched_setaffinity() and Linuxulator itself. Reviewed by: Pau Amma (man pages) In collaboration with: jhb Differential revision: https://reviews.freebsd.org/D34849 MFC after: 2 weeks	2022-05-11 10:36:01 +03:00
Gleb Smirnoff	7db54446c6	sockbufs: make sbrelease_internal() private	2022-05-09 10:43:01 -07:00
Gleb Smirnoff	a982ce0442	sockets: remove the socket-on-stack hack from sorflush() The hack can be tracked down to 4.4BSD, where copy was performed under splimp() and then after splx() dom_dispose was called. Stevens has a chapter on this function, but he doesn't answer why this trick is necessary. Why can't we call into dom_dispose under splimp()? Anyway, with multithreaded kernel the hack seems to be necessary to avoid LORs between socket buffer lock and different filesystem locks, especially network file systems. The new socket buffers KPI sbcut() from `1d2df300e9` allow us to get rid of the hack. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35125	2022-05-09 10:43:01 -07:00
Gleb Smirnoff	42f2fa9953	sockets: don't call dom_dispose() on a listening socket sorflush() already did the right thing, so only sofree() needed a fix. Turn check into assertion in our only dom_dispose method. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35124	2022-05-09 10:42:57 -07:00
Gleb Smirnoff	c17418a0ba	sockets: assert that any protocol with PR_RIGHTS has dom_dispose() Through the entire history only PF_UNIX has this feature. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35123	2022-05-09 10:42:48 -07:00
Gleb Smirnoff	24df85d29a	unix/*: unp_internalize() can sleep, so allocate mbufs with M_WAITOK	2022-05-09 10:42:48 -07:00
Gleb Smirnoff	97f8198e95	sockets: make SO_SND/SO_RCV a enum Not a functional change now. The enum will also be used for other socket buffer related KPIs.	2022-05-09 10:42:47 -07:00
Warner Losh	45ae223ac6	msgbuf: Allow microsecond granularity timestamps Today, kern.msgbuf_show_timestamp=1 will give 1 second granularity timestamps on dmesg lines. When kern.msgbuf_show_timestamp=2, we'll produce microsecond level granularity. For example: old (== 1): [13] Dual Console: Video Primary, Serial Secondary [14] lo0: link state changed to UP [15] bxe0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit [15] bxe0: link state changed to UP new (== 2): [13.807015] Dual Console: Video Primary, Serial Secondary [14.544150] lo0: link state changed to UP [15.272044] bxe0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit [15.272052] bxe0: link state changed to UP Sponsored by: Netflix	2022-05-07 09:32:22 -06:00
Alan Somers	1d2421ad8b	Correctly measure system load averages > 1024 The old fixed-point arithmetic used for calculating load averages had an overflow at 1024. So on systems with extremely high load, the observed load average would actually fall back to 0 and shoot up again, creating a kind of sawtooth graph. Fix this by using 64-bit math internally, while still reporting the load average to userspace as a 32-bit number. Sponsored by: Axcient Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D35134	2022-05-06 17:25:43 -06:00
John Baldwin	2fdcc2ef6f	cpufreq: Remove unused devclass argument to DRIVER_MODULE.	2022-05-06 15:46:58 -07:00
Dmitry Chagin	f04534f5c8	sysvsem: Add a timeout argument to the semop. For future use in the Linux emulation layer for the semtimedop syscall split the sys_semop syscall into two counterparts and add struct timespec *timeout argument to the last one. Reviewed by: jhb, kib Differential revision: https://reviews.freebsd.org/D35121 MFC after: 2 weeks	2022-05-06 19:51:48 +03:00
Kristof Provost	613acc6483	mbuf: do not restore dying interfaces When we remove an interface it is first removed from the interface list V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for any possible references to stop being used (i.e. epoch_wait/epoch_drain_callbacks) before we tear it fully down. However, the index in ifindex_table is not removed, so m_rcvif_restore() can still find the (now dying) interface. This results in panics, for example when dummynet restores the rcvif pointer and passes a packet to ip6_input() we can panic because the AF_INET6 domain has already been removed (so we end up dereferencing a NULL pointer there). Check that the interface is not dying before we restore it, which is equivalent to checking its presence in V_ifnet, and thus ensures that future accesses (while in NET_EPOCH) are safe. Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34076 (cherry picked from commit `703e533da5`)	2022-05-05 14:38:08 -04:00
Gleb Smirnoff	4d7a1361ef	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918` (cherry picked from commit `e1882428dc`)	2022-05-05 14:38:07 -04:00
Marko Zec	6c741ffbfa	Revert "mbuf: do not restore dying interfaces" This reverts commit `703e533da5`. Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif" This reverts commit `e1882428dc`. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:11:40 +02:00
Konstantin Belousov	6fe78ad434	subr_unit.c: make userspace tests buildable by defining a placeholder for UNR_NO_MTX Sponsored by: The FreeBSD Foundation MFC after: 1 week	2022-04-28 03:00:14 +03:00
Konstantin Belousov	709783373e	Fix another race between fork(2) and PROC_REAP_KILL subtree where we might not yet see a new child when signalling a process. Ensure that this cannot happen by stopping the entire reaping subtree, which ensures that the child is not inside a syscall, in particular fork(2). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:35 +03:00
Konstantin Belousov	39794d80ad	Fix a race between fork(2) and PROC_REAP_KILL subtree by repeating iteration over the subtree until there are no new processes to signal. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:35 +03:00
Konstantin Belousov	d1df347368	kern_procctl: add possibility to take stop_all_proc_block() around exec stop_all_proc_block() must be taken before proctree_lock. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:35 +03:00
Konstantin Belousov	2e7595ef2f	Add stop_all_proc_block(9) It allows more than one consumer of thread_single(SINGLE_ALLPROC) by serializing them. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:35 +03:00
Konstantin Belousov	54a11adbd9	reap_kill(): split children and subtree killers into helpers Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:34 +03:00
Konstantin Belousov	134529b11b	reap_kill(): rename the reap variable to reaper Suggested and reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:34 +03:00
Konstantin Belousov	e4ce431e2a	reap_kill(): de-inline LIST_FOREACH(), twice Suggested and reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:34 +03:00
Konstantin Belousov	b9294a3e15	reaper_abandon_children(): upgrade proctree_lock assert to exclusive p_reapsibling linkage is protected by proctree_lock, and it is modified there. Suggested and reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:34 +03:00
Konstantin Belousov	e59b940dcb	unr(9): allow to avoid internal locking Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:34 +03:00
Konstantin Belousov	c4be460e84	init_unrhdr(): make it usable by initializing everything Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35014	2022-04-28 02:27:34 +03:00
John Baldwin	1431239494	Add a __witness_used for variables only used under #ifdef WITNESS. __diagused is now solely used for variables only used under INVARIANTS. Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D35085	2022-04-27 11:46:16 -07:00
Dmitry Chagin	4a700f3c32	sigtimedwait: Prevent timeout math overflows. Our kern_sigtimedwait() calculates absolute sleep timo value as 'uptime+timeout'. So, when the user specifies a big timeout value (LONG_MAX), the calculated timo can be less than the current uptime value. In that case kern_sigtimedwait() returns EAGAIN instead of EINTR, if unblocked signal was caught. While here switch to a high-precision sleep method. Reviewed by: mav, kib In collaboration with: mav Differential revision: https://reviews.freebsd.org/D34981 MFC after: 2 weeks	2022-04-25 10:23:15 +03:00
Dmitry Chagin	91e7bdcdcf	Add timespecvalid_interval macro and use it. Reviewed by: jhb, imp (early rev) Differential revision: https://reviews.freebsd.org/D34848 MFC after: 2 weeks	2022-04-25 10:20:54 +03:00
John Baldwin	a4c5d490f6	KTLS: Move OCF function pointers out of ktls_session. Instead, create a switch structure private to ktls_ocf.c and store a pointer to the switch in the ocf_session. This will permit adding an additional function pointer needed for NIC TLS RX without further bloating ktls_session. Reviewed by: hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35011	2022-04-22 15:52:12 -07:00
John Baldwin	92e40a9b92	busdma_bounce: Batch bounce page free operations when possible. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D34968	2022-04-21 12:01:55 -07:00
John Baldwin	d4ab3a8d4f	busdma_bounce: Add free_bounce_pages helper function. Deduplicate code to iterate over the bpages list in a bus_dmamap_t freeing bounce pages during bus_dmamap_unload. Reviewed by: imp Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34967	2022-04-21 10:42:14 -07:00
John Baldwin	10fe9a1fb4	busdma_bounce: Make the map waiting list per-bounce-zone. When pages are freed to a bounce zone, only maps waiting for pages for that zone can make forward progress. If a map for a different bounce zone is at the head of the global list, then requests that could otherwise make forward progress will be stalled waiting on the other bounce zone. If bounce zones shared bounce pages then a global list would still make sense to prevent "later" requests from starving an earlier request but that is not a concern with per-zone bounce page pools. Reviewed by: imp Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34966	2022-04-21 10:41:09 -07:00
John Baldwin	d11f5d4762	busdma_bounce: Use a simple kproc to invoke deferred requests. Rather than using a software interrupt with a single handler, just create a dedicated kernel process woken up with a simple wakeup(). Reviewed by: imp Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34965	2022-04-21 10:40:35 -07:00
John Baldwin	c7aa0304d5	Run softclock threads at a hardware ithread priority. Add a new PI_SOFTCLOCK for use by softclock threads. Currently this maps to PI_AV which is the second-highest ithread priority. Reviewed by: mav, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D33693	2022-04-21 10:40:01 -07:00
John Baldwin	3d7e90fc20	cpufreq_curr_sysctl: Use devclass_find to lookup cpufreq devclass. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D35002	2022-04-21 10:29:14 -07:00
Kristof Provost	a879e40ca2	callout: fix using shared rmlocks `15b1eb142c` changed the callout code to store the CALLOUT_SHAREDLOCK flag in c_iflags (where it used to be c_flags), but failed to update the check in softclock_call_cc(). This resulted in the callout code always taking the write lock, even if a read lock had been requested (with the CALLOUT_SHAREDLOCK flag in callout_init_rm()). Reviewed by: markj MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34959	2022-04-20 13:06:50 +02:00
John Baldwin	5bdea8826b	devclass_add_driver: Permit NULL to be passed in dcp. This permits a driver module structure that doesn't want to store a pointer to the new driver's devclass. Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D34962	2022-04-19 10:43:50 -07:00
Mateusz Guzik	c5c981d443	signals: plug a set-but-not-used var Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-04-19 12:45:57 +00:00
John Baldwin	d139909d6e	destroy_dev_sched*: Don't hold Giant for all deferred destroy_dev. Rather than using taskqueue_swi_giant which holds Giant for all deferred destroy_dev calls, create a separate queue for destroyed devices with D_NEEDGIANT set in the corresponding cdevsw. The task for this queue holds Giant while destroying deferred devices while the task for the default queue does not hold Giant. In addition, switch to taskqueue_thread for destroy_dev_sched. Deferred destroy_dev requests don't need to run at an SWI priority. Reviewed by: imp, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D34915	2022-04-18 12:04:30 -07:00
Konstantin Belousov	362ff9867e	Revert rest of `a5970a529c`: use vrefact() when working on fp->f_vnode Now, since O_PATH-opened file descriptors use use references instead of the hold references, vrefact() changes from that revision can be reverted. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34906	2022-04-15 16:56:20 +03:00
Ed Maste	f99cc5a389	sysent: regen after `52a1d90c8b`, posix_fadvise in capmode	2022-04-14 15:17:36 -04:00
Ed Maste	52a1d90c8b	Allow posix_fadvise in capability mode posix_fadvise operates only on a provided fd. Noted by Mathieu <sigsys@gmail.com> in review D34761. No new CAP_ rights are added for posix_fadvise(), as 'advice' in general only influences when I/O happens; the fd must have existing CAP_ rights for actual data access. Reviewed by: markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34903	2022-04-14 15:11:21 -04:00
Konstantin Belousov	bf13db086b	Mostly revert `a5970a529c`: Make files opened with O_PATH to not block non-forced unmount Problem is that open(O_PATH) on nullfs -o nocache is broken then, because there is no reference on the vnode after the open syscall exits. Reported and tested by: ambrisko Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2022-04-14 02:47:04 +03:00
John Baldwin	36fb372264	kern: Move variables only used for MAC under #ifdef MAC.	2022-04-13 16:08:23 -07:00
John Baldwin	4aec198420	sched_ule: Inline value of ts in sched_thread_priority. This avoids a set but unused warning in kernels without SMP where TDQ_CPU() doesn't use its argument.	2022-04-13 16:08:23 -07:00
John Baldwin	8758ac757f	sched_4bsd: ts is only used in sched_bind for SMP.	2022-04-13 16:08:22 -07:00
John Baldwin	72ff256c51	sched_4bsd: Remove unused variables.	2022-04-12 14:58:59 -07:00
John Baldwin	dbd51c416a	realloc(9): Move slab and zone under #ifndef DEBUG_REDZONE.	2022-04-12 14:58:59 -07:00
Mark Johnston	d769609620	tty: Remove an incorrect assertion from ttyinq_line_iterate() We may legitimately have tib == NULL if we're at the very end of the queue. PR: 215373 Reported by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-04-12 17:30:04 -04:00
Tom Jones	1ea833a572	kdb: set kdb_why when entered via reboot and panic Reviewed by: jhb Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. X-NetApp-PR: #74 Differential Revision: https://reviews.freebsd.org/D34551	2022-04-12 10:34:40 +01:00
Dmitry Chagin	c6487446d7	getdirentries: return ENOENT for unlinked but still open directory. To be more compatible to IEEE Std 1003.1-2008 (“POSIX.1”). Reviewed by: mjg, Pau Amma (doc) Differential revision: https://reviews.freebsd.org/D34680 MFC after: 2 weeks	2022-04-11 23:30:16 +03:00
Konstantin Belousov	eca39864f7	Add sysctl KERN_LOCKF reporting the snapshot of the active advisory locks. A new VFS ops method vfs_report_lockf is provided in the mount point op table. If it is NULL, as it is currently for all existing filesystems, vfs_report_lockf() function is used, which gathers information from the standard implementation inside kern/kern_lockf.c. Filesystems implementing its own locking (NFSv4 as example) can provide a custom implementation. Reviewed by: markj, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34756	2022-04-10 00:43:53 +03:00
Konstantin Belousov	147e4fe3f1	kern_lockf.c: remove no longer needed UFS headers Reviewed by: markj, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34756	2022-04-10 00:43:53 +03:00
Konstantin Belousov	59e85819be	lockf: remove lf_inode from struct lockf_entry The UFS-specific struct inode cannot be used in generic advisory lock code. It was probably used as a shortcut for the debugging, as the remnants of the code around it indicates. Use somewhat more verbose and less concentrated, but universal, VOP_PRINT(), where needed. Reviewed by: markj, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34756	2022-04-10 00:43:53 +03:00
Gordon Bergling	f171938cd6	jail: Remove a double word in a source code comment - s/a a/a/ MFC after: 3 days	2022-04-09 14:19:17 +02:00
Gordon Bergling	c3721292e3	kern: Remove a double word in a source code comment - s/for for/for/ MFC after: 3 days	2022-04-09 10:50:04 +02:00
Gordon Bergling	768f9b8b8b	kern: Fix a typo in a source code comment - s/is is/is/ MFC after: 3 days	2022-04-09 09:14:14 +02:00
Andrew Turner	41e6d2091c	Enable subr_physmem_test on supported architectures Only build where it's supported. While here add support for amd64 to help with testing. Sponsored by: The FreeBSD Foundation	2022-04-07 14:31:51 +01:00
Andrew Turner	d8bff5b67c	Handle non-page aligned/sized memory in physmem In some configurations the firmware may pass memory regions that are not page sized or aligned, e.g. when using 16k pages on arm64. If this is the case we will calculate many small regions because the alignment is applied before being inserted. As we round the start up and end down this will leave a 1 page hole between what should have been a single region. Fix by keeping the original alignment until we are just about to insert the region into the avail array. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34694	2022-04-06 14:13:29 +01:00
Andrew Turner	8c99dfed54	Port subr_physmem to userspace and add tests These give us some confidence we haven't broken anything in early boot code that may be running before the console. Reviewed by: emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34691	2022-04-06 14:13:05 +01:00
Mitchell Horne	eb9d205fa6	livedump: add event handler hooks Add three hooks to the livedump process: before, after, and for each block of dumped data. This allows, for example, quiescing the system before the dump begins or protecting data of interest to ensure its consistency in the final output. Reviewed by: markj, kib (previous version) Reviewed by: debdrup (manpages) Reviewed by: Pau Amma <pauamma@gundo.com> (manpages) MFC after: 3 weeks Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D34067	2022-04-05 15:35:05 -03:00
Mitchell Horne	c9114f9f86	Add new vnode dumper to support live minidumps This dumper can instantiate and write the dump's contents to a file-backed vnode. Unlike existing disk or network dumpers, the vnode dumper should not be invoked during a system panic, and therefore is not added to the global dumper_configs list. Instead, the vnode dumper is constructed ad-hoc when a live dump is requested using the new ioctl on /dev/mem. This is similar in spirit to a kgdb session against the live system via /dev/mem. As described briefly in the mem(4) man page, live dumps are not guaranteed to result in a usable output file, but offer some debugging value where forcefully panicking a system to dump its memory is not desirable/feasible. A future change to savecore(8) will add an option to save a live dump. Reviewed by: markj, Pau Amma <pauamma@gundo.com> (manpages) Discussed with: kib MFC after: 3 weeks Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D33813	2022-04-05 15:35:05 -03:00
Mitchell Horne	59c27ea18c	Split out dumper allocation from list insertion Add a new function, dumper_create(), to allocate a dumper. dumper_insert() will call this function and retains the existing behaviour. This is desirable for performing live dumps of the system. Here, there is a need to allocate and configure a dumper structure that is invoked outside of the typical debugger context. Therefore, it should be excluded from the list of panic-time dumpers. free_single_dumper() is made public and renamed to dumper_destroy(). Reviewed by: kib, markj MFC after: 1 week Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D34068	2022-04-05 15:35:05 -03:00
Mateusz Guzik	b7262756e2	vfs: fixup WANTIOCTLCAPS on open In some cases vn_open_cred overwrites cn_flags, effectively nullifying initialisation done in NDINIT. This will have to be fixed. In the meantime make sure the flag is passed. Reported by: jenkins Noted by: Mathieu <sigsys@gmail.com>	2022-04-02 20:49:01 +02:00
Gordon Bergling	c9b04ee4f8	kern: Fix two typos in source code comments - s/accomodate/accommodate/ MFC after: 3 days	2022-04-02 14:52:49 +02:00
Gordon Bergling	7181887e82	kern: Fix two typos in source code comments - s/measurment/measurement/ MFC after: 3 days	2022-04-02 14:15:27 +02:00
Mateusz Guzik	0c805718cb	vfs: fix memory leak on lookup with fds with ioctl caps Reviewed by: markj PR: 262515 Noted by: firk@cantconnect.ru Differential Revision: https://reviews.freebsd.org/D34667	2022-04-02 12:09:07 +00:00
Gordon Bergling	669d5ea4e3	kern: Fix a typo in a source code comment - s/paniced/panicked/ MFC after: 3 days	2022-04-02 10:15:02 +02:00
Ed Maste	e5821a2156	syscalls.master: remove obsolete comment about compatibility tables Compatibility ABIs no longer use a separate syscalls.master. Fixes: `be67ea40c5` ("freebsd32: generate from ...") Sponsored by: The FreeBSD Foundation	2022-03-30 11:07:00 -04:00
Brooks Davis	8601fca789	sysent: regen for syscallarg_t	2022-03-28 19:43:03 +01:00
Brooks Davis	b1ad6a9000	syscallarg_t: Add a type for system call arguments This more clearly differentiates system call arguments from integer registers and return values. On current architectures it has no effect, but on architectures where pointers are not integers (CHERI) and may not even share registers (CHERI-MIPS) it is necessary to differentiate between system call arguments (syscallarg_t) and integer register values (register_t). Obtained from: CheriBSD Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D33780	2022-03-28 19:43:03 +01:00
Andrew Turner	f461b95561	Fix a sign mismatch warning in the physmem code Make sure both sides of a comparison are unsigned. As the values being compared are size_t make the value in the for loop size_t too. Sponsored by: The FreeBSD Foundation	2022-03-28 11:51:09 +01:00
Mateusz Guzik	2533b5dc82	vfs: add missing bits to vdropl_impl This completes the patch which was originally meant to go in. Spotted by: mhorne Fixes: `c35ec1efdc` ("vfs: [1/2] fix stalls in vnode reclaim by not requeieing from vnlru")	2022-03-27 14:35:37 +00:00
Mateusz Guzik	a4032e2a69	vfs: assorted tidy ups to lookup No functional changes.	2022-03-26 17:06:09 +00:00
Alexander Leidinger	aeb91e95cf	Log euid, rgid and jail on listen queue overflow If you have numerous jails with multiple similar services running, this helps to narrow down which services this log is referring to.	2022-03-26 11:17:55 +01:00
Eric van Gyzen	aca2a7faca	stack_zero is not needed before stack_save The man page was recently clarified to commit to this contract. MFC after: 1 week Sponsored by: Dell EMC Isilon	2022-03-25 20:10:38 -05:00
Eric van Gyzen	863070bbf6	ksiginfo_alloc: pass M_WAITOK or M_NOWAIT to uma_zalloc It expects exactly one of those flags. A future commit will assert this. Reviewed by: rstone MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D34451	2022-03-25 20:10:37 -05:00
Mateusz Guzik	0f60088399	vfs: set cn_namelen when handling degenerate lookups Turns out execve looks at it to store binary name, but in order to trigger the problem one has to be trying to exec '/'. As is the value would be left uninitialized (or rather set to -1 on debug kernels). Fixes: `56244d3574` ("vfs: hoist degenerate path lookups out of the loop")	2022-03-25 18:19:36 +00:00
Mateusz Guzik	4ef6e56ae8	vfs: hoist trailing slash handling out of the loop	2022-03-24 14:36:31 +00:00
Mateusz Guzik	3b6792d28a	vfs: factor symlink traversal out of namei The intent down the road is to eliminate the loop to begin with, pushing traversal down to vfs_lookup, all while not allocating the extra buffer.	2022-03-24 13:11:22 +00:00
Mateusz Guzik	d9ea7e2b1e	vfs: factor FAILIFEXISTS handling out of vfs_lookup	2022-03-24 11:22:20 +00:00
Mateusz Guzik	56244d3574	vfs: hoist degenerate path lookups out of the loop	2022-03-24 11:22:12 +00:00
Mateusz Guzik	bb92cd7bcd	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)	2022-03-24 10:20:51 +00:00
Mark Johnston	1babcad6bc	elf: Avoid dumping uninitialized bytes in PRSTATUS core dump notes elf_prstatus_t contains pad space. Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34606	2022-03-23 12:53:49 -04:00
Mark Johnston	7524994da0	callout: Remove the CS_EXECUTING flag It is now unused. MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34626	2022-03-23 12:37:02 -04:00
Mark Johnston	b319171861	setitimer: Fix exit race We use the p_itcallout callout, interlocked by the proc lock, to schedule timeouts for the setitimer(2) system call. When a process exits, the callout must be stopped before the process struct is recycled. Currently we attempt to stop the callout in exit1() with the call _callout_stop_safe(&p->p_itcallout, CS_EXECUTING). If this call returns 0, then we sleep in order to drain the callout. However, this happens only if the callout is not scheduled at all. If the callout thread is blocked on the proc lock, then exit1() will not block and the callout may execute after the process has fully exited, typically resulting in a panic. I cannot see a reason to use the CS_EXECUTING flag here. Instead, use the regular callout_stop()/callout_drain() dance to halt the callout. Reported by: ler Tested by: ler, pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34625	2022-03-23 12:36:12 -04:00
Alexander Motin	fd6ca665d2	Fix umtxq_sleep() regression caused by `56070dd2e4`. umtxq_requeue() moves the queue to a different hash chain and different lock, so we can't rely on msleep_sbt() reacquiring the same old lock. We have to use PDROP and update the queue chain, and so the lock pointer. PR: 262587 MFC after: 2 weeks	2022-03-21 19:55:55 -04:00
firk	bb53dd56c3	kern_tc.c/cputick2usec() (which is used to calculate cputime from cpu ticks) has some imprecision and, worse, a huge timestep (about 20 minutes on a 4GHz CPU) near 53.4 days of elapsed time. kern_time.c/cputick2timespec() (which is used by clock_gettime() to query the cpu time consumed by a process or thread) uses cputick2usec() and then needlessly converts usec to nsec, obviously losing precision even with a fixed cputick2usec(). kern_time.c/kern_clock_getres() uses some weird (and in any case wrong) formula for getting the cputick resolution. PR: 262215 Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D34558	2022-03-21 09:33:46 -04:00
Andrew Turner	cab496e16c	Make SHMMAXPGS an unsigned long This is used to calculate sizes that are then stored in unsigned long fields. Make this unsigned long so the calculations use this type and not an int that can lead to an integer overflow with a large PAGE_SIZE. This allows building this on arm64 with PAGE_SIZE of 16k. Further work will be needed if a 32-bit architecture tries to use a similar sized page. Sponsored by: The FreeBSD Foundation	2022-03-21 10:27:35 +00:00
Colin Percival	2406867f5b	tslog: Add CTLFLAG_SKIP to sysctls The timestamp logs are quite large (often much larger than all the other sysctls combined) so it's unlikely anyone will want to have them displayed by `sysctl -a`. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D34616	2022-03-20 11:31:16 -07:00
Mateusz Guzik	6ff3e8a316	cache: add a comment about a realpath bug	2022-03-19 15:11:25 +00:00
Mateusz Guzik	eb574ba0b6	vfs: replace VFS_NOTIFY_UPPER_* macros with an enum	2022-03-19 13:15:55 +00:00
Mateusz Guzik	cceb91b025	vfs: add missing flags to db show mount	2022-03-19 12:04:44 +00:00
Mateusz Guzik	93a0ba8f49	vfs: retire the no longer used MNTK_LOOKUP_EXCL_DOTDOT flag Reviewed by: markj Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D34466	2022-03-19 10:47:29 +00:00
Mateusz Guzik	1cb0045c97	vfs: add MNTK_UNLOCKED_INSMNTQUE Can be used when the fs at hand can synchronize insmntque with other means than the vnode lock. Reviewed by: markj Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D34466	2022-03-19 10:46:40 +00:00
firk	28d08dc7d0	clock_gettime: Fix CLOCK_THREAD_CPUTIME_ID race Use a spinlock section instead of a critical section to synchronize with statclock(). Otherwise the CLOCK_THREAD_CPUTIME_ID clock can appear to go backwards. PR: 262273 Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D34568	2022-03-17 15:39:00 -04:00
Mark Johnston	fc7e121d88	file: Move FILEDESC_FOREACH macros to kern_descrip.c They are only used in kern_descrip.c, so make them private. No functional change intended. Discussed with: mjg Sponsored by: The FreeBSD Foundation	2022-03-17 15:39:00 -04:00


19445 Commits