freebsd-skq

Author	SHA1	Message	Date
Mateusz Guzik	598f2b8116	dtrace: stop using eventhandlers for the part compiled into the kernel Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D27311	2020-11-23 18:27:21 +00:00
Mateusz Guzik	a9568cd2bc	thread: stash domain id to work around vtophys problems on ppc64 Adding to the zombie list can be performed by idle threads, which on ppc64 leads to panics as it requires a sleepable lock. Reported by: alfredo Reviewed by: kib, markj Fixes: r367842 ("thread: numa-aware zombie reaping") Differential Revision: https://reviews.freebsd.org/D27288	2020-11-23 18:26:47 +00:00
Konstantin Belousov	87a9b18d22	Provide ABI modules hooks for process exec/exit and thread exit. Exec and exit are the same as the corresponding eventhandler hooks. The thread exit hook is called somewhat earlier, while the thread is still owned by the process and enough context is available. Note that the process lock is owned when the hook is called. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27309	2020-11-23 17:29:25 +00:00
Edward Tomasz Napierala	9c8c797c1a	Remove the 'wantparent' variable, unused since r145004. Reviewed by: kib MFC after: 2 weeks Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27193	2020-11-23 12:47:23 +00:00
Kyle Evans	dac521ebcf	cpuset_setproc: use the appropriate parent for new anonymous sets As far as I can tell, this has been the case since initially committed in 2008. cpuset_setproc is the executor of cpuset reassignment; note this excerpt from the description: * 1) Set is non-null. This reparents all anonymous sets to the provided * set and replaces all non-anonymous td_cpusets with the provided set. However, reviewing cpuset_setproc_setthread() for some jail related work unearthed the error: if tdset was not anonymous, we were replacing it with `set`. If it was anonymous, then we'd rebase it onto `set` (i.e. copy the thread's mask over and AND it with `set`) but give the new anonymous set the original tdset as the parent (i.e. the base of the set we're supposed to be leaving behind). The primary visible consequences were that: 1.) cpuset_getid() following such assignment returns the wrong result, the setid that we left behind rather than the one we joined. 2.) When a process attached to the jail, the base set of any anonymous threads was a set outside of the jail. This was initially bundled in D27298, but it's a minor fix that's fairly easy to verify the correctness of. A test is included in D27307 ("badparent"), which demonstrates the issue with, effectively: osetid = cpuset_getid() newsetid = cpuset() cpuset_setaffinity(thread) cpuset_setid(osetid) cpuset_getid(thread) -> observe that it matches newsetid instead of osetid. MFC after: 1 week	2020-11-23 02:49:53 +00:00
Kyle Evans	60e60e73fd	freebsd32: take the _umtx_op struct definitions back Providing these in freebsd32.h facilitates local testing/measuring of the structs rather than forcing one to locally recreate them. Sanity checking offsets/sizes remains in kern_umtx.c where these are typically used.	2020-11-23 00:58:14 +00:00
Kyle Evans	f96078b8fe	kern: dup: do not assume oldfde is valid oldfde may be invalidated if the table has grown due to the operation that we're performing, either via fdalloc() or a direct fdgrowtable_exp(). This was technically OK before rS367927 because the old table remained valid until the filedesc became unused, but now it may be freed immediately if it's an unshared table in a single-threaded process, so it is no longer a good assumption to make. This fixes dup/dup2 invocations that grow the file table; in the initial report, it manifested as a kernel panic in devel/gmake's configure script. Reported by: Guy Yur <guyyur gmail com> Reviewed by: rew Differential Revision: https://reviews.freebsd.org/D27319	2020-11-23 00:33:06 +00:00
Kyle Evans	e0cb5b2a77	[2/2] _umtx_op: introduce 32-bit/i386 flags for operations This patch takes advantage of the consolidation that happened to provide two flags that can be used with the native _umtx_op(2): UMTX_OP__32BIT and UMTX_OP__I386. UMTX_OP__32BIT indicates that we are being provided with 32-bit structures. Note that this flag alone indicates a 64-bit time_t, since this is the majority case. UMTX_OP__I386 has been provided so that we can emulate i386 as well, regardless of whether the host is amd64 or not. Both imply a different set of copyops in sysumtx_op. freebsd32__umtx_op simply ignores the flags, since it's already doing a 32-bit operation and it's unlikely we'll be running an emulator under compat32. Future work could consider it, but the author sees little benefit. This will be used by qemu-bsd-user to pass on all _umtx_op calls to the native interface as long as the host/target endianness matches, effectively eliminating most, if not all, of the remaining unresolved deadlocks for most. This version changed a fair amount from what was under review, mostly in response to refactoring of the prereq reorganization and battle-testing it with qemu-bsd-user. The main changes are as follows: 1.) The i386 flag got renamed to omit '32BIT' since this is redundant. 2.) The flags are now properly handled on 32-bit platforms to emulate other 32-bit platforms. 3.) Robust list handling was fixed, and the 32-bit functionality that was previously gated by COMPAT_FREEBSD32 is now unconditional. 4.) Robust list handling was also improved, including the error reported when a process has already registered 32-bit ABI lists and also detecting if native robust lists have already been registered. Both scenarios now return EBUSY rather than EINVAL, because the input is technically valid but we're too busy with another ABI's lists. libsysdecode/kdump/truss support will go into review soon-ish, along with the associated manpage update. 
Reviewed by: kib (earlier version) MFC after: 3 weeks	2020-11-22 05:47:45 +00:00
Kyle Evans	15eaec6a5c	_umtx_op: move compat32 definitions back in These are reasonably compact, and a future commit will blur the compat32 lines by supporting 32-bit operations with the native _umtx_op.	2020-11-22 05:34:51 +00:00
Robert Wing	3c85ca21d1	fd: free old file descriptor tables when not shared During the life of a process, new file descriptor tables may be allocated. When a new table is allocated, the old table is placed in a free list and held onto until all processes referencing it exit. When a new file descriptor table is allocated, the old file descriptor table can be freed when the current process is single-threaded and the file descriptor table is not being shared with any other processes. Reviewed by: kevans Approved by: kevans (mentor) Differential Revision: https://reviews.freebsd.org/D18617	2020-11-22 05:00:28 +00:00
Konstantin Belousov	e68c619144	Stop using eventhandlers for itimers subsystem exec and exit hooks. While there, do some minor cleanup for kclocks. They are only registered from kern_time.c, so make the registration function static. Remove the event hooks; neither of the registered kclocks uses them. Add some consts. Perhaps we can stop registering kclocks at all and statically initialize them. Reviewed by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27305	2020-11-21 21:43:36 +00:00
Konstantin Belousov	5a2a4551f5	Remove unused prototype. Missed part of r367918. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-11-21 10:58:19 +00:00
Konstantin Belousov	74a093eb98	Stop using eventhandler to invoke umtx_exec hook. There is no point in dynamic registration; the umtx hook is always present. Reviewed by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27303	2020-11-21 10:32:40 +00:00
Kirk McKusick	e75f0f2b48	Only attempt a VOP_UNLOCK() when the vn_lock() has been successful. No MFC as this code is not present in 12-stable. Reported by: Peter Holm Reviewed by: Mateusz Guzik Tested by: Peter Holm Sponsored by: Netflix	2020-11-20 20:22:01 +00:00
Michal Meloun	d9de80d614	Also pass interrupt binding request to non-root interrupt controllers. There are message based controllers that can bind interrupts even if they are not implemented as root controllers (such as the ITS subblock of GIC). MFC after: 3 weeks	2020-11-20 09:05:36 +00:00
Mateusz Guzik	f9fe7b28bc	pipe: thundering herd problem in pipelock All reads and writes are serialized with a hand-rolled lock, but unlocking it always wakes up all waiters. Existing flag fields get resized to make room for a waiter counter without growing the struct. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D27273	2020-11-19 19:25:47 +00:00
Mark Johnston	a33fef5e25	callout(9): Fix a race between CPU migration and callout_drain() Suppose a running callout re-arms itself, and before the callout finishes running another CPU calls callout_drain() and goes to sleep. softclock_call_cc() will wake up the draining thread, which may not run immediately if there is a lot of CPU load. Furthermore, the callout is still in the callout wheel so it can continue to run and re-arm itself. Then, suppose that the callout migrates to another CPU before the draining thread gets a chance to run. The draining thread is in this loop in _callout_stop_safe(): while (cc_exec_curr(cc) == c) { CC_UNLOCK(cc); sleep(); CC_LOCK(cc); } but after the migration, cc points to the wrong CPU's callout state. Then the draining thread goes off and removes the callout from the wheel, but does so using the wrong lock and per-CPU callout state. Fix the problem by doing a re-lookup of the callout CPU after sleeping. Reported by: syzbot+79569cd4d76636b2cc1c@syzkaller.appspotmail.com Reported by: syzbot+1b27e0237aa22d8adffa@syzkaller.appspotmail.com Reported by: syzbot+e21aa5b85a9aff90ef3e@syzkaller.appspotmail.com Reviewed by: emaste, hselasky Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27266	2020-11-19 18:37:28 +00:00
Mitchell Horne	c8a96cdcd9	Add an option for entering KDB on recursive panics There are many cases where one would choose to avoid entering the debugger on a normal panic, opting instead to reboot and possibly save a kernel dump. However, recursive kernel panics are an unusual case that might warrant attention from a human, so provide a secondary tunable, debug.debugger_on_recursive_panic, to allow entering the debugger only when this occurs. For simplicity in maintaining existing behaviour, the tunable defaults to zero. Reviewed by: cem, markj Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27271	2020-11-19 18:03:40 +00:00
Mateusz Guzik	d116b9f1ad	thread: numa-aware zombie reaping The current global list is a significant problem; in particular, it induces a lot of cross-domain thread frees. When running poudriere on a 2-domain box, about half of all frees were of that nature. The patch below introduces per-domain thread data containing zombie lists and domain-aware reaping. By default it only reaps from the current domain, reaping from others only if there is a shortage of free TIDs. A dedicated callout is introduced to reap lingering threads if there happens to be no activity. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D27185	2020-11-19 10:00:48 +00:00
Mateusz Guzik	b8cb628534	pipe: tidy up pipelock	2020-11-19 08:16:45 +00:00
Mateusz Guzik	89744405e6	pipe: allow for lockless pipe_stat pipes get stated all the time and this avoidably contributed to contention. The pipe lock is only held to accommodate MAC and to check the type. Since normally there is no probe for pipe stat, depessimize this by having the flag. The pipe_state field gets modified with locks held all the time and it's not feasible to convert them to use atomic store. Move the type flag away to a separate variable as a simple cleanup and to provide a stable field to read. Use short for both fields to avoid growing the struct. While here, short-circuit MAC for pipe_poll as well.	2020-11-19 06:30:25 +00:00
Mateusz Guzik	2f5b0b48ac	cred: fix minor nits in r367695 Noted by: jhb	2020-11-19 04:28:39 +00:00
Mateusz Guzik	c48f897bbe	smp: fix smp_rendezvous_cpus_retry usage before smp starts Since none of the other CPUs are running there is nobody to clear their entries and the routine spins indefinitely.	2020-11-19 04:27:51 +00:00
Mark Johnston	a28c28e6ef	Remove NO_EVENTTIMERS support The arm configs that required it have been removed from the tree. Removing this option makes the callout code easier to read and discourages developers from adding new configs without eventtimer drivers. Reviewed by: ian, imp, mav Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27270	2020-11-19 02:50:48 +00:00
Mariusz Zaborski	f488d5b797	Add CTLFLAG_MPSAFE to the suser_enabled sysctl. Pointed out by: mjg	2020-11-18 21:26:14 +00:00
Mariusz Zaborski	05e1e482c7	jail: introduce per jail suser_enabled setting The suser_enabled sysctl allows removing privileged rights from uid 0. This change introduces a per-jail setting that allows making root a normal user. Reviewed by: jamie Previous version reviewed by: kevans, emaste, markj, me_igalic.co Discussed with: pjd Differential Revision: https://reviews.freebsd.org/D27128	2020-11-18 21:07:08 +00:00
Mariusz Zaborski	21fe9441e1	Fix style nits.	2020-11-18 20:59:58 +00:00
John Baldwin	5335f6434b	Fix a few nits in vn_printf(). - Mask out recently added VV_* bits to avoid printing them twice. - Keep VI_LOCKed on the same line as the rest of the flags. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D27261	2020-11-18 16:21:37 +00:00
Kyle Evans	27a9392d54	_umtx_op: fix robust lists after r367744 A copy-pasto left us copying in 24-bytes at the address of the rb pointer instead of the intended target. Reported by: sigsys@gmail.com Sighing: kevans	2020-11-18 03:30:31 +00:00
Conrad Meyer	f8f74aaa84	linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES The two flags are distinct and it is impossible to correctly handle clone(2) without the assistance of fork1(). This change depends on the pwddesc split introduced in r367777. I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd should be treated the opposite way p_fd is (based on RFFDG flag). This is a little ugly, but the benefit is that existing RFFDG API is preserved. Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are copied, while !RFFDG indicates both should be cloned. In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects independent fd tables. The previous conflation of CLONE_FS and CLONE_FILES was introduced in r163371 (2006). Discussed with: markj, trasz (earlier version) Differential Revision: https://reviews.freebsd.org/D27016	2020-11-17 21:20:11 +00:00
Conrad Meyer	85078b8573	Split out cwd/root/jail, cmask state from filedesc table No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037	2020-11-17 21:14:13 +00:00
Conrad Meyer	ede4af47ae	unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI As this ABI is still fresh (r367287), let's correct some mistakes now: - Version the structure to allow for future changes - Include sender's pid in control message structure - Use a distinct control message type from the cmsgcred / sockcred mess Discussed with: kib, markj, trasz Differential Revision: https://reviews.freebsd.org/D27084	2020-11-17 20:01:21 +00:00
Conrad Meyer	de774e422e	linux(4): Implement name_to_handle_at(), open_by_handle_at() They are similar to our getfhat(2) and fhopen(2) syscalls. Differential Revision: https://reviews.freebsd.org/D27111	2020-11-17 19:51:47 +00:00
Kyle Evans	bd4bcd14e3	Fix !COMPAT_FREEBSD32 kernel build One of the last shifts inadvertently moved these static assertions out of a COMPAT_FREEBSD32 block, which the relevant definitions are limited to. Fix it. Pointy hat: kevans	2020-11-17 04:22:10 +00:00
Kyle Evans	63ecb272a0	umtx_op: reduce redundancy required for compat32 All of the compat32 variants are substantially the same, save for copyin/copyout (mostly). Apply the same kind of technique used with kevent here by having the syscall routines supply a umtx_copyops describing the operations needed. umtx_copyops carries the bare minimum needed: the sizes of timespec and _umtx_time are used for determining if copyout is needed in the sem2_wait case. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27222	2020-11-17 03:36:58 +00:00
Kyle Evans	4be0a1b587	_umtx_op: fix a compat32 bug in UMTX_OP_NWAKE_PRIVATE Specifically, if we're waking up some value n > BATCH_SIZE, then the copyin(9) is wrong on the second iteration due to upp being the wrong type. upp is currently a uint32_t**, so upp + pos advances it twice as far as it should (host pointer size vs. compat32 pointer size). Fix it by just making upp a uint32_t*; it's still technically a double pointer, but the distinction doesn't matter all that much here since we're just doing arithmetic on it. Add a test case that demonstrates the problem, placed with the libthr tests since one messing with _umtx_op should be running these tests. Running under compat32, the new test case will hang as threads after the first 128 get missed in the wake. It's not immediately clear how to hit it in practice, since pthread_cond_broadcast() uses a smaller (sleepq batch?) size observed to be around 50 -- I did not spend much time digging into it. The uintptr_t change makes no functional difference, but I've tossed it in since it's more accurate (semantically). Reported by: Andrew Gierth (andrew_tao173.riddles.org.uk, inspection) Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27231	2020-11-17 03:34:01 +00:00
Konstantin Belousov	cb596eea82	vmem: trivial warning and style fixes. Add __unused to some args. Change type of the iterator variables to match loop control. Remove excessive {}. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27220	2020-11-17 02:18:34 +00:00
Mateusz Guzik	1a7bb89629	cpuset: refcount-clean	2020-11-17 00:04:05 +00:00
Mateusz Guzik	89deca0a33	malloc: make malloc_large closer to standalone This moves entire large alloc handling out of all consumers, apart from deciding to go there. This is a step towards creating a fast path. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27198	2020-11-16 17:56:58 +00:00
Mateusz Guzik	19d3e47dca	select: call seltdfini on process and thread exit Since thread_zone is marked NOFREE the thread_fini callback is never executed, meaning memory allocated by seltdinit is never released. Adding the call to thread_dtor is not sufficient as exiting processes cache the main thread.	2020-11-16 03:12:21 +00:00
Mateusz Guzik	31b2ac4b5a	select: replace reference counting with memory barriers in selfd Refcounting was added to combat a race between selfdfree and doselwakup, but it adds avoidable overhead. selfdfree detects it can free the object by ->sf_si == NULL, thus we can ensure that the condition only holds after all accesses are completed.	2020-11-16 03:09:18 +00:00
Mateusz Guzik	b77594bbbf	sched: fix an incorrect comparison in sched_lend_user_prio_cond Compare with sched_lend_user_prio.	2020-11-15 01:54:44 +00:00
Mateusz Guzik	f34a2f56c3	thread: batch credential freeing	2020-11-14 19:22:02 +00:00
Mateusz Guzik	fb8ab68084	thread: batch resource limit free calls	2020-11-14 19:21:46 +00:00
Mateusz Guzik	5ef7b7a0f3	thread: rework tid batch to use helpers	2020-11-14 19:20:58 +00:00
Mateusz Guzik	d1ca25be49	thread: pad tid lock On a kernel with other changes this bumps 104-way thread creation/destruction from 0.96 mln ops/s to 1.1 mln ops/s.	2020-11-14 19:19:27 +00:00
Mateusz Guzik	9b9bb9ffa5	malloc: retire MALLOC_PROFILE The global array has prohibitive performance impact on multicore systems. The same data (and more) can be obtained with dtrace. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27199	2020-11-13 19:22:53 +00:00
Konstantin Belousov	441eb16a95	Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level. Restart syscalls and some sync operations when the filesystem indicates an ERELOOKUP condition, mostly for VOPs operating on metadata. In particular, lookup results cached in the inode/v_data are no longer valid and need recalculating. Right now this should be a nop. Assert that ERELOOKUP is caught everywhere and not returned to userspace, by asserting that td_errno != ERELOOKUP on the syscall return path. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-13 09:42:32 +00:00
Konstantin Belousov	7cde2ec4fd	Implement vn_lock_pair(). In collaboration with: pho Reviewed by: mckusick (previous version), markj (previous version) Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-13 09:31:57 +00:00
Mateusz Guzik	9aa6d792b5	malloc: retire malloc_last_fail The routine does not serve any practical purpose. Memory can be allocated in many other ways and most consumers pass the M_WAITOK flag, making malloc not fail in the first place. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27143	2020-11-12 20:22:58 +00:00
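
The dup fix in f96078b8fe ("kern: dup: do not assume oldfde is valid") follows a general rule: a pointer into a table that can be reallocated must be looked up again after any call that may grow the table. Below is a minimal userspace sketch of that rule. It assumes a hypothetical `struct fdtable` holding plain integers in place of the kernel's file descriptor table, and a hypothetical `fdgrow()` that frees the old array immediately, as an unshared table in a single-threaded process may now be freed after rS367927.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified descriptor table (not the kernel's struct filedesc). */
struct fdtable {
	int *entries;           /* entries[i] != 0 means descriptor i is open */
	size_t nfiles;
};

/* Grow the table and free the old array right away. */
static int
fdgrow(struct fdtable *t, size_t want)
{
	int *n = calloc(want, sizeof(*n));

	if (n == NULL)
		return (-1);
	memcpy(n, t->entries, t->nfiles * sizeof(*n));
	free(t->entries);
	t->entries = n;
	t->nfiles = want;
	return (0);
}

/* dup2-like copy: keep the index of the old entry, never a pointer into the table. */
static int
fd_dup2(struct fdtable *t, size_t oldfd, size_t newfd)
{
	if (oldfd >= t->nfiles || t->entries[oldfd] == 0)
		return (-1);
	if (newfd >= t->nfiles && fdgrow(t, newfd + 1) != 0)
		return (-1);
	/* Index again after a possible fdgrow(): the old array may be gone. */
	t->entries[newfd] = t->entries[oldfd];
	return ((int)newfd);
}
```

Caching `&t->entries[oldfd]` before `fdgrow()` and dereferencing it afterwards would read freed memory; re-indexing through the table avoids that.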
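
f9fe7b28bc addresses the thundering herd in pipelock by counting waiters, so that unlocking wakes a single waiter and only when one exists. The sketch below shows the same idea with POSIX threads. `struct sleeplock`, `sl_lock()` and `sl_unlock()` are hypothetical names, and `pthread_cond_signal()` stands in for a kernel wake-one primitive; this is not the pipe code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pipe's hand-rolled sleep lock. */
struct sleeplock {
	pthread_mutex_t mtx;    /* interlock, playing the role of the pipe mutex */
	pthread_cond_t cv;
	bool locked;            /* plays the role of the lock flag in the pipe state */
	int waiters;            /* threads currently blocked in sl_lock() */
};

static void
sl_lock(struct sleeplock *sl)
{
	pthread_mutex_lock(&sl->mtx);
	while (sl->locked) {
		sl->waiters++;
		pthread_cond_wait(&sl->cv, &sl->mtx);
		sl->waiters--;
	}
	sl->locked = true;
	pthread_mutex_unlock(&sl->mtx);
}

static void
sl_unlock(struct sleeplock *sl)
{
	pthread_mutex_lock(&sl->mtx);
	sl->locked = false;
	/* Wake one waiter, and only if someone is waiting, instead of all of them. */
	if (sl->waiters > 0)
		pthread_cond_signal(&sl->cv);
	pthread_mutex_unlock(&sl->mtx);
}
```

Counting waiters lets the unlock path skip the wakeup entirely when nobody is waiting, and waking one thread avoids having every sleeper race for a lock that only one of them can take.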
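
f8f74aaa84 maps Linux's two independent clone(2) flags onto RFFDG plus the new FR2_SHARE_PATHS bit, where FR2_SHARE_PATHS makes p_pd do the opposite of p_fd. The resulting truth table can be sketched as follows. The `L_CLONE_*` values match Linux's `<linux/sched.h>`; `X_RFFDG` and `X_FR2_SHARE_PATHS` are illustrative stand-ins with assumed values, and `map_clone_flags()` / `effective_sharing()` are hypothetical helpers, not the linux(4) code.

```c
#include <assert.h>
#include <stdbool.h>

/* Linux clone(2) flag values from <linux/sched.h>. */
#define L_CLONE_FS        0x00000200
#define L_CLONE_FILES     0x00000400

/* Illustrative stand-ins only; the real values live in FreeBSD headers. */
#define X_RFFDG           0x1
#define X_FR2_SHARE_PATHS 0x2

struct fork_plan {
	int rfflags;
	int flags2;
};

static struct fork_plan
map_clone_flags(unsigned long clone_flags)
{
	bool share_files = (clone_flags & L_CLONE_FILES) != 0;
	bool share_fs = (clone_flags & L_CLONE_FS) != 0;
	struct fork_plan p = { 0, 0 };

	/* RFFDG copies the fd table; by default p_pd follows p_fd. */
	if (!share_files)
		p.rfflags |= X_RFFDG;
	/* Ask for the opposite treatment of p_pd when the two flags disagree. */
	if (share_fs != share_files)
		p.flags2 |= X_FR2_SHARE_PATHS;
	return (p);
}

/* What the child ends up sharing with its parent under a given plan. */
static void
effective_sharing(struct fork_plan p, bool *fd_shared, bool *pd_shared)
{
	*fd_shared = (p.rfflags & X_RFFDG) == 0;
	*pd_shared = (p.flags2 & X_FR2_SHARE_PATHS) ? !*fd_shared : *fd_shared;
}
```

With CLONE_FS alone (the Chrome case from the message), the child gets a private fd table but shares cwd/root with its parent.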
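
The compat32 bug in 4be0a1b587 comes down to C pointer arithmetic: `upp + pos` advances by `pos * sizeof(*upp)` bytes, so a pointer-to-pointer `upp` steps by the host pointer size (8 bytes on LP64) while the user array holds 4-byte compat32 pointers. A small sketch, assuming `BATCH_SIZE` is 128 (inferred from "the first 128" in the message) and using a static array in place of user memory; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_SIZE 128  /* assumed from the commit message */

/* Stand-in for the user array of 32-bit (compat32) wakeup addresses. */
static _Alignas(uint32_t *) uint32_t user_addrs[4 * BATCH_SIZE];

/* Element index that "upp + pos" reaches with the pre-fix pointer-to-pointer type. */
static size_t
element_reached_buggy(size_t pos)
{
	uint32_t **upp = (uint32_t **)user_addrs;   /* steps by sizeof(uint32_t *) */

	assert(pos * sizeof(uint32_t *) <= sizeof(user_addrs));
	return ((size_t)((char *)(upp + pos) - (char *)user_addrs) / sizeof(uint32_t));
}

/* Element index that "upp + pos" reaches with the fixed type. */
static size_t
element_reached_fixed(size_t pos)
{
	uint32_t *upp = user_addrs;                 /* steps by sizeof(uint32_t) */

	return ((size_t)(upp + pos - user_addrs));
}
```

On an LP64 host the buggy second batch starts at element 256 instead of 128, so the waiters in between are never woken.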
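
31b2ac4b5a replaces reference counting in selfd with ordering: the wakeup side finishes every access to the object before the `sf_si == NULL` condition becomes visible, and the free side only frees after observing that condition. A C11 sketch of such a release/acquire pairing follows. `struct selfd_sketch`, its `payload` field and both functions are hypothetical, and the kernel uses its own atomic primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified selfd: sf_si != NULL means a waker may still use it. */
struct selfd_sketch {
	_Atomic(void *) sf_si;
	int payload;            /* stands in for fields the waker touches */
};

/* Waker: do every access first, then publish NULL with release semantics,
 * so none of those accesses can be reordered after the store. */
static void
waker_done(struct selfd_sketch *sfp)
{
	sfp->payload++;
	atomic_store_explicit(&sfp->sf_si, NULL, memory_order_release);
}

/* Freer: an acquire load that observes NULL also observes all of the waker's
 * earlier accesses, so the object can be freed without a reference count. */
static bool
can_free(struct selfd_sketch *sfp)
{
	return (atomic_load_explicit(&sfp->sf_si, memory_order_acquire) == NULL);
}
```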


17934 Commits