freebsd-nq

Author	SHA1	Message	Date
Jamie Gritton	39c8ef90f6	jail: A jail could be removed without calling OSD methods Fix a long-standing bug where setting nopersist on a process-less jail would remove it without calling the OSD PR_METHOD_REMOVE methods.	2021-01-22 10:50:10 -08:00
Marius Strobl	679e4cdabd	kvprintf(9): add missing FALLTHROUGH Reported by: Coverity CID: 1005166	2021-01-22 00:18:40 +01:00
Konstantin Belousov	1ac7c34486	malloc_aligned: roundup allocation size up to next power of two to make it use the right aligned zone. Reported by: melifaro Reviewed by: alc, markj (previous version) Discussed with: jrtc27 Tested by: pho (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28219	2021-01-21 23:34:10 +02:00
Konstantin Belousov	0781c79d48	Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE. UMA page_alloc() does not take an alignment, so UMA can only handle alignment less than page size. Noted by: alc Reviewed by: alc, markj (previous version) Discussed with: jrtc27 Tested by: pho (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28219	2021-01-21 23:34:10 +02:00
Jamie Gritton	6754ae2572	jail: Use refcount(9) for prison references. Use refcount(9) for both pr_ref and pr_uref in struct prison. This allows prisons to be held and freed without requiring the prison mutex. An exception to this is that dropping the last reference will still lock the prison, to keep the guarantee that a locked prison remains valid and alive (provided it was at the time it was locked). Among other things, this honors the promise made in a comment in crcopy(9), that it will not block, which hasn't been true for two decades.	2021-01-20 15:08:27 -08:00
Vladimir Kondratyev	e3dd8ed77b	devinfo sysctl handler: Do not write zero-length strings in to sbuf twice This fixes missing PnPinfo and location strings in devinfo(8) output for devices with no attached drivers.	2021-01-21 02:06:16 +03:00
Alan Somers	2247f48941	aio: micro-optimize the lio_opcode assignments This allows slightly more efficient opcode testing in-kernel. It is transparent to userland, except to applications that sneakily submit aio fsync or aio mlock operations via lio_listio, which has never been documented, requires the use of deliberately undefined constants (LIO_SYNC and LIO_MLOCK), and is arguably a bug. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D27942	2021-01-20 09:02:25 -07:00
Alex Richardson	7e99c034f7	Emit uprintf() output for initproc if there is no controlling terminal This patch helped me debug why /sbin/init was not being loaded after making changes to the image activator in CheriBSD. Reviewed By: jhb (earlier version), kib Differential Revision: https://reviews.freebsd.org/D28121	2021-01-20 09:54:46 +00:00
Mateusz Guzik	2171b8e8a2	cache: augment sdt probe in cache_fplookup_dot Same as 6d386b4c ("cache: save a branch in cache_fplookup_next")	2021-01-20 07:23:14 +00:00
Mateusz Guzik	aae03cfe64	cache: whitespace nit in cache_fplookup_modifying	2021-01-20 07:22:04 +00:00
Mark Johnston	4dc1b17dbb	ktls: Improve handling of the bind_threads tunable a bit - Only check for empty domains if we actually tried to configure domain affinity in the first place. Otherwise setting bind_threads=1 will always cause the sysctl value to be reported as zero. This is harmless since the threads end up being bound, but it's confusing. - Try to improve the sysctl description a bit. Reviewed by: gallatin, jhb Submitted by: Klara, Inc. Sponsored by: Ampere Computing Differential Revision: https://reviews.freebsd.org/D28161	2021-01-19 21:32:33 -05:00
Mateusz Guzik	38baca17e0	lockmgr: fix upgrade TRYUPGRADE requests kept failing when they should not have due to wrong macro used to count readers. Fixes: f6b091fbbd77cbb0 ("lockmgr: rewrite upgrade to stop always dropping the lock") Noted by: asomers Differential Revision: https://reviews.freebsd.org/D27947	2021-01-19 12:21:38 +00:00
Mateusz Guzik	57dab0292a	cache: fix some typos	2021-01-19 10:17:14 +01:00
Mateusz Guzik	84ab77ad27	cache: drop write-only var from cache_fplookup_preparse	2021-01-19 10:13:30 +01:00
Mateusz Guzik	6d386b4c8a	cache: save a branch in cache_fplookup_next Previously the code would branch on top find out whether it should branch on SDT probe and bumping the numposhits counter, depending on cache_fplookup_cross_mount. Arguably it should be done regardless of what said function returns.	2021-01-19 10:08:24 +01:00
Jamie Gritton	effad35ed1	jail: Clean up some function placement and improve comments. Move prison_hold, prison_hold_locked, prison_proc_hold, and prison_proc_free to a more intuitive part of the file (together with prison_free and prison_free_locked), and add or improve comments to these and others, to better describe what's going on in the prison reference cycle. No functional changes.	2021-01-18 17:23:51 -08:00
Oleksandr Tymoshenko	248f0cabca	make maximum interrupt number tunable on ARM, ARM64, MIPS, and RISC-V Use a machdep.nirq tunable instead of compile-time constant NIRQ as a value for maximum number of interrupts. It allows keeping the system footprint small by default with an option to increase the limit for large systems like server-grade ARM64. Reviewed by: mhorne Differential Revision: https://reviews.freebsd.org/D27844 Submitted by: Klara, Inc. Sponsored by: Ampere Computing	2021-01-18 16:36:39 -08:00
Jamie Gritton	83bc72a04e	jail: Fix a stray mutex from 76ad42abf9d4.	2021-01-18 15:47:09 -08:00
Jamie Gritton	76ad42abf9	jail: Add prison_isvalid() and prison_isalive() prison_isvalid() checks if a prison record can be used at all, i.e. pr_ref > 0. This filters out prisons that aren't fully created, and those that are either in the process of being dismantled, or will be at the next opportunity. While the check for pr_ref > 0 is simple enough to make without a convenience function, this prepares the way for other measures of prison validity. prison_isalive() checks not only validity as far as the usability of the prison structure, but also whether the prison is visible to user space. It replaces a test for pr_uref > 0, which is currently only used within kern_jail.c, and not often there. Both of these functions also assert that either the prison mutex or allprison_lock is held, since it's generally the case that unlocked prisons aren't guaranteed to remain useable for any length of time. This isn't entirely true, for example a thread can assume its own prison is good, but most exceptions will exist inside of kern_jail.c.	2021-01-18 10:56:20 -08:00
Konstantin Belousov	36bcc44e2c	Add ddb 'show timecounter' command. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-01-18 09:51:48 +02:00
Jamie Gritton	25c2c952e3	jail: Add proper prison locking in mqfs_prison_remove.	2021-01-17 17:41:09 -08:00
Konstantin Belousov	3b15beb30b	Implement malloc_domainset_aligned(9). Change the power-of-two malloc zones to require alignment equal to the size []. Current uma allocator already provides such alignment, so in fact this change does not change anything except providing future-proof setup. Suggested by: markj [] Reviewed by: andrew, jah, markj Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28147	2021-01-17 19:29:05 +02:00
Mateusz Guzik	fe258f23ef	Save on getpid in setproctitle by supporting -1 as curproc.	2021-01-16 09:36:54 +01:00
Kirk McKusick	79a5c790bd	Eliminate a locking panic when cleaning up UFS snapshots after a disk failure. Each vnode has an embedded lock that controls access to its contents. However vnodes describing a UFS snapshot all share a single snapshot lock to coordinate their access and update. As part of mounting a UFS filesystem with snapshots, each of the vnodes describing a snapshot has its individual lock replaced with the snapshot lock. When the filesystem is unmounted the vnode's original lock is returned replacing the snapshot lock. When a disk fails while the UFS filesystem it contains is still mounted (for example when a thumb drive is removed) UFS forcibly unmounts the filesystem. The loss of the drive causes the GEOM subsystem to orphan the provider, but the consumer remains until the filesystem has finished with the unmount. Information describing the snapshot locks was being prematurely cleared during the orphaning causing the return of the snapshot vnode's original locks to fail. The fix is to not clear the needed information prematurely. Sponsored by: Netflix	2021-01-15 16:36:42 -08:00
Mitchell Horne	818390ce0c	arm64: fix early devmap assertion The purpose of this KASSERT is to ensure that we do not run out of space in the early devmap. However, the devmap grew beyond its initial size of 2MB in r336519, and this assertion did not grow with it. A devmap mapping of a 1080p framebuffer requires 1920x1080 bytes, or 1.977 MB, so it is just barely able to fit without triggering the assertion, provided no other devices are mapped before it. With the addition of `options GDB` in GENERIC by bbfa199cbc16, the uart is now mapped for the purposes of a debug port, before mapping the framebuffer. The presence of both these conditions pushes the selected virtual address just below the threshold, triggering the assertion. To fix this, use the correct size of the devmap, defined by PMAP_MAPDEV_EARLY_SIZE. Since this code is shared with RISC-V, define it for that platform as well (although it is a different size). PR: 25241 Reported by: gbe MFC after: 3 days Sponsored by: The FreeBSD Foundation	2021-01-13 17:27:44 -04:00
Mateusz Guzik	ef23df1354	vfs: set NC_KEEPPOSENTRY alongside NOCACHE when creating a file Arguably the entire NOCACHE logic should get retired, in the meantime at least prevent the code from evicting existing entries.	2021-01-13 15:29:34 +00:00
Mateusz Guzik	5753be8e43	fd: add refcount argument to falloc_noinstall This lets callers avoid atomic ops by initializing the count to required value from the get go. While here add falloc_abort to backpedal from this without having to fdrop.	2021-01-13 15:29:34 +00:00
Mateusz Guzik	5171310e66	vfs: use finstall_refed in openat This avoids 2 atomic ops in the common case: 1 to grab an extra reference and 1 to release it.	2021-01-13 03:30:38 +00:00
Mateusz Guzik	530b699a62	fd: add finstall_refed Can be used to consume an already existing reference and consequently avoid atomic ops.	2021-01-13 03:27:03 +01:00
Mateusz Guzik	4faa375cdd	fd: provide a dedicated closef variant for unix socket code This avoids testing for td != NULL.	2021-01-13 03:27:03 +01:00
Konstantin Belousov	0659df6fad	vm_map_protect: allow to set prot and max_prot in one go. This prevents a situation where other thread modifies map entries permissions between setting max_prot, then relocking, then setting prot, confusing the operation outcome. E.g. you can get an error that is not possible if operation is performed atomic. Also enable setting rwx for max_prot even if map does not allow to set effective rwx protection. Reviewed by: brooks, markj (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28117	2021-01-13 01:35:22 +02:00
Mateusz Guzik	70ba77706d	vfs: extend vfs:namei:lookup:return probe with nameidata	2021-01-12 13:35:27 +00:00
Mateusz Guzik	cdb62ab74e	vfs: add NDFREE_NOTHING and convert several NDFREE_PNBUF callers Check the comment above the routine for reasoning.	2021-01-12 13:16:10 +00:00
Mateusz Guzik	6b3a9a0f3d	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)	2021-01-12 13:16:10 +00:00
Konstantin Belousov	57f22c828e	sigfastblock: do not skip cursig/postsig loop in ast() Even if sigfastblock block is non-zero, non-blockable signals must be checked on ast and delivered now. This also affects debugger ability to attach, because issignal() also calls ptracestop() if there is a pending stop for debuggee. Instead of checking for sigfastblock, and either setting PENDING flag for usermode or doing signal delivery loop, always do the loop after checking, and then handle PENDING bit. issignal() already does the right thing for fast-blocked case, allowing only STOPs and SIGKILL delivery to happen. Reported by: Vasily Postnicov <shamaz.mazum@gmail.com>, markj Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28089	2021-01-12 12:45:26 +02:00
Konstantin Belousov	513320c0f1	sigfastblock_setpend(): do not set PEND user flag unless TDP_SIGFASTPENDING is set. User pending bit should not be set if kernel did not note a pending signal. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28089	2021-01-12 12:43:34 +02:00
Alan Somers	ff1a307801	lio_listio: validate aio_lio_opcode Previously, we would accept any kind of LIO_* opcode, including ones that were intended for in-kernel use only like LIO_SYNC (which is not defined in userland). The situation became more serious with 022ca2fc7fe08d51f33a1d23a9be49e6d132914e. After that revision, setting aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion. Note that POSIX does not specify what should happen if aio_lio_opcode is invalid. MFC-with: 022ca2fc7fe08d51f33a1d23a9be49e6d132914e Reviewed by: jhb, tmunro, 0mp Differential Revision: https://reviews.freebsd.org/D28078	2021-01-11 19:53:01 -07:00
Jason A. Harmening	e8a5a1ad71	rctl(4): support throttling resource usage to 0 For rate-based resources that support throttling (e.g. readiops/writeiops), this fixes a divide-by-zero panic when rctl(8) passes 0 as the throttle value. For these resources, treat zero-throttle requests as requests to suspend forward progress as long as possible using the duration specified in kern.racct.rctl.throttle_max. PR: 251803 Reported by: chris@cretaforce.gr Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27858	2021-01-11 15:36:57 -08:00
Konstantin Belousov	4ea65707d3	exec_new_vmspace: print useful error message on ctty if stack cannot be mapped. After old vmspace is destroyed during execve(2), but before the new space is fully constructed, an error during image activation cannot be returned because there is no executing program to receive it. In the relatively common case of failure to map stack, print some hints on the control terminal. Note that user has enough knobs to cause stack mapping error, and this is the most common reason for execve(2) aborting the process. Requested by: jhb Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Konstantin Belousov	2e1c94aa1f	Implement enforcing write XOR execute mapping policy. It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE | PROT_EXEC are never specified together, if vm_map has MAP_WX flag set. FreeBSD control flag allows specific binary to request WX exempt, and there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/disable globally. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Robert Watson	30b68ecda8	Changes that improve DTrace FBT reliability on freebsd/arm64: - Implement a dtrace_getnanouptime(), matching the existing dtrace_getnanotime(), to avoid DTrace calling out to a potentially instrumentable function. (These should probably both be under KDTRACE_HOOKS. Also, it's not clear to me that they are correct implementations for the DTrace thread time functions they are used in .. fixes for another commit.) - Don't allow FBT to instrument functions involved in EL1 exception handling that are involved in FBT trap processing: handle_el1h_sync() and do_el1h_sync(). - Don't allow FBT to instrument DDB and KDB functions, as that makes it rather harder to debug FBT problems. Prior to these changes, use of FBT on FreeBSD/arm64 rapidly led to kernel panics due to recursion in DTrace. Reliable FBT on FreeBSD/arm64 is reliant on another change from @andrew to have the aarch64 instrumentor more carefully check that instructions it replaces are against the stack pointer, which can otherwise lead to memory corruption. That change remains under review. MFC after: 2 weeks Reviewed by: andrew, kp, markj (earlier version), jrtc27 (earlier version) Differential revision: https://reviews.freebsd.org/D27766	2021-01-11 15:42:22 +00:00
Robert Watson	4f2cbaf3cd	Track pipe(2) reads and writes as rusage message receives and sends, a feature misplaced during the transition from BSD 4.4's socket implementation to the optimised FreeBSD pipe implementation. MFC after: 1 week Reviewed by: arichardson, imp Differential Revision: https://reviews.freebsd.org/D27878	2021-01-10 12:16:39 +00:00
Jamie Gritton	2a4b225146	jail: Simplify handling of prison_deref() Track the current lock/reference state in a single variable, rather than deducing the proper prison_deref() flags from a combination of equations and hard-coded values.	2021-01-09 21:05:06 -08:00
Konstantin Belousov	5844bd058a	jobc: rework detection of orphaned groups. Instead of trying to maintain pg_jobc counter on each process group update (and sometimes before), just calculate the counter when needed. Still, for the benefit of the signal delivery code, explicitly mark orphaned groups as such with the new process group flag. This way we prevent bugs in the corner cases where updates to the counter were missed due to complicated configuration of p_pptr/p_opptr/real_parent (debugger). Since we need to iterate over all children of the process on exit, this change mostly affects the process group entry and leave, where we need to iterate all process group members to detect orphaned status. (For MFC, keep pg_jobc around but unused). Reported by: jhb Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	cf4f802e77	kinfo_proc: move job-control related data collection into a new helper. This improves code structure and allows to put the lock asserts right into place where the locks are needed. Also move zeroing of the kinfo_proc structure from fill_kinfo_proc_only() to fill_kinfo_proc(), this looks more symmetrical. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	4daea93813	Lock proctree in around fill_kinfo_proc(). Proctree lock is needed for correct calculation and collection of the job-control related data in kinfo_proc. There was even an XXX comment about it. Satisfy locking and lock ordering requirements by taking proctree lock around pass over each bucket in proc_iterate(), and in sysctl_kern_proc() and note_procstat_proc() for individual process reporting. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	a008bdeda3	tty_wait_background: improve locking. Increase the scope of the process group lock ownership. This ensures that we are consistent in returning EIO for tty write from an orphan and delivery of TTYOUT signals. Reviewed by: jilles Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:20 +02:00
Konstantin Belousov	ef739c7373	pgrp: Prevent use after free. Often, we have a process locked and need to get locked process group. In this case, because process group lock is before process lock, unlocking process allows the group to be freed. See for instance tty_wait_background(). Make pgrp structures allocated from nofree zone, and ensure type stability of the pgrp mutex. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:19 +02:00
Konstantin Belousov	e0d83cd3e4	issignal(): when handling STOP-like signals, drop sigacts mutex earlier. Reviewed by: jilles Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:19 +02:00
Konstantin Belousov	993a1699b1	Style. Improve some KASSERTs messages. Reviewed by: jilles Tested by: pho MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871	2021-01-10 04:41:19 +02:00

18110 Commits