freebsd-nq

Author	SHA1	Message	Date
Edward Tomasz Napierala	460b4b550d	Implement unprivileged chroot This builds on recently introduced NO_NEW_PRIVS flag to implement unprivileged chroot, enabled by `security.bsd.unprivileged_chroot`. It allows non-root processes to chroot(2), provided they have the NO_NEW_PRIVS flag set. The chroot(8) utility gets a new flag, -n, which sets NO_NEW_PRIVS before chrooting. Reviewed By: kib Sponsored By: EPSRC Relnotes: yes Differential Revision: https://reviews.freebsd.org/D30130 (cherry picked from commit `a40cf4175c`)	2022-02-14 18:42:21 +00:00
Mark Johnston	2a454b54bf	Fix the build after commit `5fa005e915` Fixes: `5fa005e915` ("exec: Reimplement stack address randomization")	2022-02-16 13:32:18 -05:00
John Baldwin	1a9f14cfa5	Use vmspace->vm_stacktop in place of sv_usrstack in more places. Reviewed by: markj Obtained from: CheriBSD (cherry picked from commit `becaf6433b`)	2022-02-16 11:55:37 -05:00
Mark Johnston	5fa005e915	exec: Reimplement stack address randomization The approach taken by the stack gap implementation was to insert a random gap between the top of the fixed stack mapping and the true top of the main process stack. This approach was chosen so as to avoid randomizing the previously fixed address of certain process metadata stored at the top of the stack, but had some shortcomings. In particular, mlockall(2) calls would wire the gap, bloating the process' memory usage, and RLIMIT_STACK included the size of the gap so small (< several MB) limits could not be used. There is little value in storing each process' ps_strings at a fixed location, as only very old programs hard-code this address; consumers were converted decades ago to use a sysctl-based interface for this purpose. Thus, this change re-implements stack address randomization by simply breaking the convention of storing ps_strings at a fixed location, and randomizing the location of the entire stack mapping. This implementation is simpler and avoids the problems mentioned above, while being unlikely to break compatibility anywhere the default ASLR settings are used. The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack, and is re-enabled by default. PR: 260303 Reviewed by: kib Discussed with: emaste, mw Sponsored by: The FreeBSD Foundation (cherry picked from commit `1811c1e957`)	2022-02-16 11:55:03 -05:00
Mark Johnston	e3b852f99b	ktls: Disallow transmitting empty frames outside of TLS 1.0/CBC mode There was nothing preventing one from sending an empty fragment on an arbitrary KTLS TX-enabled socket, but ktls_frame() asserts that this could not happen. Though the transmit path handles this case for TLS 1.0 with AES-CBC, we should be strict and allow empty fragments only in modes where it is explicitly allowed. Modify sosend_generic() to reject writes to a KTLS-enabled socket if the number of data bytes is zero, so that userspace cannot trigger the aforementioned assertion. Add regression tests to exercise this case. Reported by: syzkaller Reviewed by: gallatin, jhb Sponsored by: The FreeBSD Foundation (cherry picked from commit `5de79eeddb`)	2022-02-16 11:52:31 -05:00
Mark Johnston	7ac2a6354f	file: Make fget() and getvnode() consistent about initializing fpp Most fget() functions initialize the output parameter to NULL. Make the externally visible interface behave consistently, and make fget_unlocked_seq() private to kern_descrip.c. This fixes at least one bug in a consumer, _filemon_wrapper_openat(), which assumes that getvnode() sets the output file pointer to NULL upon an error. Reported by: syzbot+01c0459408f896a5933a@syzkaller.appspotmail.com Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `300cfb96fc`)	2022-02-16 11:52:31 -05:00
Justin Hibbits	2053dee56a	Fix gzip compressed core dumps on big endian architectures The gzip trailer words (size and CRC) are both little-endian per the spec. MFC after: 3 days Sponsored by: Juniper Networks, Inc. (cherry picked from commit `6db44b0158`)	2022-02-14 13:30:52 -06:00
Dimitry Andric	ae76550171	tty_info: Avoid warning by using logical instead of bitwise operators Since TD_IS_RUNNING() and TS_ON_RUNQ() are defined as logical expressions involving '==', clang 14 warns about them being checked with a bitwise operator instead of a logical one: ``` sys/kern/tty_info.c:124:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical] runa = TD_IS_RUNNING(td) | TD_ON_RUNQ(td); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ || sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING' ^ sys/kern/tty_info.c:124:9: note: cast one or both operands to int to silence this warning sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING' ^ sys/kern/tty_info.c:129:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical] runb = TD_IS_RUNNING(td2) | TD_ON_RUNQ(td2); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ || sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING' ^ sys/kern/tty_info.c:129:9: note: cast one or both operands to int to silence this warning sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING' ^ ``` Fix this by using logical operators instead. No functional change intended. Reviewed by: cem, emaste, kevans, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34186 (cherry picked from commit `7d8a4eb943`)	2022-02-11 17:43:03 +01:00
Colin Percival	baee6cc181	x86: Speed up clock calibration Prior to this commit, the TSC and local APIC frequencies were calibrated at boot time by measuring the clocks before and after a one-second sleep. This was simple and effective, but had the disadvantage of requiring a one-second sleep. Rather than making two clock measurements (before and after sleeping) we now perform many measurements; and rather than simply subtracting the starting count from the ending count, we calculate a best-fit regression between the target clock and the reference clock (for which the current best available timecounter is used). While we do this, we keep track of an estimate of the uncertainty in the regression slope (aka. the ratio of clock speeds), and stop measuring when we believe the uncertainty is less than 1 PPM. In order to avoid the risk of aliasing resulting from the data-gathering loop synchronizing with (a multiple of) the frequency of the reference clock, we add some additional spinning depending upon the iteration number. For numerical stability and simplicity of implementation, we make use of floating-point arithmetic for the statistical calculations. On the author's Dell laptop, this reduces the time spent in calibration from 2000 ms to 29 ms; on an EC2 c5.xlarge instance, it is reduced from 2000 ms to 2.5 ms. Reviewed by: bde (previous version), kib Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D33802 (cherry picked from commit `c2705ceaeb`)	2022-02-10 22:52:00 -08:00
Kyle Evans	00bc7bbde5	sched: separate out schedinit_ap() schedinit_ap() sets up an AP for a later call to sched_throw(NULL). Currently, ULE sets up some pcpu bits and fixes the idlethread lock with a call to sched_throw(NULL); this results in a window where curthread is setup in platforms' init_secondary(), but it has the wrong td_lock. Typical platform AP startup procedure looks something like: - Setup curthread - ... other stuff, including cpu_initclocks_ap() - Signal smp_started - sched_throw(NULL) to enter the scheduler cpu_initclocks_ap() may have callouts to process (e.g., nvme) and attempt to sched_add() for this AP, but this attempt fails because of the noted violated assumption leading to locking heartburn in sched_setpreempt(). Interrupts are still disabled until cpu_throw() so we're not really at risk of being preempted -- just let the scheduler in on it a little earlier as part of setting up curthread. (cherry picked from commit `589aed00e3`)	2022-02-10 14:55:29 -06:00
Kyle Evans	7393eedb03	execve: disallow argc == 0 The manpage has contained the following verbiage on the matter for just under 31 years: "At least one argument must be present in the array" Previous to this version, it had been prefaced with the weakening phrase "By convention." Carry through and document it the rest of the way. Allowing argc == 0 has been a source of security issues in the past, and it's hard to imagine a valid use-case for allowing it. Toss back EINVAL if we ended up not copying in any args for *execve(). The manpage change can be considered "Obtained from: OpenBSD" (cherry picked from commit `773fa8cd13`) (cherry picked from commit `c9afc7680f`)	2022-02-10 14:21:59 -06:00
Hans Petter Selasky	22ba297076	mbuf(9): Assert receive mbufs don't carry a send tag. Else we would start leaking reference counts. Discussed with: jhb@ Sponsored by: NVIDIA Networking (cherry picked from commit `17cbcf33c3`)	2022-02-10 16:11:22 +01:00
Gordon Bergling	6a3607622e	kern_racct: Fix a typo in a source code comment - s/maxumum/maximum/ (cherry picked from commit `a9bee9c77a`)	2022-02-09 07:19:50 +01:00
Gordon Bergling	b9c307bc77	kern_fflock: Fix a typo in a source code comment - s/foward/forward/ (cherry picked from commit `5a78ec9e7c`)	2022-02-09 07:18:00 +01:00
Ed Maste	94e6d14488	Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights These ones were unambiguous cases where the Foundation was the only listed copyright holder (in the associated license block). Sponsored by: The FreeBSD Foundation (cherry picked from commit `9feff969a0`)	2022-02-08 15:00:55 -05:00
Konstantin Belousov	15def34bd8	Add GB_NOWITNESS flag (cherry picked from commit `c02780b78c`)	2022-02-07 11:38:50 +02:00
Konstantin Belousov	7782d71671	syncer VOP_FSYNC(): unlock syncer vnode around call to VFS_SYNC() (cherry picked from commit `3d68c4e175`)	2022-02-07 11:38:50 +02:00
Konstantin Belousov	4116ae3ece	buf_alloc(): lock the buffer with LK_NOWAIT (cherry picked from commit `5875b94c74`)	2022-02-07 11:38:49 +02:00
Konstantin Belousov	78d27f25c7	Use dedicated lock name for pbufs (cherry picked from commit `531f8cfea0`)	2022-02-07 11:38:49 +02:00
Alexander Motin	c27237d62f	Reduce bufdaemon/bufspacedaemon shutdown time. Before this change bufdaemon and bufspacedaemon threads used kthread_shutdown() to stop activity on system shutdown. The problem is that kthread_shutdown() has no idea about the wait channel and lock used by specific thread to wake them up reliably. As result, up to 9 threads could consume up to 9 seconds to shutdown for no good reason. This change introduces specific shutdown functions, knowing how to properly wake up specific threads, reducing wait for those threads on shutdown/reboot from average 4 seconds to effectively zero. MFC after: 2 weeks Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D33936 (cherry picked from commit `b7ff445ffa`)	2022-02-01 19:53:10 -05:00
Mark Johnston	40d6b2a362	exec: Remove the stack gap implementation ASLR stack randomization will reappear in a forthcoming commit. Rather than inserting a random gap into the stack mapping, the entire stack mapping itself will be randomized in the same way that other mappings are when ASLR is enabled. No functional change intended, as the stack gap implementation is currently disabled by default. Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `758d98debe`)	2022-01-31 09:48:57 -05:00
Mark Johnston	947e849150	sysent: Add a sv_psstringssz field to struct sysentvec The size of the ps_strings structure varies between ABIs, so this is useful for computing the address of the ps_strings structure relative to the top of the stack when stack address randomization is enabled. Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `3fc21fdd5f`)	2022-01-31 09:48:11 -05:00
Mark Johnston	d247611467	exec: Introduce the PROC_PS_STRINGS() macro Rather than fetching the ps_strings address directly from a process' sysentvec, use this macro. With stack address randomization the ps_strings address is no longer fixed. Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `706f4a81a8`)	2022-01-31 09:46:57 -05:00
Konstantin Belousov	fbdc76539b	Add security.bsd.allow_ptrace sysctl (cherry picked from commit `fe6db72708`)	2022-01-29 03:10:44 +02:00
Konstantin Belousov	704d2103c6	p_candebug(), p_cansee(): always allow for curproc (cherry picked from commit `55a0aa2162`)	2022-01-29 03:10:44 +02:00
Jessica Clarke	f63a2e288c	intrng: Use less confusing return value for intr_pic_add_handler Currently intr_pic_add_handler either returns the PIC you gave it (which is useless and risks causing confusion about whether it's creating another PIC) or, on error, NULL. Instead, convert it to return an int error code as one would expect. Note that the only consumer of this API, arm64's gicv3_its, does not use the return value, so no uses need updating to work with the revised API. Reviewed by: markj, mmel MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33341 (cherry picked from commit `a3e828c91d`)	2022-01-24 23:59:55 +00:00
Jessica Clarke	3c7f332f71	Fix buffer overread in preloaded hostuuid parsing Commit `b6be9566d2` stopped prison0_init writing outside of the preloaded hostuuid's bounds. However, the preloaded data will not (normally) have a NUL in it, and so validate_uuid will walk off the end of the buffer in its call to sscanf. Previously if there was any whitespace in the string we'd at least know there's a NUL one past the end due to the off-by-one error, but now no such byte is guaranteed. Fix this by copying to a temporary buffer and explicitly adding a NUL. Whilst here, change the strlcpy call to use a far less suspicious argument for dstsize; in practice it's fine, but it's an unusual pattern and not necessary. Found by: CHERI Reviewed by: emaste, kevans, jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33616 (cherry picked from commit `d2ef377430`)	2022-01-24 23:59:49 +00:00
Mark Johnston	c595625abe	Revert "kern_exec: Add kern.stacktop sysctl." The current ASLR stack gap feature will be removed, and with that the need for the kern.stacktop sysctl is gone. All consumers have been removed. This reverts commit `a97d697122`. Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `1544f5add8`)	2022-01-24 09:20:21 -05:00
Mark Johnston	1a97674b46	setrlimit: Remove special handling for RLIMIT_STACK with a stack gap This will not be required with a forthcoming reimplementation of ASLR stack randomization. Moreover, this change was not sufficient to enable the use of a stack size limit smaller than the stack gap itself. PR: 260303 Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `5a8413e779`)	2022-01-24 09:16:52 -05:00
Mark Johnston	9795d85d2e	posixshm: Report output buffer truncation from kern.ipc.posix_shm_list PR: 240573 Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `dc7526170d`)	2022-01-24 09:16:37 -05:00
Alexander Motin	70201cc45d	Reduce minimum idle hardclock rate from 2Hz to 1Hz. On idle 80-thread system it allows to improve package-level idle state residency and so power consumption by several percent. MFC after: 2 weeks (cherry picked from commit `cb1f5d1136`)	2022-01-23 21:35:58 -05:00
Alexander Motin	15e1d8f69b	Fix inverse sleep logic in buf_daemon(). Before commit `3cec5c77d6` buf_daemon() went to longer 1s sleep if numdirtybuffers <= lodirtybuffers. After that commit new condition !BIT_EMPTY(BUF_DOMAINS, &bdlodirty) got opposite -- true when one or more domains is above lodirtybuffers. As result, on freshly booted system with no dirty buffers buf_daemon() wakes up 10 times per second and probably only 1 time per second when there is actual work to do. MFC after: 1 week Reviewed by: kib, markj Tested by: pho Differential revision: https://reviews.freebsd.org/D33890 (cherry picked from commit `e76c010899`)	2022-01-23 14:57:35 -05:00
Michal Meloun	2ace1585b0	intrng: remove now redundant shadow variable. Should not be a functional change. Submitted by: ehem_freebsd@m5p.com Discussed in: https://reviews.freebsd.org/D29310 MFC after: 4 weeks (cherry picked from commit `e88c3b1b02`)	2022-01-20 11:08:45 +01:00
Michal Meloun	a3ccd06dd9	intrng: Releasing interrupt source should clear interrupt table full state. The first release of an interrupt in a situation where the interrupt table is full should schedule a full table check the next time an interrupt is allocated. A full check is necessary to ensure maximum separation between the order of allocation and the order of release. Submitted by: ehem_freebsd@m5p.com (initial version) Discussed in: https://reviews.freebsd.org/D29310 MFC after: 4 weeks (cherry picked from commit `a49f208d94`)	2022-01-20 11:07:44 +01:00
Mark Johnston	af30714ff4	fd: Avoid truncating output buffers for KERN_PROC_{CWD,FILEDESC} These sysctls failed to return an error if the caller had provided too short an output buffer. Change them to return ENOMEM instead, to ensure that callers can detect truncation in the face of a concurrently changing fd table. PR: 228432 Discussed with: cem, jhb (cherry picked from commit `36bd49ac4d`)	2022-01-16 10:40:25 -05:00
Konstantin Belousov	a5f6985995	Ignore debugger-injected signals left after detaching PR: 261010 (cherry picked from commit `a24afbb4e6`)	2022-01-15 14:55:32 +02:00
Konstantin Belousov	b4889992d7	Add vfs_remount_ro() (cherry picked from commit `4a4b059a97`)	2022-01-14 20:11:02 +02:00
Stefan Eßer	54e1dc50ec	sys/kern/sched_4bsd.c: fix typo introduced in previous commit (cherry picked from commit `ec3af9d0ca`)	2022-01-14 18:17:31 +02:00
Stefan Eßer	a94baf23cf	Restore variable aliasing in the context of cpu set operations (cherry picked from commit `a19bd8e30e`)	2022-01-14 18:17:31 +02:00
Stefan Eßer	dc4114875e	Make CPU_SET macros compliant with other implementations (cherry picked from commit `e2650af157`)	2022-01-14 18:17:30 +02:00
Konstantin Belousov	6e676b5550	Regen	2022-01-14 18:17:30 +02:00
Konstantin Belousov	a48d9f1900	Add sched_getcpu() (cherry picked from commit `77b2c2f814`)	2022-01-14 18:17:29 +02:00
Mark Johnston	1562fe492a	exec: Simplify sv_copyout_strings implementations a bit Simplify control flow around handling of the execpath length and signal trampoline. Cache the sysentvec pointer in a local variable. No functional change intended. Reviewed by: kib Sponsored by: The FreeBSD Foundation (cherry picked from commit `f04a096049`)	2022-01-14 08:50:06 -05:00
Colin Percival	972796d007	vfs_mountroot: Check for root dev before waiting If GEOM is idle but the root device is not yet present when we enter vfs_mountroot_wait_if_necessary, we call vfs_mountroot_wait to wait for root holds (e.g. CAM or USB initialization). Upon returning from vfs_mountroot_wait, we wait 100 ms at a time until the root device shows up. Since the root device most likely appeared during vfs_mountroot_wait -- waiting for subsystems which may be responsible for the root device is the whole purpose of that function -- it makes sense to check if the device is now present rather than printing a warning and pausing for 100 ms before checking. Reviewed by: trasz Fixes: `a3ba3d09c2` Make root mount wait mechanism smarter Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D33593 (cherry picked from commit `33812d60b9`)	2022-01-12 11:29:51 -08:00
Colin Percival	72f61780a4	vfs_mountroot: Wait for GEOM idle post root holds In the case of a root hold related to the initialization of a disk device, a flurry of GEOM tasting is likely to take place as soon as the device is initialized and the root hold is released. If we don't wait for GEOM idle it's easy for vfs_mountroot to "win" the race and proceed before the root filesystem GEOM is ready. Reviewed by: imp Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D33592 (cherry picked from commit `19a172158c`)	2022-01-12 11:29:48 -08:00
Colin Percival	d4724934f2	vfs_mountroot: Skip 'Root mount waiting' < 1 s While the message is technically correct, it's not particularly helpful in the case where we're only waiting a few ms; this case occurs frequently on EC2 arm64 instances with CAM initialization racing to release its root hold before vfs_mountroot reaches this point. Only print the message if we end up waiting for more than one second. Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D33591 (cherry picked from commit `e6db5eb9ec`)	2022-01-12 11:29:37 -08:00
Alexander Motin	034c2781d5	kern: Remove CTLFLAG_NEEDGIANT from some more sysctls. MFC after: 2 weeks (cherry picked from commit `c6c52d8e39`)	2022-01-09 19:30:09 -05:00
Alexander Motin	5ec6907c0a	kern: Remove CTLFLAG_NEEDGIANT from some sysctls. MFC after: 2 weeks (cherry picked from commit `fe27f1db5f`)	2022-01-08 20:24:10 -05:00
Hans Petter Selasky	a889d262a7	Remove dead code. The variable orig_resid is always set to zero right after the while loop where it is cleared. Reviewed by: gallatin@ and glebius@ Differential Revision: https://reviews.freebsd.org/D33589 Sponsored by: NVIDIA Networking (cherry picked from commit `f9978339d1`)	2022-01-07 14:08:59 +01:00
Alexander Motin	b7668d009e	Make CPU children explicitly share parent unit numbers. Before this device unit number match was coincidental and broke if I disabled some CPU device(s). Aside of cosmetics, for some drivers (may be considered broken) it caused talking to wrong CPUs. (cherry picked from commit `d3a8f98acb`)	2022-01-04 12:21:42 -05:00
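
Note on commit 2053dee56a (gzip compressed core dumps on big endian architectures): the message states that both gzip trailer words, the CRC and the size, are little-endian per the spec. A minimal sketch of what that implies is given here, assuming a standalone userland helper; `put_le32` and `gzip_write_trailer` are hypothetical names chosen for illustration and are not taken from the FreeBSD sources. The point is that storing each word byte by byte yields the same trailer on little-endian and big-endian hosts, whereas copying a native 32-bit integer would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 32-bit value least-significant byte first,
 * independent of the host's byte order.
 */
static void
put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Hypothetical helper: fill the 8-byte gzip trailer, which is the CRC-32
 * of the uncompressed data followed by its length modulo 2^32.
 */
static void
gzip_write_trailer(uint8_t trailer[8], uint32_t crc32, uint64_t uncompressed_len)
{
	assert(trailer != NULL);
	put_le32(&trailer[0], crc32);
	put_le32(&trailer[4], (uint32_t)(uncompressed_len & 0xffffffffu));
}
```

A caller would compute the CRC-32 while streaming the data, then append the eight bytes produced by `gzip_write_trailer` after the compressed stream. The explicit shifts avoid any dependence on host endianness or on byte-swap macros.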

18633 Commits