freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	7e1d3eefd4	vfs: remove the unused thread argument from NDINIT* See `b4a58fbf64` ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.	2021-11-25 22:50:42 +00:00
Mateusz Guzik	c40fee6f7d	vfs: drop the always curthread argument from kern_alternate_path	2021-11-25 22:50:42 +00:00
Mark Johnston	b11e6fd75b	link_elf_obj: Process global ifunc relocs after other global relocs This is needed to ensure that resolvers that reference global symbols return correct results. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33120	2021-11-25 16:53:27 -05:00
Brooks Davis	be67ea40c5	freebsd32: generate from sys/kern/syscalls.master This avoids the need to keep a freebsd32-specific syscalls.master in sync with the default ABI. As evidenced by the number of commits required to sync the two, it is extremely easy for them to get out of sync due to misunderstandings and user errors. Reviewed by: kevans, kib	2021-11-22 22:36:58 +00:00
Brooks Davis	799ce8b8d2	syscalls: annotate args pointing to long, pointer, or time_t Add _Contains_ annotations indicating that the data pointed to by a pointer argument contains types that vary between FreeBSD ABIs. The supported set is long (including size_t), pointer (including intptr_t), and time_t. The first two vary between 32- and 64-bit ABIs. The last varies between i386 and everything else. These will be used to detect which syscalls require handling on particular ABIs. Reviewed by: kevans, kib	2021-11-22 22:36:58 +00:00
Brooks Davis	6b7c23a026	syscalls: regen	2021-11-22 22:36:57 +00:00
Brooks Davis	6eefabd4ca	syscalls: improve nstat, nfstat, nlstat Optionally return errors when truncating dev_t, ino_t, and nlink_t. In the interest of code reuse, use freebsd11_cvtstat() to perform the truncation and error handling and then convert the resulting struct freebsd11_stat to struct nstat. Add missing freebsd32 compat syscalls. These syscalls require translation because struct nstat contains four instances of struct timespec which in turn contains a time_t and a long. Reviewed by: kib	2021-11-22 22:36:56 +00:00
Brooks Davis	151ddfec6f	freebsd32: add _'s to _umtx_(un)lock This aligns with the default ABI's configuration. Reviewed by: kib	2021-11-22 22:36:55 +00:00
Brooks Davis	d330857439	syscalls: regen	2021-11-22 22:36:54 +00:00
Brooks Davis	00e0a4c0d7	syscalls: abort2 doesn't return so declare as void Reviewed by: kib	2021-11-22 22:36:54 +00:00
Brooks Davis	4b2e1f1480	syscalls: umask returns a mode_t Reviewed by: kib	2021-11-22 22:36:54 +00:00
Brooks Davis	27f5b514a0	syscalls: update a few return types to ssize_t Reviewed by: kib	2021-11-22 22:36:54 +00:00
Brooks Davis	717e7fb27a	syscalls: struct ucontext4 -> struct freebsd4_ucontext This aligns with struct freebsd4_ucontext32 in freebsd32. Reviewed by: kib	2021-11-22 22:36:54 +00:00
Brooks Davis	e58e9a8cbd	syscalls: regen	2021-11-22 22:36:54 +00:00
Brooks Davis	d8bd949beb	sys___sysctl: regularize argument struct Let makesyscalls generate the normal struct __sysctl_args structure. It works fine. Reviewed by: kib	2021-11-22 22:36:54 +00:00
Brooks Davis	97e4bec56d	syscalls: regen	2021-11-22 22:36:53 +00:00
Brooks Davis	88dfcfa2a0	sys_sigaltstack: use struct sigaltstack arg This is identical to stack_t and more amenable to prepending "32" for freebsd32. Reviewed by: kib	2021-11-22 22:36:53 +00:00
Robert Wing	8981a100e6	mount: retire kernel_vmount() The last usage of this function was removed in `e3b1c847a4`. There are no in-tree consumers of kernel_vmount(). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D32607	2021-11-20 10:22:28 -09:00
Mark Johnston	3339950117	timecounter: Initialize tc_lock earlier Hyper-V wants to register its MSR-based timecounter during SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be available for DELAY(). So we cannot use MTX_SYSINIT to initialize the timecounter lock. PR: 259878 Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33014	2021-11-19 17:29:28 -05:00
Mitchell Horne	588ab3c774	Allow minidumps to be performed on the live system Add a boolean parameter to minidumpsys(), to indicate a live dump. When requested, take a snapshot of important global state, and pass this to the machine-dependent minidump function. For now this includes the kernel message buffer, and the bitset of pages to be dumped. Beyond this, we don't take much action to protect the integrity of the dump from changes in the running system. A new function msgbuf_duplicate() is added for snapshotting the message buffer. msgbuf_copy() is insufficient for this purpose since it marks any new characters it finds as read. For now, nothing can actually trigger a live minidump. A future patch will add the mechanism for this. For simplicity and safety, live dumps are disallowed for mips. Reviewed by: markj, jhb MFC after: 2 weeks Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D31993	2021-11-19 15:05:53 -04:00
Mitchell Horne	1adebe3cd6	minidump: Parameterize minidumpsys() The minidump code is written assuming that certain global state will not change, and rightly so, since it executes from a kernel debugger context. In order to support taking minidumps of a live system, we should allow copies of relevant global state that is likely to change to be passed as parameters to the minidumpsys() function. This patch does the work of parameterizing this function, by adding a struct minidumpstate argument. For now, this struct allows for copies of the kernel message buffer, and the bitset that tracks which pages should be dumped (vm_page_dump). Follow-up changes will actually make use of these arguments. Notably, dump_avail[] does not need a snapshot, since it is not expected to change after system initialization. The existing minidumpsys() definitions are renamed, and a thin MI wrapper is added to kern_dump.c, which handles the construction of the state struct. Thus, calling minidumpsys() remains as simple as before. Reviewed by: kib, markj, jhb Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D31989	2021-11-19 15:05:52 -04:00
Gordon Bergling	15b5c347f1	sched_ule(4): Fix two typos in source code comments - s/conditons/conditions/ - s/unconditonally/unconditionally/ MFC after: 3 days	2021-11-19 19:13:28 +01:00
Wuyang Chung	8587d75255	Correct the name of the second parameter of biowait to wmesg This parameter is passed directly to msleep, and the name of the msleep parameter is wmesg. Make them match. Pull Request: https://github.com/freebsd/freebsd-src/pull/557	2021-11-18 23:26:33 -07:00
Konstantin Belousov	a7e4eb1422	Kernel linkers: some style Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential revision: https://reviews.freebsd.org/D32878	2021-11-18 15:56:23 +02:00
Alex Richardson	4082b189d2	elf*_brand_inuse: Change return type to bool. Reviewed by: kib Obtained from: CheriBSD Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D33052	2021-11-17 15:51:40 -08:00
Alex Richardson	1962164584	imgact_elf: Use bool instead of boolean_t. Reviewed by: kib Obtained from: CheriBSD Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D33051	2021-11-17 15:51:29 -08:00
Brooks Davis	265a4b8341	freebsd32: semid_t -> int32_t semid_t is historically an intptr_t so it should be an int32_t. Reviewed by: kevans	2021-11-17 20:12:26 +00:00
Brooks Davis	2b9d052d3e	freebsd32: fix getfsstat sign extension bugs Add freebsd32 versions of getfsstat and freebsd11_getfsstat so that bufsize is properly sign-extended if a negative value is passed. Reject negative values before passing to kern_getfsstat as a size_t. Reviewed by: kevans	2021-11-17 20:12:26 +00:00
Brooks Davis	e02f64d9b8	freebsd32: add real abort2 Previously, the code would copy twice as many pointers as specified and print pairs of them as a single 64-bit pointer. abort2 doesn't return so make the return type void. freebsd32_abort2 is in its own file with a 2-clause BSD license based on a discussion with Wojciech many years ago. Reviewed by: kevans	2021-11-17 20:12:25 +00:00
Brooks Davis	0ebea13928	freebsd32: include `__` in semctl names This mirrors sys/kern/syscalls.master and will simplify generation of freebsd32 files. Reviewed by: kevans	2021-11-17 20:12:24 +00:00
Brooks Davis	d35a771660	freebsd32: sync _umtx_op args with default ABI Reviewed by: kevans	2021-11-17 20:12:24 +00:00
Brooks Davis	3b0cd7e503	freebsd32: rename old SysV IPC types Move the 32 from ...32_old to ..._old32 to aid automatic generation. Reviewed by: kevans	2021-11-17 20:12:23 +00:00
Brooks Davis	e5b0997650	freebsd32: add a union semun_old32 Use this for COMPAT7 support. In practice it's the same as union semun32 since the pointers become uint32_t's, but it's more symmetric and is the logical thing to generate from semun_old. Reviewed by: kevans	2021-11-17 20:12:23 +00:00
Brooks Davis	85d1d2a675	syscalls: use struct siginfo rather than siginfo_t This allows freebsd32 to use struct siginfo32 with an automatable conversion. Reviewed by: kevans	2021-11-17 20:12:22 +00:00
Brooks Davis	f503288262	syscalls: fix type of osendmsg osendmsg takes a struct omsghdr * not a void *. Reviewed by: kevans	2021-11-17 20:12:22 +00:00
Brooks Davis	2385f4d172	syscalls: use __socklen_t as appropriate No functional change as __socklen_t is an int. Obtained from: CheriBSD Reviewed by: kevans	2021-11-17 20:12:22 +00:00
Brooks Davis	b64f3dc26c	syscalls: [gs]etitimer takes an int which Match the function declaration which takes an int not a signed int. No functional change as the range of valid values is 0-2. Obtained from: CheriBSD Reviewed by: kevans	2021-11-17 20:12:21 +00:00
Brooks Davis	b7fd86118f	syscalls: sprinkle in const values Add missing const qualifiers to a number of syscall arguments. Obtained from: CheriBSD Reviewed by: kevans	2021-11-17 20:12:21 +00:00
Kristof Provost	b6cbbcae40	m_get3(): actually use the selected zone Reported by: markj	2021-11-17 03:09:20 +01:00
Marcin Wojtas	b014e0f15b	Enable ASLR by default for 64-bit executables Address Space Layout Randomization (ASLR) is an exploit mitigation technique implemented in the majority of modern operating systems. It involves randomly positioning the base address of an executable and the position of libraries, heap, and stack, in a process's address space. Although over the years ASLR proved to not guarantee full OS security on its own, this mechanism can make exploitation more difficult. Tests on the tier 1 64-bit architectures demonstrated that the ASLR is stable and does not result in noticeable performance degradation, therefore it should be safe to enable this mechanism by default. Moreover its effectiveness is increased for PIE (Position Independent Executable) binaries. Thanks to commit `9a227a2fd6` ("Enable PIE by default on 64-bit architectures"), building from src is not necessary to have PIE binaries. It is enough to control usage of ASLR in the OS solely by setting the appropriate sysctls. This patch toggles the kernel settings to use address map randomization for PIE & non-PIE 64-bit binaries. It also disables SBRK, in order to allow utilization of the bss grow region for mappings. The latter has no effect if ASLR is disabled, so apply it to all architectures. As for the drawbacks, a consequence of using the ASLR is more significant VM fragmentation, hence the issues may be encountered in the systems with a limited address space in high memory consumption cases, such as buildworld. As a result, although the tests on 32-bit architectures with ASLR enabled were mostly on par with what was observed on 64-bit ones, the defaults for the former are not changed at this time. Also, for the sake of safety keep the feature disabled for 32-bit executables on 64-bit machines, too. The committed change affects the overall OS operation, so the following should be taken into consideration: * Address space fragmentation. * A changed ABI due to modified layout of address space. 
* More complicated debugging due to: * Non-reproducible address space layout between runs. * Some debuggers automatically disable ASLR for spawned processes, making target's environment different between debug and non-debug runs. In order to confirm/rule-out the dependency of any encountered issue on ASLR it is strongly advised to re-run the test with the feature disabled - it can be done by setting the following sysctls in the /etc/sysctl.conf file: kern.elf64.aslr.enable=0 kern.elf64.aslr.pie_enable=0 Co-developed by: Dawid Gorecki <dgr@semihalf.com> Reviewed by: emaste, kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential revision: https://reviews.freebsd.org/D27666	2021-11-16 23:16:09 +01:00
Mark Johnston	32854e528a	mbuf: Properly set the default value for mb_use_ext_pgs Reported by: Jenkins Fixes: `fcaa890c44` ("mbuf: Only allow extpg mbufs if the system has a direct map") Pointy hat: markj	2021-11-16 16:23:11 -05:00
Mark Johnston	fcaa890c44	mbuf: Only allow extpg mbufs if the system has a direct map Some upcoming changes will modify software checksum routines like in_cksum() to operate using m_apply(), which uses the direct map to access packet data for unmapped mbufs. This approach of course does not work on platforms without a direct map, so we have to disallow the use of unmapped mbufs on such platforms. I believe this is the right tradeoff: we only configure KTLS on amd64 and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map already), and the use of unmapped mbufs with plain sendfile is a recent optimization. If need be, m_apply() could be modified to create CPU-private mappings of extpg mbuf pages as a fallback. So, change mb_use_ext_pgs to be hard-wired to zero on systems without a direct map. Note that PMAP_HAS_DMAP is not a compile-time constant on some systems, so the default value of mb_use_ext_pgs has to be determined during boot. Reviewed by: jhb Discussed with: gallatin MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32940	2021-11-16 13:31:04 -05:00
Mark Johnston	42188bb5c1	unix: Remove a write-only local variable Reported by: clang MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-11-16 13:30:22 -05:00
Kirk McKusick	f10a8d0971	Allow the MNT_FORCE flag to be passed through to an initial mount. When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE flag is not passed through to the underlying filesystem mount routine. MNT_FORCE is only passed through on later updates to an existing mount. With this commit the MNT_FORCE flag is now passed through on the initial mount. Sanity check: kib Sponsored by: Netflix	2021-11-15 15:45:56 -08:00
Mark Johnston	2287ced2f5	clock: Group the "clocks" SYSINIT with the function definition This is how most SYSINITs are defined. Also annotate the dummy parameter with __unused. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-11-15 16:13:24 -05:00
John Baldwin	900a28fe33	ktls: Reject some invalid cipher suites. - Reject AES-CBC cipher suites for TLS 1.0 and TLS 1.1 using auth algorithms other than SHA1-HMAC. - Reject AES-GCM cipher suites for TLS versions older than 1.2. Reviewed by: markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D32842	2021-11-15 11:30:12 -08:00
Brooks Davis	3f8ced5bce	syscalls: regen	2021-11-15 18:34:27 +00:00
Brooks Davis	8e4a3add99	struct kevent_freebsd11 -> struct freebsd11_kevent Rename to match the naming of syscalls and allow 32 to be appended without making an ugly name like kevent_freebsd1132. While here, make the kevent changelist argument const. Reviewed by: kib	2021-11-15 18:34:27 +00:00
Brooks Davis	f0da2a1467	syscalls: unwrap a long line Style dictates that each variable is on a single line Reviewed by: kib	2021-11-15 18:34:27 +00:00
Mark Johnston	d28af1abf0	vm: Add a mode to vm_object_page_remove() which skips invalid pages This will be used to break a deadlock in ZFS between the per-mountpoint teardown lock and page busy locks. In particular, when purging data from the page cache during dataset rollback, we want to avoid blocking on the busy state of invalid pages since the busying thread may be blocked on the teardown lock in zfs_getpages(). Add a helper, vn_pages_remove_valid(), for use by filesystems. Bump __FreeBSD_version so that the OpenZFS port can make use of the new helper. PR: 258208 Reviewed by: avg, kib, sef Tested by: pho (part of a larger patch) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32931	2021-11-15 13:01:30 -05:00
Mark Johnston	1fb99e97e9	bus: Make BUS_TRANSLATE_RESOURCE behave more like other bus methods - Return an errno value upon failure, instead of 1. - Provide a bus_translate_resource() wrapper. - Implement the generic version, which traverses the hierarchy until a bus driver with a non-trivial implementation is found, in subr_bus.c like other similar default implementations. - Make ofw_pcib_translate_resource() return an error if a matching PCI address range is not found. - Make generic_pcie_translate_resource_common() return an int instead of a bool. Fix up callers. No functional change intended. Reviewed by: imp, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32855	2021-11-15 13:01:30 -05:00
Konstantin Belousov	8660813153	start_init: use 'p' Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-11-15 02:33:01 +02:00
Mateusz Guzik	e9c7ec2287	aio: whack "set but not used" warnings	2021-11-14 16:59:53 +00:00
Mateusz Guzik	7e9680d3be	cache: whack "set but not used" warnings	2021-11-14 16:57:43 +00:00
Konstantin Belousov	d032cda0d0	DEBUG_VFS_LOCKS: stop excluding devfs and doomed vnode from asserts We do not require devvp vnode locked for metadata io. It is typically not needed indeed, since correctness of the file system using corresponding block device ensures that there are no incorrect or racy manipulations. But right now DEBUG_VFS_LOCKS option excludes both character device vnodes and completely destroyed (VBAD) vnodes from asserts. This is not too bad since WITNESS still ensures that we do not leak locks. On the other hand, asserts do not mean what they should, to the reader, and reliance on them being enforced might result in wrong code. Note that ASSERT_VOP_LOCKED() still silently accepts NULLVP, I think it is worth fixing as well, in the next round. In collaboration with: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D32761	2021-11-13 01:02:42 +02:00
Konstantin Belousov	47b248ac65	Make locking assertions for VOP_FSYNC() and VOP_FDATASYNC() more correct For devfs vnodes, it is fine to not lock vnodes for VOP_FSYNC(). Otherwise vnode must be locked exclusively, except for MNT_SHARED_WRITES() where the shared lock is enough. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761	2021-11-13 01:02:13 +02:00
Konstantin Belousov	d1d675cb30	freevnode(): lock the freeing vnode around destroy_vpollinfo() to satisfy locking requirements of knlist manipulations. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761	2021-11-13 01:01:02 +02:00
Konstantin Belousov	a7b4a54d2c	getblk(): do not require devvp vnodes to be locked Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761	2021-11-13 01:00:24 +02:00
Mark Johnston	ac2b544417	mbuf: Fix an offset calculation in m_apply_extpg_one() We were not including the requested starting offset in the page offset. Reviewed by: jhb Fixes: `3c7a01d773` ("Extend m_apply() to support unmapped mbufs.") Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32922	2021-11-10 16:57:12 -05:00
Konstantin Belousov	439c3d9563	Regen	2021-11-10 21:18:54 +02:00
Konstantin Belousov	77b2c2f814	Add sched_getcpu() for compatibility with Linux. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32901	2021-11-10 21:18:54 +02:00
John Baldwin	e3ba94d4f3	Don't require the socket lock for sorele(). Previously, sorele() always required the socket lock and dropped the lock if the released reference was not the last reference. Many callers locked the socket lock just before calling sorele() resulting in a wasted lock/unlock when not dropping the last reference. Move the previous implementation of sorele() into a new sorele_locked() function and use it instead of sorele() for various places in uipc_socket.c that called sorele() while already holding the socket lock. The sorele() macro now uses refcount_release_if_not_last() to try to drop the socket reference without locking the socket. If that shortcut fails, it locks the socket and calls sorele_locked(). Reviewed by: kib, markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32741	2021-11-09 10:50:12 -08:00
John Baldwin	57093f9366	vfs: Consistently validate AT_* flags in kern_* functions. Some syscalls checked for invalid AT_* flags in sys_* and others in kern_*. Reviewed by: kib Obtained from: CheriBSD Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D32864	2021-11-09 09:42:12 -08:00
Rick Macklem	f0c9847a6c	vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE When the NFSv4.2 server does a VOP_ALLOCATE(), it needs the operation to be done for the RPC's credential and not td_ucred. It also needs the writing to be done synchronously. This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE() and modifies vop_stdallocate() to use these arguments. The VOP_ALLOCATE.9 man page will be patched separately. Reviewed by: khng, kib Differential Revision: https://reviews.freebsd.org/D32865	2021-11-06 13:26:43 -07:00
Kyle Evans	6a8ea6d174	sched: split sched_ap_entry() out of sched_throw() sched_throw() can no longer take a NULL thread, APs enter through sched_ap_entry() instead. This completely removes branching in the common case and cleans up both paths. No functional change intended. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D32829	2021-11-05 15:45:51 -05:00
Hans Petter Selasky	dd31400c3c	Factor out flags preserved during mbuf demote into a separate define. This define will later on be used by coming TLS RX hardware offload patches. No functional change intended. Reviewed by: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2021-11-04 18:53:49 +01:00
Allan Jude	c441592a0e	Allow kern.ipc.maxsockets to be set to current value without error Normally setting kern.ipc.maxsockets returns EINVAL if the new value is not greater than the previous value. This can cause spurious error messages when sysctl.conf is processed multiple times, or when automation systems try to ensure the sysctl is set to the correct value. If the value is unchanged, then just do nothing. PR: 243532 Reviewed by: markj MFC after: 3 days Sponsored by: Modirum MDPay Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D32775	2021-11-04 12:56:09 +00:00
Konstantin Belousov	7ac82c96fe	proc_get_binpath(): provide syntactically correct value for unused NDINIT arg Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-11-04 02:55:33 +02:00
Warner Losh	072d5b98c4	sysbeep: Adjust interface to take a duration as a sbt Change the 'period' argument to 'duration' and change its type to sbintime_t so we can more easily express different durations. Reviewed by: tsoome, glebius Differential Revision: https://reviews.freebsd.org/D32619	2021-11-03 16:03:51 -06:00
Kyle Evans	589aed00e3	sched: separate out schedinit_ap() schedinit_ap() sets up an AP for a later call to sched_throw(NULL). Currently, ULE sets up some pcpu bits and fixes the idlethread lock with a call to sched_throw(NULL); this results in a window where curthread is setup in platforms' init_secondary(), but it has the wrong td_lock. Typical platform AP startup procedure looks something like: - Setup curthread - ... other stuff, including cpu_initclocks_ap() - Signal smp_started - sched_throw(NULL) to enter the scheduler cpu_initclocks_ap() may have callouts to process (e.g., nvme) and attempt to sched_add() for this AP, but this attempt fails because of the noted violated assumption leading to locking heartburn in sched_setpreempt(). Interrupts are still disabled until cpu_throw() so we're not really at risk of being preempted -- just let the scheduler in on it a little earlier as part of setting up curthread. Reviewed by: alfredo, kib, markj Triage help from: andrew, markj Smoke-tested by: alfredo (ppc), kevans (arm64, x86), mhorne (arm) Differential Revision: https://reviews.freebsd.org/D32797	2021-11-03 15:54:59 -05:00
Mark Johnston	175d3380a3	amd64: Deduplicate routines for expanding KASAN/KMSAN shadow maps When working on the ports these functions were slightly different, but now there's no reason for them to be separate. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-11-03 12:36:02 -04:00
Konstantin Belousov	be10c0a910	fexecve(2): allow O_PATH file descriptors opened without O_EXEC This improves compatibility with Linux. Noted by: Drew DeVault <sir@cmpwn.com> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32821	2021-11-03 18:00:42 +02:00
Konstantin Belousov	02de91d740	proc_get_binpath(): return empty string instead of NULL for strange case where queried process does not have text. Reported by: Michael Butler <imb@protected-networks.net> Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-11-03 17:30:10 +02:00
Konstantin Belousov	e4ce23b238	fexecve(2): restore the attempts to calculate the executable path vn_fullpath() call was not converted to pass newtextvp, instead it used imgp->vp which is still NULL there. As a result vn_fullpath() always returned EINVAL and execpath was recorded from the value of arg0. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-11-03 15:10:22 +02:00
Kyle Evans	7771f2a0c9	kern: physmem: improve region coalescing logic The existing logic didn't take into account newly inserted mappings wholly contained by an existing region (or vice versa), nor did it account for weird overlap scenarios. The latter is probably unlikely to happen, but the former may happen in UEFI: BootServicesData allocated within a large chunk of ConventionalMemory. This situation blows up vm initialization. While we're here, remove the "exact match" logic as it's likely wrong; if an exact match exists with conflicting flags, for instance, then we should probably be doing something else. The new logic takes into account exact matches as part of the overlapping efforts. Reviewed by: kib, mhorne (both earlier version) Differential Revision: https://reviews.freebsd.org/D32701	2021-11-03 02:32:46 -05:00
Konstantin Belousov	f34fc6ba06	Extract proc_get_binpath() from sysctl_kern_proc_pathname() Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32738	2021-10-31 03:05:14 +02:00
Edward Tomasz Napierala	8bbc0600cc	linux: Add additional ptracestop only if the debugger is Linux In `6e66030c4c`, additional ptracestop was added in order to implement PTRACE_EVENT_EXEC. Make it only apply to cases where the debugger is a Linux process; native FreeBSD debuggers can trace Linux processes too, but they don't expect that additional ptracestop. Fixes: `6e66030c4c` Reported By: kib Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32726	2021-10-30 09:54:17 +01:00
Mark Johnston	26f76aea2d	timecounter: Load the currently selected tc once in tc_windup() Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32729	2021-10-29 14:30:15 -04:00
Sebastian Huber	ae750fbac7	kern_tc.c: Scaling/large delta recalculation This change is a slight performance optimization for systems with a slow 64-bit division. The th->th_scale and th->th_large_delta values only depend on the timecounter frequency and the th->th_adjustment. The timecounter frequency of a timehand only changes when a new timecounter is activated for the timehand. The th->th_adjustment is only changed by the NTP second update. The NTP second update is not done for every call of tc_windup(). Move the code block to recalculate the scaling factor and the large delta of a timehand to the new helper function recalculate_scaling_factor_and_large_delta(). Call recalculate_scaling_factor_and_large_delta() when a new timecounter is activated and an NTP second update occurred. MFC after: 1 week	2021-10-29 00:31:14 +03:00
Konstantin Belousov	1c69690319	Unmap shared page manually before doing vm_map_remove() on exit or exec This allows the pmap_remove(min, max) call to see empty pmap and exploit empty pmap optimization. Reviewed by: markj Tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32569	2021-10-28 22:01:59 +03:00
Konstantin Belousov	4d675b80f0	i386: fix struct proc layout asserts after `351d5f7fc5` Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-28 21:56:21 +03:00
Konstantin Belousov	ee92c8a842	sysctl kern.proc.procname: report right hardlink name PR: 248184 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:50:02 +03:00
Konstantin Belousov	351d5f7fc5	exec: store parent directory and hardlink name of the binary in struct proc While doing it, also move all the code to resolve pathnames and obtain text vp and dvp, into single place. Besides simplifying the code, it avoids spurious vnode relocks and validates the explanation why a transient text reference on the script vnode is not harmful. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:56 +03:00
Konstantin Belousov	0c10648fbb	exec: provide right hardlink name in AT_EXECPATH For this, use vn_fullpath_hardlink() to resolve executable name for execve(2). This should provide the right hardlink name, used for execution, instead of random hardlink pointing to this binary. Also this should make the AT_EXECNAME reliable for execve(2), since kernel only needs to resolve parent directory path, which should always succeed (except pathological cases like unlinking a directory). PR: 248184 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:31 +03:00
Konstantin Belousov	9a0bee9f6a	Make vn_fullpath_hardlink() externally callable Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:26 +03:00
Konstantin Belousov	15bf81f354	struct image_params: use bool type for boolean members Also re-align comments, and group booleans and char members together. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:21 +03:00
Konstantin Belousov	9d58243fbc	do_execve(): switch boolean locals to use bool type Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:16 +03:00
Konstantin Belousov	143dba3a91	kern_exec.c: style Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:10 +03:00
Mateusz Guzik	628c3b307f	cache: only let non-dir descriptors through when doing EMPTYPATH lookups Otherwise things like realpath against a file and '.' end up with an illegal state of having a regular vnode for the parent. Reported by: syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com	2021-10-27 18:27:47 +00:00
Mark Johnston	71f31d784e	rmslock: Update td_locks during lock and unlock operations Reviewed by: mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32692	2021-10-27 11:18:13 -04:00
Gordon Bergling	70de1003da	jail(8): Fix a few common typos in source code comments - s/phyiscal/physical/ MFC after: 3 days	2021-10-27 06:16:06 +02:00
Kirk McKusick	dfd704b7fb	Allow biodone() to be used as a completion routine. An ordered series of BIO_READ and BIO_WRITE operations are typically done as: while (work to do) { setup bp for I/O g_io_request(bp, consumer); biowait(bp); } Here you need to have biodone() called at the completion of the I/O to set the BIO_DONE flag and awaken the biowait(). The obvious way to do this would be to set bio_done = biodone, but biodone() will only take the desired action if bio_done == NULL. The relevant code at the end of biodone() is: done = bp->bio_done; if (done == NULL) { mtxp = mtx_pool_find(mtxpool_sleep, bp); mtx_lock(mtxp); bp->bio_flags |= BIO_DONE; wakeup(bp); mtx_unlock(mtxp); } else done(bp); This code would infinitely recurse if biodone() is specified as the routine to use at completion. So before this change, a wrapper done function had to be written: static void g_io_done(struct bio *bp) { bp->bio_done = NULL; biodone(bp); bp->bio_done = g_io_done; } This commit changes if (done == NULL) to if (done == NULL || done == biodone) which eliminates the need for the wrapper function. Reviewed by: kib Sponsored by: Netflix	2021-10-23 14:11:57 -07:00
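The change described in dfd704b7fb can be illustrated with a small userland model. This is only a sketch: `struct model_bio`, `model_biodone()`, `model_complete_self()` and `MODEL_BIO_DONE` are hypothetical stand-ins, and the mutex and wakeup() steps from the kernel code are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BIO_DONE 0x01	/* stand-in for the kernel's BIO_DONE flag */

struct model_bio {
	void (*bio_done)(struct model_bio *);	/* completion routine */
	int bio_flags;
	int calls;		/* how many times model_biodone() ran */
};

static void
model_biodone(struct model_bio *bp)
{
	void (*done)(struct model_bio *);

	bp->calls++;
	done = bp->bio_done;
	if (done == NULL || done == model_biodone) {
		/* The kernel also locks a pool mutex and calls wakeup() here. */
		bp->bio_flags |= MODEL_BIO_DONE;
	} else
		done(bp);
}

/*
 * Complete an I/O whose completion routine is model_biodone itself.
 * Returns 1 if the routine ran exactly once and set the done flag.
 */
static int
model_complete_self(void)
{
	struct model_bio bio = { .bio_done = model_biodone };

	model_biodone(&bio);
	return (bio.calls == 1 && (bio.bio_flags & MODEL_BIO_DONE) != 0);
}
```

With only the `done == NULL` test, the same call would re-enter `model_biodone()` through `done(bp)` without end, which is the recursion the commit message describes.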
Edward Tomasz Napierala	6e66030c4c	linux: implement PTRACE_EVENT_EXEC This fixes strace(1) from Ubuntu Focal. Reviewed By: jhb Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32367	2021-10-23 19:46:26 +01:00
Konstantin Belousov	3b5331dd8d	uipc_shm: silence warnings about write-only variables in largepage code In shm_largepage_phys_populate(), the result from vm_page_grab() is only needed for assertion. In shm_dotruncate_largepage(), there is commented-out prototype code for managed largepages. The oldobjsz is saved for its sake, so mark the variable as __unused directly. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	3d2778515a	sig_ast_checksusp(): mark the local p as __diagused It is only used to assert that the (current) process is locked Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	6776747a0e	subr_firmware.c::unloadentry(): remove write-only variable The function ignores the result returned by linker_release_module(). The FW_UNLOAD flag on the file is cleared, so even on error it would not be tried again. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	993446638c	alq_open_flags(): mark local td variable as unused It is passed to the NDINIT() macro which ignores the thread argument for some time. Sponsored by: The FreeBSD Foundation	2021-10-21 21:40:46 +03:00
Konstantin Belousov	bded8fa300	umtxq_requeue: remove write-only variable uh2 umtxq_queue_lookup() does not change state. It is redone inside umtxq_insert() later, anyway. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
John Baldwin	96668a81ae	ktls: Always create a software backend for receive sessions. A future change to TOE TLS will require a software fallback for the first few TLS records received. Future support for NIC TLS on receive will also require a software fallback for certain cases. Reviewed by: gallatin, hselasky Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32566	2021-10-21 09:37:17 -07:00
John Baldwin	c57dbec69a	ktls: Add a routine to query information in a receive socket buffer. In particular, ktls_pending_rx_info() determines which TLS record is at the end of the current receive socket buffer (including not-yet-decrypted data) along with how much data in that TLS record is not yet present in the socket buffer. This is useful for future changes to support NIC TLS receive offload and enhancements to TOE TLS receive offload. Those use cases need a way to synchronize a state machine on the NIC with the TLS record boundaries in the TCP stream. Reviewed by: gallatin, hselasky Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32564	2021-10-21 09:36:29 -07:00
Mark Johnston	84c3922243	Convert consumers to vm_page_alloc_noobj_contig() Remove now-unneeded page zeroing. No functional change intended. Reviewed by: alc, hselasky, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32006	2021-10-19 21:22:56 -04:00
Mark Johnston	a4667e09e6	Convert vm_page_alloc() callers to use vm_page_alloc_noobj(). Remove page zeroing code from consumers and stop specifying VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to simply use VM_ALLOC_WAITOK. Similarly, convert vm_page_alloc_domain() callers. Note that callers are now responsible for assigning the pindex. Reviewed by: alc, hselasky, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31986	2021-10-19 21:22:56 -04:00
Konstantin Belousov	c7f38a2df1	procctl: stop using SA_LOCKED, define local enum Using SA_LOCKED constants breaks !INVARIANT builds Reported by: cy Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-20 00:25:19 +03:00
Konstantin Belousov	49db81aa05	kern_procctl: skip zombies for process group operations When iterating over the process group members, skip zombies same as it is done by pfind() for single-process operation. Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	3692877a6c	kern_procctl.c: use td->td_proc instead of curproc Suggested by: markj Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	f5bb6e5a6d	procctl: actually require debug privileges over target for state control over TRACE, TRAPCAP, ASLR, PROTMAX, STACKGAP, NO_NEWPRIVS, and WXMAP. Reported by: emaste Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	1c4dbee5dd	procctl: make it possible to specify that some operations require debug privilege over the target Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	32026f5983	sys_procctl(): zero the data buffer once, on syscall entry and remove zeroing of it from specific functions. This way it is guaranteed that we do not leak kernel data. Suggested by: markj Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	56d5323b4d	sys_procctl(): use table data to do copyin/copyout Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	68dc5b381a	kern_procctl_single(): convert to use table data Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	34f39a8c0e	procctl: convert PDEATHSIG_CTL/STATUS to regular kern_procctl_single() cases Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	f833ab9dd1	procctl(2): add consistent shortcut P_ID:0 as curproc Reported by: bdrewery, emaste Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	7ae879b14a	kern_procctl(): convert the function to be table-driven Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:34 +03:00
Konstantin Belousov	31faa565ed	sys_procctl(2): remove sysproto and argused Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32513	2021-10-19 23:04:33 +03:00
Mark Johnston	621fd9dcb2	timecounter: Lock the timecounter list Timecounter registration is dynamic, i.e., there is no requirement that timecounters must be registered during single-threaded boot. Loadable drivers may in principle register timecounters (which can be switched to automatically). Timecounters cannot be unregistered, though this could be implemented. Registered timecounters belong to a global linked list. Add a mutex to synchronize insertions and the traversals done by (mpsafe) sysctl handlers. No functional change intended. Reviewed by: imp, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32511	2021-10-18 09:56:59 -04:00
Mark Johnston	81f2e9063d	signal: Add SIG_FOREACH and refactor issignal() Add a SIG_FOREACH macro that can be used to iterate over a signal set. This is a bit cleaner and more efficient than calling sig_ffs() in a loop. The implementation is based on BIT_FOREACH_ISSET(), except that the bitset limbs are always 32 bits wide, and signal sets are 1-indexed rather than 0-indexed like bitset(9) sets. issignal() cannot really be modified to use SIG_FOREACH() directly. Take this opportunity to split the function into two explicit loops. I've always found this function hard to read and think that this change is an improvement. Remove sig_ffs(), nothing uses it now. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32473	2021-10-18 09:56:58 -04:00
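The iteration pattern described in 81f2e9063d (32-bit limbs, 1-indexed signal numbers) can be sketched in userland as follows. This is not the kernel's SIG_FOREACH macro: `model_sigset_t`, `model_sigaddset()` and `model_sig_collect()` are hypothetical names, and the set size of four limbs is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

#define MODEL_SIG_WORDS 4	/* assumed number of 32-bit limbs */

typedef struct {
	uint32_t bits[MODEL_SIG_WORDS];
} model_sigset_t;

/* Signals are 1-indexed: signal 1 lives in bit 0 of limb 0. */
static void
model_sigaddset(model_sigset_t *set, int sig)
{
	set->bits[(sig - 1) / 32] |= (uint32_t)1 << ((sig - 1) % 32);
}

/*
 * Visit every signal in the set in ascending order, storing up to
 * max of them in out.  Each limb is consumed by repeatedly locating
 * and clearing its lowest set bit, so empty limbs cost one comparison.
 */
static int
model_sig_collect(const model_sigset_t *set, int *out, int max)
{
	int n = 0;

	for (int w = 0; w < MODEL_SIG_WORDS; w++) {
		uint32_t limb = set->bits[w];

		while (limb != 0 && n < max) {
			int bit = ffs((int)limb) - 1;	/* 0-based bit index */

			out[n++] = w * 32 + bit + 1;	/* back to 1-indexed */
			limb &= limb - 1;		/* clear lowest set bit */
		}
	}
	return (n);
}
```

Adding signals 1, 32, 33 and 128 to an empty set and collecting them exercises both ends of a limb and the limb boundary.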
Colin Percival	52e125c2bd	TSLOG: Report final execname, not first In cases such as daemons launched via limits(1), a process may call exec multiple times; the last name of the last binary executed is usually (always?) more informative. Fixes: `46dd801acb` Add userland boot profiling to TSLOG Sponsored by: https://www.patreon.com/cperciva	2021-10-17 13:36:38 -07:00
Jessica Clarke	682c00a6ce	riscv: Implement pmap_mapdev_attr This is needed for LinuxKPI's _ioremap_attr. This reuses the generic implementation introduced for aarch64, and itself requires implementing pmap_kenter, which is trivial to do given riscv currently treats all mapping attributes the same due to the Svpbmt extension not yet being ratified and in hardware. Reviewed by: markj, mhorne MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32445	2021-10-17 15:31:35 +01:00
Mateusz Guzik	1045352f15	cache: only assert on flags when dealing with EMPTYPATH Reported by: syzbot+bd48ee0843206a09e6b8@syzkaller.appspotmail.com Fixes: `7dd419cabc` ("cache: add empty path support")	2021-10-17 08:42:47 +00:00
Mateusz Guzik	7dd419cabc	cache: add empty path support This avoids spurious drop offs as EMPTY is passed regardless of the actual path name. Pushing the work inside the lookup instead of just ignoring the flag allows avoiding the check for an empty pathname for all other lookups.	2021-10-16 20:08:37 +00:00
Colin Percival	46dd801acb	Add userland boot profiling to TSLOG On kernels compiled with 'options TSLOG', record for each process ID: * The timestamp of the fork() which creates it and the parent process ID, * The first path passed to execve(), if any, * The first path resolved by namei, if any, and * The timestamp of the exit() which terminates the process. Expose this information via a new sysctl, debug.tslog_user. On kernels lacking 'options TSLOG' (the default), no information is recorded and the sysctl does not exist. Note that recording namei is needed in order to obtain the names of rc.d scripts being launched, as the rc system sources them in a subshell rather than execing the scripts. With this commit it is now possible to generate flamecharts of the entire boot process from the start of the loader to the end of /etc/rc. The code needed to perform this processing is currently found in github: https://github.com/cperciva/freebsd-boot-profiling Reviewed by: mhorne Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D32493	2021-10-16 11:47:34 -07:00
Dawid Gorecki	a97d697122	kern_exec: Add kern.stacktop sysctl. With stack gap enabled top of the stack is moved down by a random amount of bytes. Because of that some multithreaded applications which use kern.usrstack sysctl to calculate address of stacks for their threads can fail. Add kern.stacktop sysctl, which can be used to retrieve address of the stack after stack gap is applied to it. Returns value identical to kern.usrstack for processes which have no stack gap. Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31897	2021-10-15 10:21:55 +02:00
Dawid Gorecki	889b56c8cd	setrlimit: Take stack gap into account. Calling setrlimit with stack gap enabled and with low values of stack resource limit often caused the program to abort immediately after exiting the syscall. This happened due to the fact that the resource limit was calculated assuming that the stack started at sv_usrstack, while with stack gap enabled the stack is moved by a random number of bytes. Save information about stack size in struct vmspace and adjust the rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max, then the value is truncated to rlim_max. PR: 253208 Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31516	2021-10-15 10:21:47 +02:00
John Baldwin	a72ee35564	ktls: Defer creation of threads and zones until first use. Run ktls_init() when the first KTLS session is created rather than unconditionally during boot. This avoids creating unused threads and allocating unused resources on systems which do not use KTLS. Reviewed by: gallatin, markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32487	2021-10-14 15:48:34 -07:00
Konstantin Belousov	1adebca1fc	Style Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-10-14 23:07:32 +03:00
Brooks Davis	04c91ac48a	selsocket: handle sopoll() errors correctly Without this change, unmounting smbfs filesystems with an INVARIANTS kernel would panic after `10e64782ed`. Found by: markj Reviewed by: markj, jhb Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D32492	2021-10-14 00:43:48 +01:00
John Baldwin	9f03d2c001	ktls: Ensure FIFO encryption order for TLS 1.0. TLS 1.0 records are encrypted as one continuous CBC chain where the last block of the previous record is used as the IV for the next record. As a result, TLS 1.0 records cannot be encrypted out of order but must be encrypted as a FIFO. If the later pages of a sendfile(2) request complete before the first pages, then TLS records can be encrypted out of order. For TLS 1.1 and later this is fine, but this can break for TLS 1.0. To cope, add a queue in each TLS session to hold TLS records that contain valid unencrypted data but are waiting for an earlier TLS record to be encrypted first. - In ktls_enqueue(), check if a TLS record being queued is the next record expected for a TLS 1.0 session. If not, it is placed in sorted order in the pending_records queue in the TLS session. If it is the next expected record, queue it for SW encryption like normal. In addition, check if this new record (really a potential batch of records) was holding up any previously queued records in the pending_records queue. Any of those records that are now in order are also placed on the queue for SW encryption. - In ktls_destroy(), free any TLS records on the pending_records queue. These mbufs are marked M_NOTREADY so were not freed when the socket buffer was purged in sbdestroy(). Instead, they must be freed explicitly. Reviewed by: gallatin, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D32381	2021-10-13 12:30:15 -07:00
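The ordering scheme described in 9f03d2c001 can be modelled with a sorted pending list keyed by record sequence number. The sketch below is a simplified userland model, not the ktls code: `struct model_tls_session`, `model_dispatch()` and `model_enqueue()` are hypothetical, each call queues a single record rather than a batch, and there is no bounds checking beyond the fixed array size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_PENDING 16	/* arbitrary capacity for the sketch */

struct model_tls_session {
	uint64_t next_seq;			/* next record allowed to encrypt */
	uint64_t pending[MODEL_MAX_PENDING];	/* waiting records, ascending */
	int npending;
	uint64_t order[MODEL_MAX_PENDING];	/* order handed to encryption */
	int nordered;
};

/* Hand one record to the (imaginary) software encryption queue. */
static void
model_dispatch(struct model_tls_session *s, uint64_t seq)
{
	s->order[s->nordered++] = seq;
	s->next_seq = seq + 1;
}

static void
model_enqueue(struct model_tls_session *s, uint64_t seq)
{
	int i;

	if (seq != s->next_seq) {
		/* Out of order: insert into the sorted pending list. */
		for (i = s->npending; i > 0 && s->pending[i - 1] > seq; i--)
			s->pending[i] = s->pending[i - 1];
		s->pending[i] = seq;
		s->npending++;
		return;
	}
	model_dispatch(s, seq);
	/* Release any pending records that are now in order. */
	while (s->npending > 0 && s->pending[0] == s->next_seq) {
		model_dispatch(s, s->pending[0]);
		memmove(&s->pending[0], &s->pending[1],
		    (size_t)(s->npending - 1) * sizeof(s->pending[0]));
		s->npending--;
	}
}
```

Queuing records 2, 1, 0, 3 in that order on a zeroed session holds 2 and 1 back until 0 arrives, after which all three are dispatched in sequence.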
John Baldwin	a63752cce6	ktls: Reject attempts to enable AES-CBC with TLS 1.3. AES-CBC cipher suites are not supported in TLS 1.3. Reported by: syzbot+ab501c50033ec01d53c6@syzkaller.appspotmail.com Reviewed by: tuexen, markj Differential Revision: https://reviews.freebsd.org/D32404	2021-10-13 12:12:58 -07:00
Mark Johnston	03d5820f73	mount: Check for !VDIR mount points before handling -o emptydir To implement -o emptydir, vfs_emptydir() checks that the passed directory is empty. This should be done after checking whether the vnode is of type VDIR, though, or vfs_emptydir() may end up calling VOP_READDIR on a non-directory. Reported by: syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com Reviewed by: kib, imp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32475	2021-10-13 09:33:35 -04:00
John Baldwin	d1b6fef075	Stop creating socket aio kprocs during boot. Create the initial pool of kprocs on demand when the first socket AIO request is submitted instead. The pool of kprocs used for other AIO requests is similarly created on first use. Reviewed by: asomers Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32468	2021-10-12 14:03:07 -07:00
Kyle Evans	7259ca3104	fifos: delegate unhandled kqueue filters to underlying filesystem This gives the vfs layer a chance to provide handling for EVFILT_VNODE, for instance. Change pipe_specops to use the default vop_kqfilter to accommodate fifoops that don't specify the method (i.e. all in-tree). Based on a patch by Jan Kokemüller. PR: 225934 Reviewed by: kib, markj (both pre-KASSERT) Differential Revision: https://reviews.freebsd.org/D32271	2021-10-12 02:43:07 -05:00
Greg V	98dae405de	O_PATH: allow vfs_extattr syscalls These calls do operate on vnodes only, not file contents. This is useful for e.g. the xdg-document-portal fuse filesystem. Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32438	2021-10-11 20:09:49 +03:00
Mateusz Guzik	2b68eb8e1d	vfs: remove thread argument from VOP_STAT and fo_stat.	2021-10-11 13:22:32 +00:00
Mateusz Guzik	b4a58fbf64	vfs: remove cn_thread It is always curthread. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D32453	2021-10-11 13:21:47 +00:00
Andrew Turner	a85ce4ad72	Add pmap_change_prot on arm64 Support changing the protection of preloaded kernel modules by implementing pmap_change_prot on arm64 and calling it from preload_protect. Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32026	2021-10-11 10:26:45 +01:00
Mateusz Guzik	93e0523499	vfs: add predicts to getvnode and getvnode_path	2021-10-10 18:24:29 +00:00
Mateusz Guzik	a0558fe90d	Retire code added to support CloudABI CloudABI was removed in `cf0ee8738e`	2021-10-10 18:24:29 +00:00
Konstantin Belousov	5fb54d2fc8	readlinkat(2): allow O_PATH fd PR: 258856 Reported by: ashish Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32390	2021-10-09 22:31:37 +03:00
Mark Johnston	fa9da1f590	timecounter: Let kern.timecounter.stepwarnings be set as a tunable MFC after: 1 week	2021-10-09 12:34:06 -04:00
Konstantin Belousov	b5cadc643e	Make core dump writes interruptible with SIGKILL This can be disabled by sysctl kern.core_dump_can_intr Reported and tested by: pho Reviewed by: imp, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32313	2021-10-08 03:21:43 +03:00
Konstantin Belousov	244ab56611	Add curproc_sigkilled() Function returns an indicator that the process was killed with SIGKILL Reviewed by: imp, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32313	2021-10-08 03:21:43 +03:00
Konstantin Belousov	dc2d0899bb	kern_sig.c: Remove unused SIGPROP_CANTMASK Reviewed by: imp, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32313	2021-10-08 03:21:42 +03:00
Mark Johnston	880b670c6f	malloc: Unmark KASAN redzones if the full allocation size was requested Consumers that want the full allocation size will typically access the full buffer, so mark the entire allocation as valid to avoid useless KASAN reports. Sponsored by: The FreeBSD Foundation	2021-10-06 16:09:41 -04:00
Konstantin Belousov	9b86d3e5de	When queuing ignored signal, only abort target thread's sleep if it is inside sigwait() Reported and tested by: trasz Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32252	2021-10-06 17:05:22 +03:00
Konstantin Belousov	f17eb93d55	When sending ignored signal, arrange for zero return code from sleep Otherwise consumers get unexpected EINTR errors without seeing a properly discarded signal. Reported and tested by: trasz Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32252	2021-10-06 17:05:22 +03:00
Konstantin Belousov	b599982b65	Move td_pflags2 TDP2_SIGWAIT to td_flags TDF_SIGWAIT The flag should be accessible from non-current threads. Reviewed by: markj Tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32252	2021-10-06 17:05:22 +03:00
Alexander Motin	7835b2cb4a	sbuf(9): Microoptimize sbuf_put_byte() This function is actively used by sbuf_vprintf(), so this simple inlining reduces the time of kern.geom.confxml generation by half. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2021-10-05 14:47:38 -04:00
Alexander Motin	6df1359e55	sleepqueue(9): Remove sbinuptime() from sleepq_timeout(). Callout c_time is always bigger or equal than the scheduled time. It is also smaller than sbinuptime() and can't change while the callback is running. So we reliably can use it instead of sbinuptime() here. In case there was a race and the callout was rescheduled to the later time, the callback will be called again. According to profiles it saves ~5% of the timer interrupt time even with fast TSC timecounter. MFC after: 1 month	2021-10-02 21:08:41 -04:00
Alexander Motin	1c119e173d	sched_ule(4): Fix possible significance loss. Before this change kern.sched.interact sysctl setting above 32 gave all interactive threads identical priority of PRI_MIN_INTERACT due to ((PRI_MAX_INTERACT - PRI_MIN_INTERACT + 1) / sched_interact) turning zero. Setting the sysctl lower reduced the range of used priority levels up to half, that is not great either. Change of the operations order should fix the issue, always using full range of priorities, while overflow is impossible there since both score and priority values are small. While there, make the variables unsigned as they really are. MFC after: 1 month	2021-10-02 00:09:45 -04:00
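The significance loss described in 1c119e173d is ordinary integer division applied too early. The sketch below assumes a priority range of 32 levels, which is what the "above 32" threshold in the message implies. `MODEL_PRI_RANGE`, `model_pri_old()` and `model_pri_new()` are hypothetical, return only the offset above the minimum interactive priority, and are not the scheduler's actual expressions.

```c
#include <assert.h>

/* Assumed stand-in for (PRI_MAX_INTERACT - PRI_MIN_INTERACT + 1). */
#define MODEL_PRI_RANGE 32u

/* Divide first: the quotient is 0 whenever interact > MODEL_PRI_RANGE. */
static unsigned
model_pri_old(unsigned score, unsigned interact)
{
	return ((MODEL_PRI_RANGE / interact) * score);
}

/* Multiply first: the full range stays usable; both factors are small. */
static unsigned
model_pri_new(unsigned score, unsigned interact)
{
	return (MODEL_PRI_RANGE * score / interact);
}
```

With `interact = 40` and `score = 20`, the old ordering yields offset 0 (every thread at the minimum priority) while the new ordering yields 16. With `interact = 17` and `score = 16`, the old ordering reaches only offset 16, about half the range, while the new one reaches 30.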
Mateusz Guzik	c9536389d7	vfs: hoist cn_thread assert in namei Making it conditional on whether ktrace happens to be enabled makes no sense.	2021-10-01 21:56:29 +00:00
Gleb Smirnoff	a37e4fd1ea	Re-style `dfcef87714` to keep the code and variables related to listening sockets separated from code for generic sockets. No objection: markj	2021-10-01 13:38:24 -07:00
Justin Hibbits	63cb9308a7	Fix segment size in compressing core dumps A core segment is bounded in size only by memory size. On 64-bit architectures this means a segment can be much larger than 4GB. However, compress_chunk() takes only a u_int, clamping segment size to 4GB-1, resulting in a truncated core. Everything else, including the compressor internally, uses size_t, so use size_t at the boundary here. This dates back to the original refactor back in 2015 (r279801 / `aa14e9b7`). MFC after: 1 week Sponsored by: Juniper Networks, Inc.	2021-10-01 14:16:33 -05:00
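The width mismatch described in 63cb9308a7 can be reproduced in isolation. The sketch below uses fixed-width types so that it behaves the same on any platform; `model_len_via_u32()` and `model_len_via_u64()` are hypothetical helpers that stand in for a length parameter declared as u_int and as size_t respectively. In this model the narrowing conversion wraps modulo 2^32, and the resulting length in the kernel path may differ, but either way the length seen by the callee is smaller than the segment.

```c
#include <assert.h>
#include <stdint.h>

/* Length as seen through a 32-bit parameter (models u_int). */
static uint64_t
model_len_via_u32(uint64_t len)
{
	uint32_t narrowed = (uint32_t)len;	/* high bits are discarded */

	return (narrowed);
}

/* Length as seen through a 64-bit parameter (models size_t on LP64). */
static uint64_t
model_len_via_u64(uint64_t len)
{
	return (len);
}
```

A 5 GiB segment passed through the 32-bit path arrives as 1 GiB, while the 64-bit path preserves it.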
Kyle Evans	2f4dbe279f	kqueue: fix recent assertion NOTE_ABSTIME may also have a zero timeout, which indicates that we should still fire immediately as an absolute time in the past. A test has been added for this one as well. Fixes: `9c999a259f` ("kqueue: don't arbitrarily restrict long-past...") Point hat: kevans Reported by: syzbot+1c8d1154f560b3930042@syzkaller.appspotmail.com	2021-10-01 13:17:30 -05:00
Kyle Evans	9c999a259f	kqueue: don't arbitrarily restrict long-past values for NOTE_ABSTIME NOTE_ABSTIME values are converted to values relative to boottime in filt_timervalidate(), and negative values are currently rejected. We don't reject times in the past in general, so clamp this up to 0 as needed such that the timer fires immediately rather than imposing what looks like an arbitrary restriction. Another possible scenario is that the system clock had to be adjusted by ~minutes or ~hours and we have less than that in terms of uptime, making a reasonable short-timeout suddenly invalid. Firing it is still a valid choice in this scenario so that applications can at least expect a consistent behavior. Reviewed by: kib, markj Discussed with: allanjude Differential Revision: https://reviews.freebsd.org/D32230	2021-09-30 21:31:24 -05:00
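The clamping behaviour described in 9c999a259f amounts to the following arithmetic. This is a sketch only: `model_abstime_to_relative()` is a hypothetical helper working on plain nanosecond counts, not the conversion performed in filt_timervalidate(), and it ignores overflow for extreme inputs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an absolute deadline to one relative to boot time.
 * A deadline that falls before boot is clamped to 0 so that the
 * timer fires immediately instead of being rejected.
 */
static int64_t
model_abstime_to_relative(int64_t abs_ns, int64_t boottime_ns)
{
	int64_t rel = abs_ns - boottime_ns;

	return (rel < 0 ? 0 : rel);
}
```

A result of 0 is a valid outcome here, which matches the follow-up fix in 2f4dbe279f: a zero timeout under NOTE_ABSTIME still means "fire immediately".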
Mateusz Guzik	85c855d31b	fd: add pwd_hold_proc	2021-09-30 12:49:51 +02:00
Mitchell Horne	ab4ed843a3	minidump: De-duplicate the progress bar The implementation of the progress bar is simple, but duplicated for most minidump implementations. Extract the common bits to kern_dump.c. Ensure that the bar is reset with each subsequent dump; this was only done on some platforms previously. Reviewed by: markj MFC after: 2 weeks Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D31885	2021-09-29 16:42:21 -03:00
Jamie Gritton	747a47261e	Fix error return of kern.ipc.posix_shm_list, which caused it (and thus "posixshmcontrol ls") to fail for all jails that didn't happen to own the last shm object in the list.	2021-09-29 10:20:36 -07:00
Mitchell Horne	800e74955d	boot(9): update to match reality This function was renamed to kern_reboot() in 2010, but the man page has failed to keep in sync. Bring it up to date on the rename, add the shutdown hooks to the synopsis, and document the (obvious) fact that kern_reboot() does not return. Fix an outdated reference to the old name in kern_reboot(), and leave a reference to the man page so future readers might find it before any large changes. Reviewed by: imp, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32085	2021-09-28 11:36:09 -03:00
Kirk McKusick	1c8d670cb6	Bring the tags and links entries for amd64 up to date. MFC after: 1 week Sponsored by: Netflix	2021-09-27 20:04:51 -07:00
Elliott Mitchell	bcddaadbef	rman: fix overflow in rman_reserve_resource_bound() If the default range of [0, ~0] is given, then (~0 - 0) + 1 == 0. This in turn will cause any allocation of non-zero size to fail. Zero-sized allocations are prohibited, so add a KASSERT to this effect. History indicates it is part of the original rman code. This bug may in fact be older than some contributors. Reviewed by: mhorne MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30280	2021-09-27 14:38:26 -03:00
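The wraparound described in bcddaadbef is easy to reproduce with unsigned arithmetic. The sketch below is illustrative and does not show the rman code: `model_range_len_naive()` and `model_range_fits()` are hypothetical, and the overflow-safe comparison is one possible formulation, not necessarily the one the commit uses.

```c
#include <assert.h>
#include <stdint.h>

/* Naive inclusive length: wraps to 0 for the full range [0, ~0]. */
static uint64_t
model_range_len_naive(uint64_t start, uint64_t end)
{
	return (end - start + 1);
}

/*
 * Overflow-safe test of whether size units fit in [start, end].
 * Comparing size - 1 against end - start never needs the "+ 1".
 * Zero-sized requests are rejected, mirroring the prohibition
 * mentioned in the commit message.
 */
static int
model_range_fits(uint64_t start, uint64_t end, uint64_t size)
{
	if (size == 0 || end < start)
		return (0);
	return (size - 1 <= end - start);
}
```

For the default range, `model_range_len_naive(0, UINT64_MAX)` is 0, so any comparison of the form `size <= length` fails for every non-zero size, while `model_range_fits(0, UINT64_MAX, 4096)` succeeds.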
Alexander Motin	08063e9f98	sched_ule(4): Fix hang with steal_thresh < 2. `e745d729be` caused infinite loop with interrupts disabled in load stealing code if steal_thresh set below 2. Such configuration should not generally be used, but appeared some people are using it to workaround some problems. To fix the problem explicitly pass to sched_highest() minimum number of transferrable threads, supported by the caller, instead of guessing. MFC after: 25 days	2021-09-26 12:03:05 -04:00
Gordon Bergling	8771ff7538	jail(9): Fix a typo in a comment - s/erorr/error/ MFC after: 3 days	2021-09-26 15:17:41 +02:00
Yoshihiro Ota	cb17f4a6bd	kern_ctf: Use zlib's uncompress function for simpler code. Reviewed by: markj, delphij MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D21531	2021-09-25 23:33:00 -07:00
Mateusz Guzik	d71e1a883c	fifo: support flock This evens it up with Linux. Original patch by: Greg V <greg@unrelenting.technology> Differential Revision: https://reviews.freebsd.org/D24255#565302	2021-09-25 14:58:31 +00:00
Konstantin Belousov	71d31f1cf6	malloc_aligned(9): allow zero size and alignment For alignment we do not need to do anything to make it operational. For size, upgrade zero sized request to one byte so that we do not request insane amount of memory for placeholder. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32127	2021-09-25 15:58:12 +03:00
Gordon Bergling	2aad906266	ubsan: Fix a typo in an error message - s/asumption/assumption/ Obtained from: NetBSD MFC after: 1 week	2021-09-25 11:47:24 +02:00
Alexander Motin	d3a8f98acb	Make CPU children explicitly share parent unit numbers. Before this device unit number match was coincidental and broke if I disabled some CPU device(s). Aside of cosmetics, for some drivers (may be considered broken) it caused talking to wrong CPUs.	2021-09-24 23:31:51 -04:00
Alexander Motin	f73c2bbf81	bus: Cleanup device_probe_child() When device driver probe method returns 0, i.e. absolute priority, do not remove its class from the device just to set it back few lines later, that may change the device unit number, etc. and after which we'd better call the probe again. If during search we found some driver with absolute priority, we do not need to set device driver and class since we haven't removed them before. It should not happen, but if second probe method call failed, remove the driver and possibly the class from the device as it was when we started. Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D32125	2021-09-24 20:34:56 -04:00
Warner Losh	67a9e76da6	bus: Fix LINT / BUS_DEBUG build Fix `0389e9be63` for the LINT build. Removed an arg only from code under BUS_DEBUG w/o rebuilding LINT... Sponsored by: Netflix Fixes: `0389e9be63`	2021-09-24 14:04:39 -06:00
Warner Losh	0389e9be63	bus: retire DF_REBID I did DF_REBID to allow for 'hoover' drivers that would attach to otherwise unattached devices in the tree. This notion didn't catch on as it was tricky to make work well and it was easier to just publish a /dev node of some flavor by the parent device. It's been nothing but dead weight for a long time. Reviewed by: mav Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D32056	2021-09-24 12:15:34 -06:00
Nathaniel Wesley Filardo	0321a7990b	kqueue: Add EV_KEEPUDATA flag When this flag is set, operations that update an existing kevent will not change the udata field. This can be used to NOTE_TRIGGER or EV_{EN,DIS}ABLE events without overwriting the stashed pointer. Reviewed by: Domagoj Stolfa <domagoj.stolfa@gmail.com> Obtained from: CheriBSD Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D30286	2021-09-23 17:31:39 -07:00
Konstantin Belousov	45c2c7c484	aio_aqueue(): avoid ucred leak on failure path PR: 258698 Submitted by: sigsys@gmail.com MFC after: 1 week	2021-09-24 03:18:34 +03:00
Alexander Motin	ef50d5fbc3	x86: Add NUMA nodes into CPU topology. Depending on hardware, NUMA nodes may match last level caches, or they may be above them (AMD Zen 2/3) or below (Intel Xeon w/ SNC). This information is provided by ACPI instead of CPUID, and it is provided for each CPU individually instead of mask widths, but this code should be able to properly handle all the above cases. This change should immediately allow idle stealing in sched_ule(4) to prefer load from NUMA-local CPUs to remote ones when the node does not match LLC. Later we may think of how to better handle it on sched_pickcpu() side. MFC after: 1 month	2021-09-23 14:31:38 -04:00
Wojciech Macek	7bc13692a2	hwpmc: fix performance issues Differential revision: https://reviews.freebsd.org/D32025 Avoid using atomics as it_wait is guarded by td_lock. Report threshold calculation is done only if at least one PMC hook is installed Fixes: * avoid unnecessary branching (if frame != null ...) by having PMC_HOOK_INSTALLED_ANY condition on the top of them, which should hint the core not to execute speculatively anything which is underneath; * access intr_hwpmc_waiting_report_threshold cacheline only if at least one hook is loaded;	2021-09-23 07:15:42 +02:00
Alexander Motin	884f38590c	Fix false device_set_unit() error. It should silently succeed if the current unit number is the same as requested, not fail immediately. MFC after: 1 week	2021-09-22 08:44:39 -04:00
Alexander Motin	8db1669959	Fix build without SMP. MFC after: 1 month	2021-09-21 22:13:33 -04:00
Alexander Motin	e745d729be	sched_ule(4): Improve long-term load balancer. Before this change long-term load balancer was unable to migrate running threads, only ones waiting on run queues. But with growing number of CPU cores it is quite typical now for system to not have many waiting threads. But same time if due to some coincidence two long-running CPU-bound threads ended up sharing same physical CPU core, they could suffer from the SMT penalty indefinitely, and the load balancer couldn't help. Improve that by teaching the load balancer to hint running threads to migrate by marking them with TDF_NEEDRESCHED and new TDF_PICKCPU flag, making sched_pickcpu() to search for better CPU later, when it is convenient. Fix CPU search logic when balancing to limit round-robin migrations in case of almost equal load to the group of physical cores. The previous code bounced threads across all the system, that should be pretty bad for caches and NUMA affinity, while additional fairness was almost invisible, diminishing with number of cores in the group. MFC after: 1 month	2021-09-21 18:19:20 -04:00
Konstantin Belousov	397f188936	Remove SV_CAPSICUM It was only needed for cloudabi Reviewed by: emaste Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D31923	2021-09-22 00:18:44 +03:00
Konstantin Belousov	cf0ee8738e	Drop cloudabi According to https://github.com/NuxiNL/cloudlibc: CloudABI is no longer being maintained. It was an awesome experiment, but it never got enough traction to be sustainable. There is no reason to keep it in FreeBSD. Approved by: ed (private mail) Reviewed by: emaste Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D31923	2021-09-22 00:18:44 +03:00
Alexander Motin	bd84094a51	sched_ule(4): Fix interactive threads stealing. In scenarios when first thread in the queue can migrate to specified CPU, but later ones can't runq_steal_from() incorrectly returned NULL. MFC after: 2 weeks	2021-09-21 16:03:32 -04:00
Konstantin Belousov	bd9e0f5df6	amd64: eliminate td_md.md_fpu_scratch For signal send, copyout from the user FPU save area directly. For sigreturn, we are in sleepable context and can do temporary allocation of the transient save area. We cannot copy from userspace directly to user save area because XSAVE state needs to be validated, also partial copyins can corrupt it. Requested by: jhb Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31954	2021-09-21 20:20:15 +03:00
Konstantin Belousov	df8dd6025a	amd64: stop using top of the thread's kernel stack for FPU user save area Instead do one more allocation at the thread creation time. This frees a lot of space on the stack. Also do not use alloca() for temporary storage in signal delivery sendsig() function and signal return syscall sys_sigreturn(). This saves equal amount of space, again at the cost of one more allocation at the thread creation time. A useful experiment now would be to reduce KSTACK_PAGES. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31954	2021-09-21 20:20:15 +03:00
Konstantin Belousov	2933a7ca03	aio_fsync_vnode: handle ERELOOKUP after VOP_FSYNC() Reported by: tmunro Reviewed by: jhb, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32023	2021-09-20 21:40:17 +03:00
Konstantin Belousov	922bee44e4	aio_fsync_vnode: use for(;;) loop instead of label Reviewed by: jhb, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32023	2021-09-20 21:39:46 +03:00
Bartlomiej Grzesik	3f9a00e3b5	device: add device_get_property and device_has_property Generalize bus-specific property accessors. Those functions allow driver code to access device-specific information. Currently there is only support for FDT and ACPI buses. Reviewed by: manu, mw Sponsored by: Semihalf Differential revision: https://reviews.freebsd.org/D31597	2021-09-20 17:17:57 +02:00
Mateusz Guzik	7b2ac8eb9b	vfs: add missing VIRF_MOUNTPOINT in vfs_mountroot_shuffle Reported by: mav	2021-09-18 21:13:51 +02:00
Mateusz Guzik	0d9e99ce3b	vfs: add the missing vnode interlock in vfs_mountroot_shuffle Around v_mountedhere assignment.	2021-09-18 21:13:51 +02:00
Mark Johnston	50b07c1f71	unix: Fix a use-after-free in unp_drop() We need to load the socket pointer after locking the PCB, otherwise the socket may have been detached and freed by the time that unp_drop() sets so_error. This previously went unnoticed as the socket zone was _NOFREE. Reported by: pho MFC after: 1 week	2021-09-18 10:38:39 -04:00
Mateusz Guzik	f902e4bb04	lockmgr: fix lock profiling of face adaptive spinning	2021-09-18 10:16:58 +00:00
Mateusz Guzik	a2cb65b8fe	cache: count vnodes in cache_purgevfs	2021-09-18 10:16:50 +00:00
Mateusz Guzik	5d8e32a66c	vfs: retire VNODE_REFCOUNT_FENCE_* macros They are unused as of last year.	2021-09-18 10:16:00 +00:00
Mark Johnston	2bd9826995	vfs: Permit unix sockets to be opened with O_PATH As with FIFOs, a path descriptor for a unix socket cannot be used with kevent(). In principle connectat(2) and bindat(2) could be modified to support an AT_EMPTY_PATH-like mode which operates on the socket referenced by an O_PATH fd referencing a unix socket. That would eliminate the path length limit imposed by sockaddr_un. Update O_PATH tests. Reviewed by: kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31970	2021-09-17 14:19:06 -04:00
Mark Johnston	ade1daa5c0	socket: Synchronize soshutdown() with listen(2) and AIO To handle shutdown(SHUT_RD) we flush the receive buffer of the socket. This may involve searching for control messages of type SCM_RIGHTS, since we need to close the file references. Closing arbitrary files with socket buffer locks held is undesirable, mainly due to lock ordering issues, so we instead make a copy of the socket buffer and operate on that without any locks. Fields in the original buffer are cleared. This behaviour clobbered the AIO job queue associated with a receive buffer. It could also cause us to leak a KTLS session reference. Reorder socket buffer fields to address this. An alternate solution would be to remove the hack in sorflush(), but this is not quite feasible (yet). In particular, though sorflush() flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to be queued - the flag just prevents userspace from reading more data. I suspect we should fix this; SBS_CANTRCVMORE represents a terminal state and protocols can likely just drop any data destined for such a buffer. Many of them already do, but in some cases the check is racy, and some KPI churn will be needed to fix everything. This approach is more straightforward for now. Reported by: syzbot+104d8ee3430361cb2795@syzkaller.appspotmail.com Reported by: syzbot+5bd2e7d05f84a59d0d1b@syzkaller.appspotmail.com Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31976	2021-09-17 14:19:06 -04:00
Mark Johnston	883761f0a8	socket: Remove NOFREE from the socket zone This flag was added during the transition away from the legacy zone allocator, commit `c897b81311`. The old zone allocator effectively provided _NOFREE semantics, but it seems that they are not required for sockets. In particular, we use reference counting to keep sockets live. One somewhat dangerous case is sonewconn(), which returns a pointer to a socket with reference count 0. This socket is still effectively owned by the listening socket. Protocols must therefore be careful to synchronize sonewconn() calls with their pru_close implementations, since for listening sockets soclose() will abort the child sockets. For example, TCP holds the listening socket's PCB read locked across the sonewconn() call, which blocks tcp_usr_close(), and sofree() synchronizes with a concurrent soabort() of the nascent socket. However, _NOFREE semantics are not required here. Eliminating _NOFREE has several benefits: it enables use-after-free detection (e.g., by KASAN) and lets the system reclaim memory from the socket zone under memory pressure. No functional change intended. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31975	2021-09-17 14:19:06 -04:00
Mark Johnston	6b288408ca	socket: Add assertions around naked refcount decrements Sockets in a listen queue hold a reference to the parent listening socket. Several code paths release this reference manually when moving a child socket out of the queue. Replace comments about the expected post-decrement refcount value with assertions. Use refcount_load() instead of a plain load. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31974	2021-09-17 14:19:06 -04:00
Mark Johnston	dfcef87714	socket: Fix a use-after-free in soclose() After releasing the fd reference to a socket "so", we should avoid testing SOLISTENING(so) since the socket may have been freed. Instead, directly test whether the list of unaccepted sockets is empty. Fixes: `f4bb1869dd` ("Consistently use the SOLISTENING() macro") Pointy hat: markj MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31973	2021-09-17 14:19:05 -04:00
Mark Johnston	bf25678226	ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers ktls_get_(rx\|tx)_mode() can return an errno value or a TLS mode, so errors are effectively hidden. Fix this by using a separate output parameter. Convert to the new socket buffer locking macros while here. Note that the socket buffer lock is not needed to synchronize the SOLISTENING check here, we can rely on the PCB lock. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31977	2021-09-17 14:19:05 -04:00
Mark Johnston	40fcdb9366	kcov: Disable address and memory sanitizers in get_kinfo() get_kinfo() is only called from the coverage sanitizer callbacks, which are similarly uninstrumented. Sponsored by: The FreeBSD Foundation	2021-09-17 14:19:05 -04:00
Konstantin Belousov	197a4f29f3	buffer pager: allow get_blksize method to return error Reported and reviewed by: asomers Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31998	2021-09-17 20:29:55 +03:00
Konstantin Belousov	796a8e1ad1	procctl(2): Add PROC_WXMAP_CTL/STATUS It allows to override kern.elf{32,64}.allow_wx on per-process basis. In particular, it makes it possible to run binaries without PT_GNU_STACK and without elfctl note while allow_wx = 0. Reviewed by: brooks, emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31779	2021-09-17 15:42:01 +03:00

18931 Commits
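
The ktls entry above (`bf25678226`) describes a getter that returned either an errno value or a TLS mode through the same return value, so errors were effectively hidden, and fixes it with a separate output parameter. The following is a minimal user-space sketch of that pattern only, not the kernel code: `struct demo_sock`, the `DEMO_MODE_*` values, and both functions are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mode values; these are not the kernel's TLS mode constants. */
enum demo_tls_mode {
	DEMO_MODE_NONE = 0,
	DEMO_MODE_SW = 1,
	DEMO_MODE_IFNET = 2
};

/* Hypothetical stand-in for a socket; only the fields the sketch needs. */
struct demo_sock {
	int listening;			/* non-zero for a listening socket */
	enum demo_tls_mode rx_mode;	/* current receive-side mode */
};

/*
 * Ambiguous variant: the single return value carries either an errno
 * value or a mode, so the caller cannot tell the two apart.
 */
int
demo_get_rx_mode_ambiguous(const struct demo_sock *so)
{
	if (so->listening)
		return (EOPNOTSUPP);
	return ((int)so->rx_mode);
}

/*
 * Separate output parameter: the return value is only ever 0 or an
 * errno value, and the mode is written to *modep on success only.
 */
int
demo_get_rx_mode(const struct demo_sock *so, int *modep)
{
	assert(modep != NULL);
	if (so->listening)
		return (EOPNOTSUPP);
	*modep = (int)so->rx_mode;
	return (0);
}
```

A caller of `demo_get_rx_mode()` checks the return value first and reads the mode only when it is 0. With the ambiguous variant, a caller that treats the result as a mode will silently pass an errno value along as if it were one.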