freebsd-dev

Author	SHA1	Message	Date
Andrew Turner	72b66398fa	Create a common function to handle freeing the kcov info struct. Both places that may free the kcov info struct are identical. Create a new common function to hold the code. Sponsored by: DARPA, AFRL	2019-02-19 17:03:34 +00:00
Mark Johnston	18a7de663b	Move a racy assertion in filt_pipewrite(). EVFILT_WRITE knotes for pipes live on the knlist for the other end of the pipe. Since they do not hold a reference on the corresponding file structure, they may be removed from the knlist by pipeclose() while still remaining active. In this case, there is no knlist lock acquired before filt_pipewrite() is called, so the assertion fails. Fix the problem by first checking whether that end of the pipe has been closed. These checks are memory safe since the knote holds a reference on one end of the pipe, and the pipe structure is not freed until both ends are closed. The checks are not racy since PIPE_EOF is never cleared after being set, and pipe_present is never set back to PIPE_ACTIVE after pipeclose() has been called. PR: 235640 Reported and tested by: pho Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19224	2019-02-19 15:46:43 +00:00
Mark Johnston	648890835c	Remove a write-only variable orphaned by r340677.	2019-02-17 16:56:41 +00:00
Bruce Evans	23e5e43ccd	Finish the fix for overflow in calcru1(). The previous fix was unnecessarily very slow up to 105 hours where the simple formula used previously worked, and unnecessarily slow by a factor of about 5/3 up to 388 days, and didn't work above 388 days. 388 days is not a long time, since it is a reasonable uptime, and for processes the times being calculated are aggregated over all threads, so with N CPUs running the same thread a runtime of 388 days is reachable after only 388 / N physical days. The PRs document overflow at 388 days, but don't try to fix it. Use the simple formula up to 76 hours. Then use a complicated general method that reduces to the simple formula up to a bit less than 105 hours, then reduces to the previous method without its extra work up to almost 388 days, then does more complicated reductions, usually many bits at a time so that this is not slow. This works up to half of maximum representable time (292271 years), with accumulated rounding errors of at most 32 usec. amd64 can do all this with no avoidable rounding errors in an inline asm with 2 instructions, but this is too special to use. __uint128_t can do the same with 100's of instructions on 64-bit arches. Long doubles with at least 64 bits of precision are the easiest method to use on i386 userland, but are hard to use in the kernel. PR: 76972 and duplicates Reviewed by: kib	2019-02-14 19:07:08 +00:00
Marius Strobl	f855ec814d	Make taskqgroup_attach{,_cpu}(9) work across architectures So far, intr_{g,s}etaffinity(9) take a single int for identifying a device interrupt. This approach doesn't work on all architectures supported, as a single int isn't sufficient to globally specify a device interrupt. In particular, with multiple interrupt controllers in one system as found on e. g. arm and arm64 machines, an interrupt number as returned by rman_get_start(9) may be only unique relative to the bus and, thus, interrupt controller, a certain device hangs off from. In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}() not work across architectures. Yet in turn, iflib(4) as gtaskqueue consumer so far doesn't fit architectures where interrupt numbers aren't globally unique. However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as employed by the gtaskqueue implementation to bind an interrupt to a particular CPU, using bus_bind_intr(9) instead is equivalent from a functional point of view, with bus_bind_intr(9) taking the device and interrupt resource arguments required for uniquely specifying a device interrupt. Thus, change the gtaskqueue implementation to employ bus_bind_intr(9) instead and intr_{g,s}etaffinity(9) to take the device and interrupt resource arguments required respectively. This change also moves struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL as userland likes to include <sys/_task.h> or indirectly drags it in - for better or worse also with _KERNEL defined -, which with device_t and struct resource dependencies otherwise is no longer as easily possible now. 
The userland inclusion problem probably can be improved a bit by introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the existing _WANT_PRISON etc., which is orthogonal to this change, though, and likely needs an exp-run. While at it: - Change the gt_cpu member in the grouptask structure to be of type int as used elsewhere for specifying CPUs (an int16_t may be too narrow sooner or later), - move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to the gtaskqueue implementation as it's only used and needed there, - change the GTASK_INIT macro to use "gtask" rather than "task" as argument given that it actually operates on a struct gtask rather than a struct task, and - let subr_gtaskqueue.c consistently use __func__ to print function names. Reported by: mmel Reviewed by: mmel Differential Revision: https://reviews.freebsd.org/D19139	2019-02-12 21:23:59 +00:00
Conrad Meyer	e0d164c7a6	Prevent overflow for usertime/systime in calcru1 PR: 76972 and duplicates Reported by: Dr. Christopher Landauer <cal AT aero.org>, Steinar Haug <sthaug AT nethelp.no> Submitted by: Andrey Zonov <andrey AT zonov.org> (earlier version) MFC after: 2 weeks	2019-02-10 23:07:46 +00:00
Konstantin Belousov	fa50a3552d	Implement Address Space Layout Randomization (ASLR) With this change, randomization can be enabled for all non-fixed mappings. It means that the base address for the mapping is selected with a guaranteed amount of entropy (bits). If the mapping was requested to be superpage aligned, the randomization honours the superpage attributes. Although the value of ASLR is diminishing over time as exploit authors work out simple ASLR bypass techniques, it eliminates the trivial exploitation of certain vulnerabilities, at least in theory. This implementation is relatively small and happens at the correct architectural level. Also, it is not expected to introduce regressions in existing cases when turned off (default for now), or cause any significant maintenance burden. The randomization is done on a best-effort basis - that is, the allocator falls back to a first fit strategy if fragmentation prevents entropy injection. It is trivial to implement a strong mode where failure to guarantee the requested amount of entropy results in mapping request failure, but I do not consider that to be usable. I have not fine-tuned the amount of entropy injected right now. It is only a quantitative change that will not change the implementation. The current amount is controlled by aslr_pages_rnd. To not spoil coalescing optimizations, to reduce the page table fragmentation inherent to ASLR, and to keep the transient superpage promotion for the malloced memory, locality clustering is implemented for anonymous private mappings, which are automatically grouped until fragmentation kicks in. The initial location for the anon group range is, of course, randomized. This is controlled by vm.cluster_anon, enabled by default. The default mode keeps the sbrk area unpopulated by other mappings, but this can be turned off, which gives much more breathing bits on architectures with small address space, such as i386.
This is tied with the question of following an application's hint about the mmap(2) base address. Testing shows that ignoring the hint does not affect the function of common applications, but I would expect more demanding code could break. By default sbrk is preserved and mmap hints are satisfied, which can be changed by using the kern.elf{32,64}.aslr.honor_sbrk sysctl. ASLR is enabled on per-ABI basis, and currently it is only allowed on FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support for additional architectures will be added after further testing. Both per-process and per-image controls are implemented: - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS; - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible to force ASLR off for the given binary. (A tool to edit the feature control note is in development.) Global controls are: - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2); - kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings; - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2); - vm.cluster_anon - enables anon mapping clustering. PR: 208580 (exp runs) Exp-runs done by: antoine Reviewed by: markj (previous version) Discussed with: emaste Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5603	2019-02-10 17:19:45 +00:00
Andrew Turner	c50c26aa07	Fix the spelling of cov_unregister_pc. When unregistering kcov from the coverage interface we should use the unregister function, not the register function. Sponsored by: DARPA, AFRL	2019-02-08 16:18:17 +00:00
Konstantin Belousov	7cdb0b9d82	Fix renameat(2) for CAPABILITIES kernels. When renameat(2) is used with: - absolute path for to; - tofd not set to AT_FDCWD; - the target exists kern_renameat() requires CAP_UNLINK capability on tofd, but corresponding namei ni_filecap is not initialized at all because the lookup is absolute. As a result, the check was done against empty filecap and the syscall fails erroneously. Fix it by creating a return flags namei member and reporting if the lookup was absolute, then do not touch to.ni_filecaps at all. PR: 222258 Reviewed by: jilles, ngie Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-note: KBI breakage Differential revision: https://reviews.freebsd.org/D19096	2019-02-08 04:18:17 +00:00
Konstantin Belousov	6f26dd50c3	do_execve(): lock vnode when needed. Code after exec_fail_dealloc label expects that the image vnode is locked if present. When copyout() of the strings or auxv vectors fails, the goto to the error handling did not relock the vnode as required. The copyout() can be made to fail e.g. by creating an ELF image with PT_GNU_STACK segment disabling the write. Reported by: Jonathan Stuart <n0t.jcs@gmail.com> (found by fuzzing) Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-02-08 04:06:48 +00:00
Konstantin Belousov	eb785fab3b	Port sysctl kern.elf32.read_exec from amd64 to i386. Make it more comprehensive on i386, by not setting nx bit for any mapping, not just adding PF_X to all kernel-loaded ELF segments. This is needed for the compatibility with older i386 programs that assume that read access implies exec, e.g. old X servers with hand-rolled module loader. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-07 02:17:34 +00:00
Mark Johnston	401ca034cd	Avoid leaking fp references when truncating SCM_RIGHTS control messages. Reported by: pho Approved by: so MFC after: 0 minutes Security: CVE-2019-5596 Sponsored by: The FreeBSD Foundation	2019-02-05 17:55:08 +00:00
Bruce Evans	6fd2dcd428	Fix zapping of static hints and env in init_static_kenv(). Environments are terminated by 2 NULs, but only 1 NUL was zapped. Zapping only 1 NUL just splits the first string into an empty string and a corrupted string. All other strings in static hints and env remained live early in the boot when they were supposed to be disabled. Support calling init_static_kenv() very early in the boot, so as to use the env very early in the boot. Then the pointer to the loader env may change after the first call due to enabling paging or otherwise remapping the pointer. Another call is needed to register the change. Don't use the previous pointer in this (or any) later call. Reviewed by: kib	2019-02-05 15:34:55 +00:00
Conrad Meyer	e682df5397	extattr_list_vp: Narrow locked section somewhat Suggested by: mjg Reviewed by: kib, mjg Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19083	2019-02-05 04:47:21 +00:00
Conrad Meyer	c3eb848ce3	extattr_list_vp: Only take shared vnode lock List is a 'read'-type operation that does not modify shared state; it's safe for multiple threads to proceed concurrently. This is reflected in the vnode operation LISTEXTATTR locking protocol specification, which only requires a shared lock. (Similar to previous r248933.) Reported by: Case van Rij <case.vanrij AT isilon.com> Reviewed by: kib, mjg Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19082	2019-02-05 03:32:58 +00:00
Warner Losh	52467047aa	Regularize the Netflix copyright Use recent best practices for Copyright form at the top of the license: 1. Remove all the All Rights Reserved clauses on our stuff. Where we piggybacked others, use a separate line to make things clear. 2. Use "Netflix, Inc." everywhere. 3. Use a single line for the copyright for grep friendliness. 4. Use date ranges in all places for our stuff. Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)	2019-02-04 21:28:25 +00:00
Konstantin Belousov	f02bc51c09	Do not call PHOLD() while owning the allproc_lock sx. Otherwise the lock might recurse in faultin() if the process is swapped out. Reported by: zeising Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-03 21:31:40 +00:00
Brooks Davis	12aec82c09	Remove iBCS2: also remove xenix syscall function support. Missed in r342243.	2019-01-31 23:01:12 +00:00
Brooks Davis	90f2d5012a	Regen after r342190. Differential Revision: https://reviews.freebsd.org/D18444	2019-01-31 22:58:17 +00:00
Gleb Smirnoff	eec189c70b	Add new m_ext type for data for M_NOFREE mbufs, which doesn't actually do anything except several assertions. This type is going to be used for temporary on stack mbufs, that point into data in receive ring of a NIC, that shall not be freed. Such mbuf can not be stored or reallocated, its life time is current context.	2019-01-31 22:37:28 +00:00
Mark Johnston	919e7b5359	Prevent some kobj memory allocation failures from panicking the system. Parts of the kobj(9) KPI assume a non-sleepable context for the purpose of internal memory allocations, but currently have no way to signal an allocation failure to the caller, so they just panic in this case. This can occur even when kobj_create() is called with M_WAITOK. Fix some instances of the problem by plumbing wait flags from kobj_create() through internal subroutines. Change kobj_class_compile() to assume a sleepable context when called externally, since all existing callers use it in a sleepable context. To fix the problem fully the kobj_init() KPI must be changed. Reported and tested by: pho Reviewed by: kib (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19023	2019-01-31 22:27:39 +00:00
Alexander Motin	6afd921090	Only sort requests of types that have concept of offset. Other types, such as BIO_FLUSH or BIO_ZONE, or especially new/unknown ones, may imply some degree of ordering even if strict ordering is not requested explicitly. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-01-30 17:24:50 +00:00
Andrew Turner	524553f56d	Extract the coverage sanitizer KPI to a new file. This will allow multiple consumers of the coverage data to be compiled into the kernel together. The only requirement is only one can be registered at a given point in time, however it is expected they will only register when the coverage data is needed. A new kernel config option COVERAGE is added. This will allow kcov to become a module that can be loaded as needed, or compiled into the kernel. While here clean up the #include style a little. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D18955	2019-01-29 11:04:17 +00:00
Andriy Voskoboinyk	58c43838f7	m_getm2: correct a comment. The comment states that the function always returns the top of the allocated mbuf; however, the function actually returns the overall mbuf chain top pointer. Since there are already existing users of it (via m_getm(4) macro), rephrase the comment and leave behavior unchanged. PR: 134335 MFC after: 12 days	2019-01-27 16:44:27 +00:00
Konstantin Belousov	e5ac304989	Bump SPECNAMELEN to MAXNAMLEN. This includes the bump for cdevsw d_version. Otherwise, the impact on the ABI (not KBI) is surprisingly low. The most important affected interface is devname(3) and ttyname(3) which already correctly handle long names (and ttyname(3) should not be affected at all). Still, due to the d_version bump, I argue that the change is not MFC-able. Requested by: mmacy Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D18932	2019-01-27 00:46:06 +00:00
Kirk McKusick	dab83bd1e8	Add printing of b_ioflags to DDB `show buffer' command. Sponsored by: Netflix	2019-01-25 21:24:09 +00:00
Konstantin Belousov	be8dd1428e	Re-wrap long line after r341827. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-01-17 04:51:05 +00:00
Gleb Smirnoff	d1bb5d7d50	Fix mistake in r343030: move nswbuf calculation back to kern_vfs_bio_buffer_alloc(), because in init_param2() nbuf isn't really initialized yet. Pointed out by: bde	2019-01-16 20:20:38 +00:00
Konstantin Belousov	ea7e7006db	Implement shmat(2) flag SHM_REMAP. Based on the description in Linux man page. Reviewed by: markj, ngie (previous version) Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18837	2019-01-16 05:15:57 +00:00
Xin LI	305bb04ee4	Use TD_IS_IDLETHREAD instead of unrolled version. MFC after: 2 weeks	2019-01-15 06:44:37 +00:00
Gleb Smirnoff	756a541279	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
Gleb Smirnoff	46713135ae	Add flag LK_NEW for lockinit() that is converted to LO_NEW and passed down to lock_init(). This allows for lockinit() on a not prezeroed memory.	2019-01-15 00:35:19 +00:00
Konstantin Belousov	28b740da38	Handle overflow in calculating max kmem size. vm_kmem_size is u_long, and it might be not capable of holding page count times PAGE_SIZE, even when scaled down by VM_KMEM_SIZE_SCALE. As bde reported, 12G PAE config ends up with zero for kmem size. Explicitly check for overflow and clamp kmem size at vm_kmem_size_max. If we end up at zero size because VM_KMEM_SIZE_MAX is not defined, panic with clear explanation rather than failing in a way which is hard to relate. Reported by: bde, pho Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18767	2019-01-14 07:31:19 +00:00
Jason A. Harmening	7dff7eda1a	Handle SIGIO for listening sockets r319722 separated struct socket and parts of the socket I/O path into listening-socket-specific and dataflow-socket-specific pieces. Listening socket connection notifications are now handled by solisten_wakeup() instead of sowakeup(), but solisten_wakeup() does not currently post SIGIO to the owning process. PR: 234258 Reported by: Kenneth Adelman MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18664	2019-01-13 20:33:54 +00:00
Olivier Houchard	7045ac437b	Instead of using an incomplete list of platforms that use 64bits time_t in 32bits mode, special case amd64, as i386 is the only arch that still uses 32bits time_t.	2019-01-13 00:19:15 +00:00
Andrew Turner	be860eae0f	Fix the check for the offset of td_frame and td_emuldata in struct thread. Pointy hat: andrew Sponsored by: DARPA, AFRL	2019-01-12 20:41:57 +00:00
Andrew Turner	b3c0d957a2	Add support for the Clang Coverage Sanitizer in the kernel (KCOV). When building with KCOV enabled the compiler will insert function calls to probes allowing us to trace the execution of the kernel from userspace. These probes are on function entry (trace-pc) and on comparison operations (trace-cmp). Userspace can enable the use of these probes on a single kernel thread with an ioctl interface. It can allocate space for the probe with KIOSETBUFSIZE, then mmap the allocated buffer and enable tracing with KIOENABLE, with the trace mode being passed in as the int argument. When complete KIODISABLE is used to disable tracing. The first item in the buffer is the number of trace events that have happened. Userspace can write 0 to this to reset the tracing, and is expected to do so on first use. The format of the buffer depends on the trace mode. When in PC tracing just the return address of the probe is stored. Under comparison tracing the comparison type, the two arguments, and the return address are traced. The former method uses one entry per trace event, while the latter uses 4. As such they are incompatible so only a single mode may be enabled. KCOV is expected to help fuzzing the kernel, and while in development has already found a number of issues. It is required for the syzkaller system call fuzzer [1]. Other kernel fuzzers could also make use of it, either with the current interface, or by extending it with new modes. A man page is currently being worked on and is expected to be committed soon, however having the code in the kernel now is useful for other developers to use. [1] https://github.com/google/syzkaller Submitted by: Mitchell Horne <mhorne063@gmail.com> (Earlier version) Reviewed by: kib Testing by: tuexen Sponsored by: DARPA, AFRL Sponsored by: The FreeBSD Foundation (Mitchell Horne) Differential Revision: https://reviews.freebsd.org/D14599	2019-01-12 11:21:28 +00:00
Gleb Smirnoff	bcc3cec43c	Simplify sosetopt() so that function has single return point. No functional change.	2019-01-10 00:25:12 +00:00
Brooks Davis	4f4ef03f5f	style(9): fix the indent of a return.	2019-01-09 17:23:59 +00:00
Michael Tuexen	735835ed5c	Avoid overflow in vtruncbuf() Using daddr_t instead of int prevents trunclbn from becoming negative when it shouldn't. This issue was found by running syzkaller. Reviewed by: mckusick, kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18763	2019-01-08 09:04:27 +00:00
Kristof Provost	de2d0d297a	Remove unneeded NULL check for td_ucred td_ucred is always set, so we don't need the ternary expression to check for it.	2019-01-04 21:12:17 +00:00
Conrad Meyer	6b83069e05	Expose threads-per-core and physical core count information With new sysctls (to the best of our ability to detect them). Restructured smp.4 slightly for clarity (keep relevant stuff closer to the top) while documenting. Reviewed by: markj, jhibbits (ppc parts) MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D18322	2019-01-04 18:31:17 +00:00
Mark Johnston	2f2ddd68a5	Support MSG_DONTWAIT in send(2). As it does for recv(2), MSG_DONTWAIT indicates that the call should not block, returning EAGAIN instead. Linux and OpenBSD both implement this, so the change makes porting easier, especially since we do not return EINVAL or so when unrecognized flags are specified. Submitted by: Greg V <greg@unrelenting.technology> Reviewed by: tuexen MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18728	2019-01-04 17:31:50 +00:00
Kristof Provost	4ae4822d6a	Simplify jail ID printing on process exit As suggested by kib@, we don't need to check p_ucred, because that's only NULL during process creation, and cr_prison is never NULL.	2018-12-29 21:36:02 +00:00
Conrad Meyer	a0483764f3	Update to Zstandard 1.3.8 This merge brings in a couple new files, which needed to be attached to the build; a new dependency on <limits.h>, which must be stubbed; and a name change in the Context parameter constants, from ZSTD_p_foo to ZSTD_c_foo. Significantly, it fixes a kernel build error with GCC where floating-point functions were included in the kernel build, by hiding them under the same compile-time #ifdef that already covered their invocation. That issue was introduced to FreeBSD in the 1.3.7 update and tracked upstream here: https://github.com/facebook/zstd/issues/1386 The full 1.3.8 release notes can be found on Github: https://github.com/facebook/zstd/releases/tag/v1.3.8 Relnotes: yes	2018-12-29 21:18:01 +00:00
Konstantin Belousov	7a6322e10d	For hw.{physmem,realmem,usermem} MIBs, clamp instead of truncating. If the memory size does not fit into u_long, current code truncates the returned value and returns complete nonsense. Make the result slightly more useful by clamping it at ULONG_MAX. Reported and tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2018-12-29 15:55:44 +00:00
Kristof Provost	af8becca15	Make kernel print jail ID when logging a process exit Kernel now includes jail ID when logging a process exit. jid is 0 for unjailed processes. Submitted by: Marie Helene Kvello-Aune <freebsd@mhka.no> Relnotes: yes Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D18618	2018-12-29 14:48:51 +00:00
Jilles Tjoelker	0abc7e41ba	pfind, pfind_any: Correct zombie logic SVN r340744 erroneously changed pfind() to return any process including zombies and pfind_any() to return only non-zombie processes. In particular, this caused kill() on a zombie process to fail with [ESRCH]. There is no direct test case for this but /usr/tests/bin/sh/builtins/kill1.0 occasionally triggers it (as reported by lwhsu). Conversely, returning zombies from pfind() seems likely to violate invariants and cause panics, but I have not looked at this. PR: 233646 Reviewed by: mjg, kib, ngie Differential Revision: https://reviews.freebsd.org/D18665	2018-12-28 13:32:14 +00:00
Kirk McKusick	c0029546f8	When loading an inode from disk, verify that its mode is valid. If invalid, return EINVAL. Note that inode check-hashes greatly reduce the chance that these errors will go undetected. Reported by: Christopher Krah <krah@protonmail.com> Reported as: FS-5-UFS-2: Denial Of Service in nmount-3 (ffs_read) Reviewed by: kib MFC after: 1 week Sponsored by: Netflix M sys/fs/ext2fs/ext2_vnops.c M sys/kern/vfs_subr.c M sys/ufs/ffs/ffs_snapshot.c M sys/ufs/ufs/ufs_vnops.c	2018-12-27 07:18:53 +00:00
Alexander Motin	abeb9f61f9	Increase MTX_POOL_SLEEP_SIZE from 128 to 1024. This value remained unchanged for 15 years, and now this bump reduces lock spinning in GEOM and BIO layers while doing ~1.6M IOPS to 4 NVMe on 72-core system from ~25% to ~5% by the cost of additional 28KB RAM. While there, align struct mtx_pool fields to cache lines. MFC after: 1 month	2018-12-24 23:52:35 +00:00
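
The calcru1() overflow addressed by e0d164c7a6 and 23e5e43ccd can be illustrated with a small sketch. It is not the committed code: the function names scale_simple and scale_wide are hypothetical, and the wide variant only demonstrates the __uint128_t alternative that the 23e5e43ccd message mentions and declines to use. It assumes a compiler that provides __uint128_t (GCC or Clang on a 64-bit target).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the "simple formula", usec * part / total.
 * The 64-bit product wraps modulo 2^64 once usec * part exceeds
 * UINT64_MAX, so the result is silently wrong for long runtimes.
 */
static uint64_t
scale_simple(uint64_t part, uint64_t total, uint64_t usec)
{
	assert(total != 0);
	return ((usec * part) / total);
}

/*
 * Hypothetical helper: the same formula with a 128-bit intermediate.
 * With part <= total the quotient never exceeds usec, so narrowing
 * the result back to 64 bits is safe.
 */
static uint64_t
scale_wide(uint64_t part, uint64_t total, uint64_t usec)
{
	assert(total != 0 && part <= total);
	return ((uint64_t)(((__uint128_t)usec * part) / total));
}
```

For example, with usec = 2^40, part = 2^30 and total = 2^31, the product is 2^70: scale_simple() wraps to 0, while scale_wide() returns 2^39, half of usec as expected.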

16504 Commits