freebsd-dev

Author	SHA1	Message	Date
Edward Tomasz Napierala	1a13f2e6b4	Introduce stats(3), a flexible statistics gathering API. This provides a framework to define a template describing a set of "variables of interest" and the intended way for the framework to maintain them (for example the maximum, sum, t-digest, or a combination thereof). Afterwards the user code feeds in the raw data, and the framework maintains these variables inside a user-provided, opaque stats blobs. The framework also provides a way to selectively extract the stats from the blobs. The stats(3) framework can be used in both userspace and the kernel. See the stats(3) manual page for details. This will be used by the upcoming TCP statistics gathering code, https://reviews.freebsd.org/D20655. The stats(3) framework is disabled by default for now, except in the NOTES kernel (for QA); it is expected to be enabled in amd64 GENERIC after a cool down period. Reviewed by: sef (earlier version) Obtained from: Netflix Relnotes: yes Sponsored by: Klara Inc, Netflix Differential Revision: https://reviews.freebsd.org/D20477	2019-10-07 19:05:05 +00:00
Mateusz Guzik	dc20b834ca	vfs: add optional root vnode caching Root vnodes looekd up all the time, e.g. when crossing a mount point. Currently used routines always perform a costly lookup which can be trivially avoided. Reviewed by: jeff (previous version), kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:14:32 +00:00
Kyle Evans	e3f35d562f	Remove the remnants of SI_CHEAPCLONE SI_CHEAPCLONE was introduced in r66067 for use with cloned bpfs. It was later also used in tty, tun, tap at points. The rough timeline for being removed in each of these is as follows: - r181690: bpf switched to use cdevpriv API by ed@ - r181905: ed@ rewrote the TTY later to be mpsafe - r204464: kib@ removes it from tun/tap, declaring it unused I've not yet been able to dig up any other consumers in the intervening 9 years. It is no longer set on any devices in the tree and leaves an interesting situation in make_dev_sv where we're ok with the device already being set SI_NAMED.	2019-10-05 21:52:06 +00:00
Kyle Evans	d42fecb5c1	kern_conf: fully initialize cloned devices with make_dev_args, too Attempting to initialize si_drv{1,2} with mda_si_drv{1,2} does not work if you are operating on cloned devices. clone_create must be called prior to the make_dev* family to create/return the device on the clonelist as needed. This device is later returned early in newdev(), prior to si_drv{0,1,2} initialization. This patch simply breaks out of the loop if we've found a device and finishes init. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21904	2019-10-05 21:44:18 +00:00
Mateusz Guzik	dfa8dae493	devfs: plug redundant bwillwrite avoidance vn_write already checks for vnode type to see if bwillwrite should be called. This effectively reverts r244643. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21905	2019-10-05 17:44:33 +00:00
Eric van Gyzen	e61e783b83	Add CTLFLAG_STATS to some vfs sysctl OIDs Add CTLFLAG_STATS to the following OIDs: vfs.altbufferflushes vfs.recursiveflushes vfs.barrierwrites vfs.flushwithdeps vfs.reassignbufcalls Refer to r353111. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2019-10-04 21:43:43 +00:00
Ed Maste	f91dd6091b	simplify path handling in sysctl_try_reclaim_vnode MAXPATHLEN / PATH_MAX includes space for the terminating NUL, and namei verifies the presence of the NUL. Thus there is no need to increase the buffer size here. The sysctl passes the string excluding the NUL, so req->newlen equal to PATH_MAX is too long. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21876	2019-10-02 21:01:23 +00:00
Mark Johnston	5131cba6d6	Use OBJT_PHYS VM objects for kernel modules. OBJT_DEFAULT incurs some unnecessary overhead given that kernel module pages cannot be paged out. Reviewed by: alc, kib MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21862	2019-10-02 16:34:42 +00:00
Mark Johnston	4a7b33ecf4	Disallow fcntl(F_READAHEAD) when the vnode is not a regular file. The mountpoint may not have defined an iosize parameter, so an attempt to configure readahead on a device file can lead to a divide-by-zero crash. The sequential heuristic is not applied to I/O to or from device files, and posix_fadvise(2) returns an error when v_type != VREG, so perform the same check here. Reported by: syzbot+e4b682208761aa5bc53a@syzkaller.appspotmail.com Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21864	2019-10-02 15:45:49 +00:00
Kyle Evans	5a391b572b	shm_open2(2): completely unbreak kern_shm_open2(), since conception, completely fails to pass the mode along to kern_shm_open(). This breaks most uses of it. Add tests alongside this that actually check the mode of the returned files. PR: 240934 [pulseaudio breakage] Reported by: ler, Andrew Gierth [postgres breakage] Diagnosed by: Andrew Gierth (great catch) Tested by: ler, tmunro Pointy hat to: kevans	2019-10-02 02:37:34 +00:00
Ed Maste	f403831e6c	sysalls.master: remove superfluous ellipsis in comment A single period is sufficient in this comment, and making this change lets us find references to varargs syscalls by searching for ...	2019-10-01 17:05:21 +00:00
Brooks Davis	3a94552174	Restore the ability to set capenabled directly in syscalls.conf. This fixes generation of cloudabi syscall tables broken in r340424. Reviewed by: kevans, emaste MFC after: 3 days Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D21821	2019-09-30 20:58:29 +00:00
Kyle Evans	11fd6a60e7	syscalls.master: consistency, move ); to newline (no functional change)	2019-09-30 13:26:16 +00:00
Mark Johnston	1aa696babc	Fix some problems with the SPARSE_MAPPING option in the kernel linker. - Ensure that the end of the mapping passed to vm_page_wire() is page-aligned. vm_page_wire() expects this. - Wire pages before reading data into them. - Apply protections specified in the segment descriptor using vm_map_protect() once relocation processing is done. - On amd64, ensure that we load KLDs above KERNBASE, since they are compiled with the "kernel" memory model by default. Reviewed by: kib MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21756	2019-09-28 01:42:59 +00:00
Andrew Gallatin	b2dba6634b	kTLS: Fix a bug where we would not encrypt anon data inplace. Software Kernel TLS needs to allocate a new destination crypto buffer when encrypting data from the page cache, so as to avoid overwriting shared clear-text file data with encrypted data specific to a single socket. When the data is anonymous, eg, not tied to a file, then we can encrypt in place and avoid allocating a new page. This fixes a bug where the existing code always assumes the data is private, and never encrypts in place. This results in unneeded page allocations and potentially more memory bandwidth consumption when doing socket writes. When the code was written at Netflix, ktls_encrypt() looked at private sendfile flags to determine if the pages being encrypted where part of the page cache (coming from sendfile) or anonymous (coming from sosend). This was broken internally at Netflix when the sendfile flags were made private, and the M_WRITABLE() check was added. Unfortunately, M_WRITABLE() will always be false for M_NOMAP mbufs, since one cannot just mtod() them. This change introduces a new flags field to the mbuf_ext_pgs struct by stealing a byte from the tls hdr. Note that the current header is still 2 bytes larger than the largest header we support: AES-CBC with explicit IV. We set MBUF_PEXT_FLAG_ANON when creating an unmapped mbuf in m_uiotombuf_nomap() (which is the path that socket writes take), and we check for that flag in ktls_encrypt() when looking for anon pages. Reviewed by: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21796	2019-09-27 20:08:19 +00:00
Andrew Gallatin	6554362c66	kTLS support for TLS 1.3 TLS 1.3 requires a few changes because 1.3 pretends to be 1.2 with a record type of application data. The "real" record type is then included at the end of the user-supplied plaintext data. This required adding a field to the mbuf_ext_pgs struct to save the record type, and passing the real record type to the sw_encrypt() ktls backend functions. Reviewed by: jhb, hselasky Sponsored by: Netflix Differential Revision: D21801	2019-09-27 19:17:40 +00:00
Mateusz Guzik	708cf7eb6c	cache: decrease ncnegfactor to 5 The current mechanism is bogus in several ways: - the limit is a percentage of total entries added, which means negative entries get evicted all the time even if there are plenty of resources - evicting code is almost not concurrent, which makes it unable to remove entries fast enough when doing something as simple as -j 104 buildworld - there is no support for performing mass removal if necessary Vast majority of negative entries never get any hits. Only evicting them when the filesystem demands it results in a significant growth of the namecache with almost no improvement in the hit ratio. Sample result about afer 90 minutes of poudriere -j 104: current no evict % of the original numneg 219737 2013157 916 numneghits 266711906 263544562 98 [1] [1] this may look funny but there is a certain dose of variation to the build The number was chosen as something which mostly eliminates spurious evictions during lighter workloads but still keeps the total at bay. Sponsored by: The FreeBSD Foundation	2019-09-27 19:14:03 +00:00
Mateusz Guzik	e643141838	cache: stop requeuing negative entries on the hot list Turns out it does not improve hit ratio, but it does come with a cost induces stemming from dirtying hit entries. Sample result: hit counts of evicted entries after 2 buildworlds before: value ------------- Distribution ------------- count -1 \| 0 0 \|@@@@@@@@@@@@@@@@@@@@@@@@@ 180865 1 \|@@@@@@@ 49150 2 \|@@@ 19067 4 \|@ 9825 8 \|@ 7340 16 \|@ 5952 32 \|@ 5243 64 \|@ 4446 128 \| 3556 256 \| 3035 512 \| 1705 1024 \| 1078 2048 \| 365 4096 \| 95 8192 \| 34 16384 \| 26 32768 \| 23 65536 \| 8 131072 \| 6 262144 \| 0 after: value ------------- Distribution ------------- count -1 \| 0 0 \|@@@@@@@@@@@@@@@@@@@@@@@@@ 184004 1 \|@@@@@@ 47577 2 \|@@@ 19446 4 \|@ 10093 8 \|@ 7470 16 \|@ 5544 32 \|@ 5475 64 \|@ 5011 128 \| 3451 256 \| 3002 512 \| 1729 1024 \| 1086 2048 \| 363 4096 \| 86 8192 \| 26 16384 \| 25 32768 \| 24 65536 \| 7 131072 \| 5 262144 \| 0 Sponsored by: The FreeBSD Foundation	2019-09-27 19:13:22 +00:00
Mateusz Guzik	312196df0f	cache: make negative list shrinking a little bit concurrent Continue protecting demotion from the hotlist and selection of the target list with the ncneg_shrink_lock lock, but drop it before relocking to zap the node. While here count how many times we skipped shrinking due to the lock being already taken. Sponsored by: The FreeBSD Foundation	2019-09-27 19:12:43 +00:00
Mateusz Guzik	95c6dd890a	cache: stop recalculating upper limit each time a new entry is added Sponsored by: The FreeBSD Foundation	2019-09-27 19:12:20 +00:00
Konstantin Belousov	df08823d07	Improve MD page fault handlers. Centralize calculation of signal and ucode delivered on unhandled page fault in new function vm_fault_trap(). MD trap_pfault() now almost always uses the signal numbers and error codes calculated in consistent MI way. This introduces the protection fault compatibility sysctls to all non-x86 architectures which did not have that bug, but apparently they were already much more wrong in selecting delivered signals on protection violations. Change the delivered signal for accesses to mapped area after the backing object was truncated. According to POSIX description for mmap(2): The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal. An implementation may generate SIGBUS signals when a reference would cause an error in the mapped object, such as out-of-space condition. Adjust according to the description, keeping the existing compatibility code for SIGSEGV/SIGBUS on protection failures. For situations where kernel cannot handle page fault due to resource limit enforcement, SIGBUS with a new error code BUS_OBJERR is delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on amd64 due to protection key access violation. vm_fault_hold() is renamed to vm_fault(). Fixed some nits in trap_pfault()s like mis-interpreting Mach errors as errnos. Removed unneeded truncations of the fault addresses reported by hardware. PR: 211924 Reviewed by: alc Discussed with: jilles, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21566	2019-09-27 18:43:36 +00:00
Andrew Turner	50bb04b750	Check the vfs option length is valid before accessing through When a VFS option passed to nmount is present but NULL the kernel will place an empty option in its internal list. This will have a NULL pointer and a length of 0. When we come to read one of these the kernel will try to load from the last address of virtual memory. This is normally invalid so will fault resulting in a kernel panic. Fix this by checking if the length is valid before dereferencing. MFC after: 3 days Sponsored by: DARPA, AFRL	2019-09-27 16:22:28 +00:00
David Bright	c4571256af	sysent: regenerate after r352747. Sponsored by: Dell EMC Isilon	2019-09-26 15:41:10 +00:00
Mark Johnston	55248d32f2	Fix handling of invalid pages in exec_map_first_page(). exec_map_first_page() would unconditionally free an unbacked, invalid page from the executable image. However, it is possible that the page is wired, in which case it is incorrect to free the page, so check for additional wirings first. Reported by: syzkaller Tested by: pho Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21767	2019-09-26 15:35:35 +00:00
David Bright	9afb12bab4	Add an shm_rename syscall Add an atomic shm rename operation, similar in spirit to a file rename. Atomically unlink an shm from a source path and link it to a destination path. If an existing shm is linked at the destination path, unlink it as part of the same atomic operation. The caller needs the same permissions as shm_unlink to the shm being renamed, and the same permissions for the shm at the destination which is being unlinked, if it exists. If those fail, EACCES is returned, as with the other shm_* syscalls. truss support is included; audit support will come later. This commit includes only the implementation; the sysent-generated bits will come in a follow-on commit. Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: jilles (earlier revision) Reviewed by: brueffer (manpages, earlier revision) Relnotes: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21423	2019-09-26 15:32:28 +00:00
Toomas Soome	11fc80a098	kernel terminal should initialize fg and bg variables before calling TUNABLE_INT_FETCH We have two ways to check if kenv variable exists - either we check return value from TUNABLE_INT_FETCH, or we pre-initialize the variable and check if this value did change. In terminal_init() it is more convinient to use pre-initialized variables. Problem was revealed by older loader.efi, which did not set teken.* variables. Reported by: tuexen	2019-09-26 07:19:26 +00:00
Alexander Motin	176dd236dc	Microoptimize sched_pickcpu() CPU affinity on SMT. Use of CPU_FFS() to implement CPUSET_FOREACH() allows to save up to ~0.5% of CPU time on 72-thread SMT system doing 80K IOPS to NVMe from one thread. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-09-26 00:35:06 +00:00
Alexander Motin	c55dc51c37	Microoptimize sched_pickcpu() after r352658. I've noticed that I missed intr check at one more SCHED_AFFINITY(), so instead of adding one more branching I prefer to remove few. Profiler shows the function CPU time reduction from 0.24% to 0.16%. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-09-25 19:29:09 +00:00
Kyle Evans	079c5b9ed8	rfork(2): add RFSPAWN flag When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets signal handlers in the child during creation to avoid a point of corruption of parent state from the child. This flag will be used by posix_spawn(3) to handle potential signal issues. Reviewed by: jilles, kib Differential Revision: https://reviews.freebsd.org/D19058	2019-09-25 19:20:41 +00:00
Gleb Smirnoff	dd902d015a	Add debugging facility EPOCH_TRACE that checks that epochs entered are properly nested and warns about recursive entrances. Unlike with locks, there is nothing fundamentally wrong with such use, the intent of tracer is to help to review complex epoch-protected code paths, and we mean the network stack here. Reviewed by: hselasky Sponsored by: Netflix Pull Request: https://reviews.freebsd.org/D21610	2019-09-25 18:26:31 +00:00
Kyle Evans	a9ac5e1424	sysent: regenerate after r352705 This also implements it, fixes kdump, and removes no longer needed bits from lib/libc/sys/shm_open.c for the interim.	2019-09-25 18:09:19 +00:00
Kyle Evans	234879a7e3	Mark shm_open(2) as COMPAT12, succeeded by shm_open2 Implementation and regenerated files will follow.	2019-09-25 18:06:48 +00:00
Kyle Evans	460211e730	sysent: regenerate after r352700	2019-09-25 17:59:58 +00:00
Kyle Evans	20f7057685	Add a shm_open2 syscall to support upcoming memfd_create shm_open2 allows a little more flexibility than the original shm_open. shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate shmflag argument that can be expanded later. Currently the only shmflag is to allow file sealing on the returned fd. shm_open and memfd_create will both be implemented in libc to use this new syscall. __FreeBSD_version is bumped to indicate the presence. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21393	2019-09-25 17:59:15 +00:00
Kyle Evans	0cd95859c8	[2/3] Add an initial seal argument to kern_shm_open() Now that flags may be set on posixshm, add an argument to kern_shm_open() for the initial seals. To maintain past behavior where callers of shm_open(2) are guaranteed to not have any seals applied to the fd they're given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special flag could be opened later for shm_open(2) to indicate that sealing should be allowed. We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if F_SEAL_SEAL is re-applied, as this would easily break shm_open() twice to a shmfd that already existed. A note's been added about the assumptions we've made here as a hint towards anyone wanting to allow other seals to be applied at creation. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21392	2019-09-25 17:35:03 +00:00
Kyle Evans	af755d3e48	[1/3] Add mostly Linux-compatible file sealing support File sealing applies protections against certain actions (currently: write, growth, shrink) at the inode level. New fileops are added to accommodate seals - EINVAL is returned by fcntl(2) if they are not implemented. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D21391	2019-09-25 17:32:43 +00:00
Kyle Evans	85c5f3cb57	Add COMPAT12 support to makesyscalls.sh Reviewed by: kib, imp, brooks (all without syscalls.master edits) Differential Revision: https://reviews.freebsd.org/D21366	2019-09-25 17:29:45 +00:00
Toomas Soome	3001e0c942	kernel: terminal_init() should check for teken colors from kenv Check for teken.fg_color and teken.bg_color and prepare the color attributes accordingly. When white background is used, make it light to improve visibility. When black background is used, make kernel messages light.	2019-09-25 13:21:07 +00:00
Alexander Motin	bb3dfc6ae9	Fix wrong assertion in r352658. MFC after: 1 month	2019-09-25 11:58:54 +00:00
Alexander Motin	c9205e3500	Fix/improve interrupt threads scheduling. Doing some tests with very high interrupt rates I've noticed that one of conditions I added in r232207 to make interrupt threads in most cases run on local CPU never worked as expected (worked only if previous time it was executed on some other CPU, that is quite opposite). It caused additional CPU usage to run full CPU search and could schedule interrupt threads to some other CPU. This patch removes that code and instead reuses existing non-interrupt code path with some tweaks for interrupt case: - On SMT systems, if current thread is idle, don't look on other threads. Even if they are busy, it may take more time to do fill search and bounce the interrupt thread to other core then execute it locally, even sharing CPU resources. It is other threads should migrate, not bound interrupts. - Try hard to keep interrupt threads within LLC of their original CPU. This improves scheduling cost and supposedly cache and memory locality. On a test system with 72 threads doing 2.2M IOPS to NVMe this saves few percents of CPU time while adding few percents to IOPS. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-09-24 20:01:20 +00:00
Randall Stewart	35c7bb3407	This commit adds BBR (Bottleneck Bandwidth and RTT) congestion control. This is a completely separate TCP stack (tcp_bbr.ko) that will be built only if you add the make options WITH_EXTRA_TCP_STACKS=1 and also include the option TCPHPTS. You can also include the RATELIMIT option if you have a NIC interface that supports hardware pacing, BBR understands how to use such a feature. Note that this commit also adds in a general purpose time-filter which allows you to have a min-filter or max-filter. A filter allows you to have a low (or high) value for some period of time and degrade slowly to another value has time passes. You can find out the details of BBR by looking at the original paper at: https://queue.acm.org/detail.cfm?id=3022184 or consult many other web resources you can find on the web referenced by "BBR congestion control". It should be noted that BBRv1 (which this is) does tend to unfairness in cases of small buffered paths, and it will usually get less bandwidth in the case of large BDP paths(when competing with new-reno or cubic flows). BBR is still an active research area and we do plan on implementing V2 of BBR to see if it is an improvement over V1. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D21582	2019-09-24 18:18:11 +00:00
Mateusz Guzik	93a85508ad	cache: tidy up handling of negative entries - track the total count of hot entries - pre-read the lock when shrinking since it is typically already taken - place the lock in its own cacheline - shorten the hold time of hot lock list when zapping Sponsored by: The FreeBSD Foundation	2019-09-23 20:50:04 +00:00
Mark Johnston	38dae42c26	Use elf_relocaddr() when handling R_X86_64_RELATIVE relocations. This is required for DPCPU and VNET data variable definitions to work when KLDs are linked as DSOs. R_X86_64_RELATIVE relocations should not appear in object files, so assert this in elf_relocaddr(). Reviewed by: kib MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21755	2019-09-23 14:14:43 +00:00
Mateusz Guzik	afe257e3ca	cache: count evictions of negatve entries Sponsored by: The FreeBSD Foundation	2019-09-23 08:53:14 +00:00
Sean Eric Fagan	ba7a55d934	Add two options to allow mount to avoid covering up existing mount points. The two options are * nocover/cover: Prevent/allow mounting over an existing root mountpoint. E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local is already a mountpoint. * emptydir/noemptydir: Prevent/allow mounting on a non-empty directory. E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail. Neither of these options is intended to be a default, for historical and compatibility reasons. Reviewed by: allanjude, kib Differential Revision: https://reviews.freebsd.org/D21458	2019-09-23 04:28:07 +00:00
Mateusz Guzik	7505cffa56	cache: try to avoid vhold if locks held Sponsored by: The FreeBSD Foundation	2019-09-22 20:50:24 +00:00
Mateusz Guzik	cd2112c305	cache: jump in negative success instead of positive Sponsored by: The FreeBSD Foundation	2019-09-22 20:49:17 +00:00
Mateusz Guzik	d2be3ef05c	lockprof: move per-cpu data to dpcpu Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21747	2019-09-22 20:44:24 +00:00
Konstantin Belousov	f33533da8c	kern.elf{32,64}.pie_base sysctl: enforce page alignment. Requested by: rstone Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-21 20:03:17 +00:00
Mateusz Guzik	cbba2cb367	lockprof: use CPUFOREACH and drop always false lp_cpu NULL checks Sponsored by: The FreeBSD Foundation	2019-09-21 19:05:38 +00:00
Konstantin Belousov	95aafd6900	Make non-ASLR pie base tunable. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-21 18:00:23 +00:00
Alexander Motin	36d151a237	Allocate callout wheel from the respective memory domain. MFC after: 1 week	2019-09-21 15:38:08 +00:00
Andrew Gallatin	61b8a4af71	remove redundant "ktls" in KTLS thr name This reducesthe string width of the ktls thread name and improves "ps" output. Glanced at by: jhb Event: EuroBSDCon hackathon Sponsored by: Netflix	2019-09-20 09:36:07 +00:00
Mateusz Guzik	b488246b45	vfs: group fields used for per-cpu ops in one cacheline Sponsored by: The FreeBSD Foundation	2019-09-19 21:23:14 +00:00
Konstantin Belousov	382e01c8dc	sysctl: use names instead of magic numbers. Replace magic numbers with symbols for internal sysctl operations. Convert in-kernel and libc consumers. Submitted by: Pawel Biernacki MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21693	2019-09-18 16:13:10 +00:00
Konstantin Belousov	55894117b1	Return EISDIR when directory is opened with O_CREAT without O_DIRECTORY. Reviewed by: bcr (man page), emaste (previous version) PR: 240452 Sponsored by: The FreeBSD Foundation MFC after: 1 week DIfferential revision: https://reviews.freebsd.org/D21634	2019-09-17 18:32:18 +00:00
Kirk McKusick	100369071d	The VFS-level clustering code collects together sequential blocks by issuing delayed-writes (bdwrite()) until a non-sequential block is written or the maximum cluster size is reached. At that point it collects the delayed buffers together (using bread()) to write them in a single operation. The assumption was that since we just looked at them they will still be in memory so there is no need to check for a read error from bread(). Very occationally (apparently every 10-hours or so when being pounded by Peter Holm's tests) this assumption is wrong. The fix is to check for errors from bread() and fail the cluster write thus falling back to the default individual flushing of any still dirty buffers. Reported by: Peter Holm and Chuck Silvers Reviewed by: kib MFC after: 3 days	2019-09-17 17:44:50 +00:00
Mateusz Guzik	d245aa1e72	vfs: apply r352437 to the fast path as well This one is very hard to run into. If the filesystem is being unmounted or the mount point is freed the vfs_op_thread_enter will fail. For it to succeed the mount point itself would have to be reallocated in the time window between the initial read and the attempt to enter. Sponsored by: The FreeBSD Foundation	2019-09-17 15:53:40 +00:00
Mateusz Guzik	7f65185940	vfs: fix braino resulting in NULL pointer deref in r352424 The breakage was added after all the testing and the testing which followed was not sufficient to find it. Reported by: pho Sponsored by: The FreeBSD Foundation	2019-09-17 08:09:39 +00:00
Mateusz Guzik	4cace859c2	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637	2019-09-16 21:37:47 +00:00
Mateusz Guzik	e87f3f72f1	vfs: manage mnt_writeopcount with atomics See r352424. Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21575	2019-09-16 21:33:16 +00:00
Mateusz Guzik	ee831b2543	vfs: manage mnt_lockref with atomics See r352424. Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21574	2019-09-16 21:32:21 +00:00
Mateusz Guzik	a8c8e44bf0	vfs: manage mnt_ref with atomics New primitive is introduced to denote sections can operate locklessly on aspects of struct mount, but which can also be disabled if necessary. This provides an opportunity to start scaling common case modifications while providing stable state of the struct when facing unmount, write suspendion or other events. mnt_ref is the first counter to start being managed in this manner with the intent to make it per-cpu. Reviewed by: kib, jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21425	2019-09-16 21:31:02 +00:00
Kyle Evans	3155f2f0e2	rangelock: add rangelock_cookie_assert A future change to posixshm to add file sealing (in DIFF_21391[0] and child) will move locking out of shm_dotruncate as kern_shm_open() will require the lock to be held across the dotruncate until the seal is actually applied. For this, the cookie is passed into shm_dotruncate_locked which asserts RCA_WLOCKED. [0] Name changed to protect the innocent, hopefully, from getting autoclosed due to this reference... Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D21628	2019-09-15 02:59:53 +00:00
Mateusz Guzik	ce3ba63f67	vfs: release usecount using fetchadd 1. If we release the last usecount we take ownership of the hold count, which means the vnode will remain allocated until we vdrop it. 2. If someone else vrefs they will find no usecount and will proceed to add their own hold count. 3. No code has a problem with v_usecount transitioning to 0 without the interlock These facts combined mean we can fetchadd instead of having a cmpset loop. Reviewed by: kib (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21528	2019-09-13 15:49:04 +00:00
Mark Johnston	45cdd437ae	Remove a redundant NULL pointer check in cpuset_modify_domain(). cpuset_getroot() is guaranteed to return a non-NULL pointer. Reported by: Mark Millard <marklmi@yahoo.com> MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-09-12 16:47:38 +00:00
Hans Petter Selasky	11b57401e6	Use REFCOUNT_COUNT() to obtain refcount where appropriate. Refcount waiting will set some flag bits in the refcount value. Make sure these bits get cleared by using the REFCOUNT_COUNT() macro to obtain the actual refcount. Differential Revision: https://reviews.freebsd.org/D21620 Reviewed by: kib@, markj@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-09-12 16:26:59 +00:00
Kyle Evans	5163b1a75c	Follow up r352244: kenv: tighten up assertions As I like to forget: static kenv var formatting is actually such that an empty environment would be double null bytes. We should make sure that a non-zero buffer has at least enough for this, though most of the current usage is with a 4k buffer.	2019-09-12 14:34:46 +00:00
Kyle Evans	436c46875d	kenv: assert that an empty static buffer passed in is "empty" Garbage in the passed-in buffer can cause problems if any attempts to read the kenv are inadvertently made between init_static_kenv and the first kern_setenv -- assuming there is one. This is cheap and easy, so do it. This also helps rule out some class of bugs as one tries to debug; tunables fetch from the static environment up until SI_SUB_KMEM + 1, and many of these buffers are global ~4k buffers that rely on BSS clearing while others just grab a page of free memory and use it (e.g. xen).	2019-09-12 13:51:43 +00:00
Conrad Meyer	aaa3852435	buf: Add B_INVALONERR flag to discard data Setting the B_INVALONERR flag before a synchronous write causes the buf cache to forcibly invalidate contents if the write fails (BIO_ERROR). This is intended to be used to allow layers above the buffer cache to make more informed decisions about when discarding dirty buffers without successful write is acceptable. As a proof of concept, use in msdosfs to handle failures to mark the on-disk 'dirty' bit during rw mount or ro->rw update. Extending this to other filesystems is left as future work. PR: 210316 Reviewed by: kib (with objections) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21539	2019-09-11 21:24:14 +00:00
Mateusz Guzik	b088a4d6f9	cache: avoid excessive relocking on entry removal during lookup Due to lock ordering issues (bucket lock held, vnode locks wanted) the code starts with trylocking which in face of contention often fails. Prior to the change it would loop back with a possible yield. Instead note we know what locks are needed and can take them in the right order, avoiding retries. Then we can safely re-lookup and see if the entry we are looking for is still there. On a 104-way box poudriere would result in constant retries during an 11h run as seen in the vfs.cache.zap_and_exit_bucket_fail counter. before: 408866592 after : 0 However, a new stat reports: vfs.cache.zap_and_exit_bucket_relock_success: 32638 Note this is only a bandaid over current design issues. Tested by: pho Sponsored by: The FreeBSD Foundation	2019-09-10 20:19:29 +00:00
Mateusz Guzik	a6cacb0dca	cache: change the formula for calculating lock array sizes It used to be mp_ncpus * 64, but this gives unnecessarily big values for small machines and at the same time constraints bigger ones. In particular this helps on a 104-way box for which the count is now doubled. While here make cache_purgevfs less likely. Currently it is not efficient in face of contention due to lock ordering issues. These are fixable but not worth it at the moment. Sponsored by: The FreeBSD Foundation	2019-09-10 20:11:00 +00:00
Mateusz Guzik	1214618c05	cache: assorted cleanups Sponsored by: The FreeBSD Foundation	2019-09-10 20:08:24 +00:00
Jeff Roberson	c75757481f	Replace redundant code with a few new vm_page_grab facilities: - VM_ALLOC_NOCREAT will grab without creating a page. - vm_page_grab_valid() will grab and page in if necessary. - vm_page_busy_acquire() automates some busy acquire loops. Discussed with: alc, kib, markj Tested by: pho (part of larger branch) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21546	2019-09-10 19:08:01 +00:00
Jeff Roberson	4cdea4a853	Use the sleepq lock rather than the page lock to protect against wakeup races with page busy state. The object lock is still used as an interlock to ensure that the identity stays valid. Most callers should use vm_page_sleep_if_busy() to handle the locking particulars. Reviewed by: alc, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21255	2019-09-10 18:27:45 +00:00
Mark Johnston	fee2a2fa39	Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Konstantin Belousov	6c46ce7ea3	Initialize timehands linkage much earlier. Reported and tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-09 12:42:48 +00:00
Konstantin Belousov	4b23dec4c2	Make timehands count selectable at boottime. Tested by: O'Connor, Daniel <darius@dons.net.au> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21563	2019-09-09 11:29:58 +00:00
Konstantin Belousov	1040254b75	In do_execve(), use shared text vnode lock consistently. Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560	2019-09-07 16:10:57 +00:00
Konstantin Belousov	1c36b72874	In do_execve(), clear imgp->textset when restarting for interpreter. Otherwise, we might left the boolean set, which would affect cleanup after an error on interpreter activation. Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560	2019-09-07 16:05:17 +00:00
Konstantin Belousov	1073d17eeb	When loading ELF interpreter, initialize whole nested image_params with zero. Otherwise we could mishandle imgp->textset. Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560	2019-09-07 16:03:26 +00:00
Philip Paeps	bdc786cc7c	riscv: restore default HZ=1000, keep QEMU at HZ=100 This reverts r351918 and r351919. Discussed with: br, ian, imp	2019-09-07 05:13:31 +00:00
Philip Paeps	7f0851ab19	riscv: default to HZ=100 Most current RISC-V development platforms are not fast enough to benefit from the increased granularity provided by HZ=1000. Sponsored by: Axiado	2019-09-06 01:19:31 +00:00
Conrad Meyer	a6935d085c	Remove long-dead BUF_ASSERT_{,UN}HELD assertions These were fully neutered in r177676 (2008), but not removed at the time for unclear reasons. They're totally dead code, so go ahead and yank them now. No functional change.	2019-09-05 21:43:33 +00:00
Mateusz Guzik	68c3c1abe1	vfs: temporarily revert r351825 There are 2 problems: - it introduces a funny bug where it can end up trylocking the same vnode [1] - it exposes a pre-existing softdep deadlock [2] Both are easier to run into that the bug which got fixed, so revert until a complete solution is worked out. Reported by: cy [1], pho [2] Sponsored by: The FreeBSD Foundation	2019-09-05 18:19:51 +00:00
Stephen J. Kiernan	d57cd5ccd3	Bump up the low range of cpuset numbers to account for the kernel cpuset. Reviewed by: jeff Obtained from: Juniper Networks, Inc.	2019-09-05 17:48:39 +00:00
Mateusz Guzik	c07d4a0a68	vfs: fully hold vnodes in vnlru_free_locked Currently the code only bumps holdcnt and clears the VI_FREE flag, not performing actual vhold. Since the vnode is still visible elsewhere, a potential new user can find it and incorrectly assume it is properly held. Use vholdl instead to correctly hold the vnode. Another place recycling (vlrureclaim) does this already. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21522	2019-09-04 19:23:18 +00:00
Andriy Gapon	387df3b805	shutdown_halt: make sure that watchdog timer is stopped The point of halt is to keep the machine in limbo. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D21222	2019-09-04 13:26:59 +00:00
Kyle Evans	dca52ab480	posixshm: start counting writeable mappings r351650 switched posixshm to using OBJT_SWAP for shm_object r351795 added support to the swap_pager for tracking writeable mappings Take advantage of this and start tracking writeable mappings; fd sealing will use this to reject a seal on writing with EBUSY if any such mapping exist. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21456	2019-09-03 20:33:38 +00:00
Kyle Evans	fe7bcbaf50	vm pager: writemapping accounting for OBJT_SWAP Currently writemapping accounting is only done for vnode_pager which does some accounting on the underlying vnode. Extend this to allow accounting to be possible for any of the pager types. New pageops are added to update/release writecount that need to be implemented for any pager wishing to do said accounting, and we implement these methods now for both vnode_pager (unchanged) and swap_pager. The primary motivation for this is to allow other systems with OBJT_SWAP objects to check if their objects have any write mappings and reject operations with EBUSY if so. posixshm will be the first to do so in order to reject adding write seals to the shmfd if any writable mappings exist. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21456	2019-09-03 20:31:48 +00:00
Konstantin Belousov	fe69291ff4	Add procctl(PROC_STACKGAP_CTL) It allows a process to request that stack gap was not applied to its stacks, retroactively. Also it is possible to control the gaps in the process after exec. PR: 239894 Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21352	2019-09-03 18:56:25 +00:00
Mateusz Guzik	e3c3248cc7	vfs: implement usecount implying holdcnt vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting freed and usecount to denote it is actively used. Previously all operations bumping usecount would also bump holdcnt, which is not necessary. We can detect if usecount is already > 1 (in which case holdcnt is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves on atomic ops. Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21471	2019-09-03 15:42:11 +00:00
Mateusz Guzik	d05b53e0ba	Add sysctlbyname system call Previously userspace would issue one syscall to resolve the sysctl and then another one to actually use it. Do it all in one trip. Fallback is provided in case newer libc happens to be running on an older kernel. Submitted by: Pawel Biernacki Reported by: kib, brooks Differential Revision: https://reviews.freebsd.org/D17282	2019-09-03 04:16:30 +00:00
Mateusz Guzik	1874c61b90	vfs: restore mp null check in vop_stdgetwritemount The initially read mount point can already be NULL. Reported by: markj Fixes: r351656 ("vfs: stop refing freed mount points in vop_stdgetwritemount") Sponsored by: The FreeBSD Foundation	2019-09-02 15:24:25 +00:00
Mateusz Guzik	9576ff5803	proc: clear pid bitmap entry after dropping proctree lock There is no correctness change here, but the procid lock is contended in the fork path and taking it while holding proctree avoidably extends its hold time. Note that there are other ids which can end up getting cleared with the lock. Sponsored by: The FreeBSD Foundation	2019-09-02 12:46:43 +00:00
Mark Johnston	08cfa56ea3	Extend uma_reclaim() to permit different reclamation targets. The page daemon periodically invokes uma_reclaim() to reclaim cached items from each zone when the system is under memory pressure. This is important since the size of these caches is unbounded by default. However it also results in bursts of high latency when allocating from heavily used zones as threads miss in the per-CPU caches and must access the keg in order to allocate new items. With r340405 we maintain an estimate of each zone's usage of its (per-NUMA domain) cache of full buckets. Start making use of this estimate to avoid reclaiming the entire cache when under memory pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only items in excess of the estimate are reclaimed. Draining a zone reclaims all of the cached full buckets (the previous behaviour of uma_reclaim()), and may further drain the per-CPU caches in extreme cases. Now, when under memory pressure, the page daemon will trim zones rather than draining them. As a result, heavily used zones do not incur bursts of bucket cache misses following reclamation, but large, unused caches will be reclaimed as before. Reviewed by: jeff Tested by: pho (an earlier version) MFC after: 2 months Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16667	2019-09-01 22:22:43 +00:00
Mark Johnston	63cdd18e40	Restrict the input domain set in cpuset_setdomain(2) to all_domains. To permit larger values of MAXMEMDOM, which is currently 8 on amd64, cpuset_setdomain(2) accepts a mask of size 256. In the kernel, domain set masks are 64 bits wide, but can only represent a set of MAXMEMDOM domains due to the use of the ds_order table. Domain sets passed to cpuset_setdomain(2) are restricted to a subset of their parent set, which is typically the root set, but before this happens we modify the input set to exclude empty domains. domainset_empty_vm() and other code which manipulates domain sets expect the mask to be a subset of all_domains, so enforce that when performing validation of cpuset_setdomain(2) parameters. Reported and tested by: pho Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21477	2019-09-01 21:38:08 +00:00
Mateusz Guzik	2796c209b0	vfs: stop refing freed mount points in vop_stdgetwritemount The code used blindly ref based on an unsafely red address and then would backpedal if necessary. This was safe in terms of memory access since mounts are type-stable, but made for a potential a bug where the mount was reused and had the count reset to 0 before this code decreased it. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21411	2019-09-01 14:01:09 +00:00
Kyle Evans	32287ea72b	posixshm: switch to OBJT_SWAP in advance of other changes Future changes to posixshm will start tracking writeable mappings in order to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT object is complicated as it may be swapped out and converted to an OBJT_SWAP. One may generically add this tracking for vm_object, but this is difficult to do without increasing memory footprint of vm_object and blowing up memory usage by a significant amount. On the other hand, the swap pager can be expanded to track writeable mappings without increasing vm_object size. This change is currently in D21456. Switch over to OBJT_SWAP in advance of the other changes to the swap pager and posixshm.	2019-09-01 00:33:16 +00:00
Mateusz Guzik	c2b600f98f	vfs: add a missing VNODE_REFCOUNT_FENCE_REL to v_incr_usecount_locked Sponsored by: The FreeBSD Foundation	2019-08-30 21:54:45 +00:00
Mateusz Guzik	3bb8d8d8c9	vfs: tidy up assertions in vfs_subr - assert unlocked vnode interlock in vref - assert right counts in vputx - print debug info for panic in vdrop Sponsored by: The FreeBSD Foundation	2019-08-30 00:45:53 +00:00
Konstantin Belousov	6470c8d3db	Rework v_object lifecycle for vnodes. Current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it prepared to handle the vm object destruction for live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:50:25 +00:00
Mateusz Guzik	1e2f0ceb2f	vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for looukp. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesytems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371	2019-08-28 20:34:24 +00:00
Mark Johnston	772dd133c6	Avoid direct accesses of the vm_page wire_count field. No functional change intended. Sponsored by: Netflix	2019-08-28 18:01:54 +00:00
Mateusz Guzik	88cc62e5a5	proc: eliminate the zombproc list It is not needed by anything in the kernel and it slightly drives up contention on both proctree and allproc locks. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21447	2019-08-28 16:18:23 +00:00
Mark Johnston	b5d239cb97	Wire pages in vm_page_grab() when appropriate. uiomove_object_page() and exec_map_first_page() would previously wire a page after having grabbed it. Ask vm_page_grab() to perform the wiring instead: this removes some redundant code, and is cheaper in the case where the requested page is not resident since the page allocator can be asked to initialize the page as wired, whereas a separate vm_page_wire() call requires the page lock. In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of vm_page_unwire(PQ_NONE). The latter ensures that the page is dequeued before returning, but this is unnecessary since vm_page_free() will trigger a batched dequeue of the page. Reviewed by: alc, kib Tested by: pho (part of a larger patch) MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21440	2019-08-28 16:08:06 +00:00
Mateusz Guzik	2319489b6e	proc: remove zpfind It is not used by anything. If someone wants it back it should be reimplemented to use the proc hash. Sponsored by: The FreeBSD Foundation	2019-08-28 01:22:21 +00:00
John Baldwin	818d755318	Only define the 'tls' member of sfio in KERN_TLS is defined. This field was not initialized in the !KERN_TLS case triggering an assertion failure when using sendfile(2). Reported by: pho, asomers Sponsored by: Netflix	2019-08-27 22:21:18 +00:00
Mateusz Guzik	368cabbcb5	vfs: stop passing LK_INTERLOCK to VOP_UNLOCK The plan is to drop the flags argument. There is also a temporary bug now that nullfs ignores the flag. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21252	2019-08-27 20:30:56 +00:00
Mark Johnston	44e4def73b	Remove an extraneous + 1 in _domainset_create(). DOMAINSET_FLS, like our fls(), is 1-indexed. Reported by: alc MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-08-27 15:42:08 +00:00
Mark Johnston	8e6975047e	Fix several logic issues in domainset_empty_vm(). - Don't add 1 to the result of DOMAINSET_FLS. - Do not modify domainsets containing only empty domains. - Always flatten a _PREFER policy to _ROUNDROBIN if the preferred domain is empty. Previously we were doing this only when ds_cnt > 1. These bugs could cause hangs during boot if a VM domain is empty. Tested by: hselasky Reviewed by: hselasky, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21420	2019-08-27 14:06:34 +00:00
Konstantin Belousov	95acb40caa	vn_vget_ino_gen(): relock the lower vnode on error. The function' interface assumes that the lower vnode is passed and returned locked always. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-27 08:28:38 +00:00
John Baldwin	b2e60773c6	Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277	2019-08-27 00:01:56 +00:00
Xin LI	4e8671dd78	GZIO: Update to use zlib 1.2.11. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Differential Revision: https://reviews.freebsd.org/D21408	2019-08-25 07:50:44 +00:00
Mateusz Guzik	0256405e98	vfs: add vholdnz (for already held vnodes) Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358	2019-08-25 05:11:43 +00:00
Mateusz Guzik	5b596b9fa5	Remove the obsolete pcpu_zone_ptr zone. It was only used by flowtable (removed in r321618). Sponsored by: The FreeBSD Foundation	2019-08-24 00:01:19 +00:00
Konstantin Belousov	e671edac06	De-commision the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-23 19:40:10 +00:00
Xin LI	a11bf9a49b	INVARIANTS: treat LA_LOCKED as the same of LA_XLOCKED in mtx_assert. The Linux lockdep API assumes LA_LOCKED semantic in lockdep_assert_held(), meaning that either a shared lock or write lock is Ok. On the other hand, the timeout code uses lc_assert() with LA_XLOCKED, and we need both to work. For mutexes, because they can not be shared (this is unique among all lock classes, and it is unlikely that we would add new lock class anytime soon), it is easier to simply extend mtx_assert to handle LA_LOCKED there, despite the change itself can be viewed as a slight abstraction violation. Reviewed by: mjg, cem, jhb MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21362	2019-08-23 06:39:40 +00:00
Brooks Davis	075ac3b446	Reorganise conditionals to reduce duplication. No functional change. Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA, AFRL	2019-08-22 10:21:07 +00:00
Rick Macklem	df9bc7df42	Map ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE). Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). This was discussed on freebsd-current@ here: http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA This trivial patch maps ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE). Reviewed by: markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21300	2019-08-22 01:15:06 +00:00
Mark Johnston	5b699f1614	Add lockmgr(9) probes to the lockstat DTrace provider. They follow the conventions set by rw and sx lock probes. There is an additional lockstat:::lockmgr-disown probe. Update lockstat(1) to report on contention and hold events for lockmgr locks. Document the new probes in dtrace_lockstat.4, and deduplicate some of the existing probe descriptions. Reviewed by: mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21355	2019-08-21 23:43:58 +00:00
Mark Johnston	9fb7c918ef	Remove manual wire_count adjustments from the unmapped mbuf code. The original code came from a desire to minimize the number of updates to v_wire_count, which prior to r329187 was updated using atomics. However, there is no significant benefit to batching today, so simply allocate pages using VM_ALLOC_WIRED and rely on system accounting. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D21323	2019-08-21 20:01:52 +00:00
Mark Johnston	6bc13e042f	Modify pipe_poll() to properly check for pending direct writes. With r349546, it is a responsibility of the writer to clear PIPE_DIRECTW after pinned data has been read. In particular, once a reader has drained this data, there is a small window where the pipe is empty but PIPE_DIRECTW is set. pipe_poll() was using the presence of PIPE_DIRECTW to determine whether to return POLLIN, so in this window it would claim that data was available to read when this was not the case. Fix this by modifying several checks for PIPE_DIRECTW to instead look at the number of residual bytes in data pinned by a direct writer. In some cases we really do want to check for PIPE_DIRECTW, since the presence of this flag indicates that any attempt to write to the pipe will block on the existing direct writer. Bisected and test case provided by: mav Tested by: pho Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21333	2019-08-21 19:35:04 +00:00
Ed Maste	f37192064a	mqueuefs: fix compat32 struct file leak In a compat32 error case we previously leaked a struct file. Submitted by: Karsten König, Secfault Security Security: CVE-2019-5603	2019-08-20 17:44:03 +00:00
Jeff Roberson	cf27e0d125	Use an atomic reference count for paging in progress so that callers do not require the object lock. Reviewed by: markj Tested by: pho (as part of a larger branch) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21311	2019-08-19 23:09:38 +00:00
Mateusz Guzik	4b3f767340	vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS") fs-specific part of vfs_statfs routines only fill in small portion of the structure. Previous code was always copying everything at a higher layer to acoomodate it and this patch does the same. 'df' (no arguments) worked fine because the caller uses mnt_stat itself as the target buffer, making all the copying a no-op for its own case. 'df /' and similar use a different consumer which passes its own buffer and this is where you can run into trouble. Reported by: cy Fixes: r351193 Sponsored by: The FreeBSD Foundation	2019-08-19 14:11:54 +00:00
Andrey V. Elsukov	75697b16b6	Use TAILQ_FOREACH_SAFE() macro to avoid use after free in soclose(). PR: 239893 MFC after: 1 week	2019-08-19 12:42:03 +00:00
Andriy Gapon	0db7afd0ae	assert that td_lk_slocks is not leaked upon return from kernel This is similar to checks for td_sx_slocks and td_rw_rlocks. Although td_lk_slocks is an implementation detail, it still makes sense to validate it. MFC after: 1 week Sponsored by: Panzura	2019-08-19 11:18:36 +00:00
Rick Macklem	2e1b32c0e3	Add a vop_stdioctl() that performs a trivial FIOSEEKDATA/FIOSEEKHOLE. Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). A discussion on freebsd-current@ seemed to indicate that implementing a trivial algorithm that returns the offset argument for FIOSEEKDATA and returns the file's size for FIOSEEKHOLE was the preferred fix. http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA The Linux kernel appears to implement this trivial algorithm as well. This patch adds a vop_stdioctl() that implements this trivial algorithm. It returns errors consistent with vn_bmap_seekhole() and, as such, will still return ENOTTY for non-regular files. I have proposed a separate patch that maps errors not described by the lseek(2) man page nor POSIX draft to EINVAL. This patch is under separate review. Reviewed by: kib Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21299	2019-08-19 00:29:05 +00:00
Konstantin Belousov	de4e1aeb21	Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode' vm_object has swap type, and no references on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:36:11 +00:00
Konstantin Belousov	bb9e2184f0	Change locking requirements for VOP_UNSET_TEXT(). Require the vnode to be locked for the VOP_UNSET_TEXT() call. This will be used by the following bug fix for a tmpfs issue. Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:24:52 +00:00
Mateusz Guzik	e7c1709aaf	vfs: stop always overwriting ->mnt_stat in VFS_STATFS The struct is already populated on each mount (and remount). Fields are either constant or not used by filesystem in the first place. Some infrequently used functions use it to avoid having to allocate a new buffer and are left alone. The current code results in an avoidable copying single-threaded and significant cache line bouncing multithreaded While here deduplicate initial filling of the struct. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21317	2019-08-18 18:40:12 +00:00
Jeff Roberson	33205c60e7	Add a blocking wait bit to refcount. This allows refs to be used as a simple barrier. Reviewed by: markj, kib Discussed with: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21254	2019-08-18 11:43:58 +00:00
Mateusz Guzik	50c7615fb0	fork: rework locking around do_fork - move allproc lock into the func, it is of no use prior to it - the code would lock p1 and p2 while holding allproc to partially construct it after it gets added to the list. instead we can do the work prior to adding anything. - protect lastpid with procid_lock As a side effect we do less work with allproc held. Sponsored by: The FreeBSD Foundation	2019-08-17 18:19:49 +00:00
Mateusz Guzik	60cdcb644d	fork: bump process count before checking for permission to cross the limit The limit is almost never reached. Do the check only on failure to see if we can override it. No change in user-visible behavior. Sponsored by: The FreeBSD Foundation	2019-08-17 17:56:43 +00:00
Mateusz Guzik	b05641b6bd	fork: stop skipping < 100 ids on wrap around Code doing this is commented with a claim that these IDs are occupied by daemons, but that's demonstrably false. To an extent the range is used by init and kernel processes (and on sufficiently big machines it indeed is fully populated). On a sample box 40-way box the highest id in the range is 63. On a different one it is 23. Just use the range. Sponsored by: The FreeBSD Foundation	2019-08-17 17:42:01 +00:00
Alexander Motin	3a60f3dad0	Add support for 'j', 't' and 'z' flags to kernel sscanf(). MFC after: 2 weeks	2019-08-16 19:46:22 +00:00
Jeff Roberson	2194393787	Move phys_avail definition into MI code. It is consumed in the MI layer and doing so adds more flexibility with less redundant code. Reviewed by: jhb, markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21250	2019-08-16 00:45:14 +00:00
Rick Macklem	c61b14315f	Fix copy_file_range(2) so that unneeded blocks are not allocated to the output file. When the byte range for copy_file_range(2) doesn't go to EOF on the output file and there is a hole in the input file, a hole must be "punched" in the output file. This is done by writing a block of bytes all set to 0. Without this patch, the write is done unconditionally which means that, if the output file already has a hole in that byte range, a unneeded data block of all 0 bytes would be allocated. This patch adds code to check for a hole in the output file, so that it can skip doing the write if there is already a hole in that byte range of the output file. This avoids unnecessary allocation of blocks to the output file. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D21155	2019-08-15 23:21:41 +00:00
Jeff Roberson	018ff6860f	Move scheduler state into the per-cpu area where it can be allocated on the correct NUMA domain. Reviewed by: markj, gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19315	2019-08-13 04:54:02 +00:00
Konstantin Belousov	7e097daa93	Only enable COMPAT_43 changes for syscalls ABI for a.out processes. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21200	2019-08-11 19:16:07 +00:00
Jonathan T. Looney	afd959f332	In m_pulldown(), before trying to prepend bytes to the subsequent mbuf, ensure that the subsequent mbuf contains the remainder of the bytes the caller sought. If this is not the case, fall through to the code which gathers the bytes in a new mbuf. This fixes a bug where m_pulldown() could fail to gather all the desired bytes into consecutive memory. PR: 238787 Reported by: A reddit user Discussed with: emaste Obtained from: NetBSD MFC after: 3 days	2019-08-09 05:18:59 +00:00
Rick Macklem	6b1bc6f7dd	Remove some harmless cruft from vn_generic_copy_file_range(). An earlier version of the patch had code that set "error" between line#s 2797-2799. When that code was moved, the second check for "error != 0" could never be true and the check became harmless cruft. This patch removes the cruft, mainly to make Coverity happy. Reported by: asomers, cem	2019-08-08 20:07:38 +00:00
Rick Macklem	614633146f	Fix copy_file_range(2) for an unlikely race during hole finding. Since the VOP_IOCTL(FIOSEEKDATA/FIOSEEKHOLE) calls are done with the vnode unlocked, it is possible for another thread to do: - truncate(), lseek(), write() between the two calls and create a hole where FIOSEEKDATA returned the start of data. For this case, VOP_IOCTL(FIOSEEKHOLE) will return the same offset for the hole location. This could result in an infinite loop in the copy code, since copylen is set to 0 and the copy doesn't advance. Usually, this race is avoided because of the use of rangelocks, but the NFS server does not do range locking and could do a sequence like the above to create the hole. This patch checks for this case and makes the hole search fail, to avoid the infinite loop. At this time, it is an open question as to whether or not the NFS server should do range locking to avoid this race.	2019-08-08 19:53:07 +00:00
Konstantin Belousov	b706be23b4	Update comment explaining create_init(). Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-08-08 16:42:53 +00:00
Xin LI	22bbc4b242	Convert DDB_CTF to use newer version of ZLIB. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Differential Revision: https://reviews.freebsd.org/D21176	2019-08-08 07:27:49 +00:00
Conrad Meyer	7d0658ad55	Fix !DDB kernel configurations after r350713 KDB is standard and the kdb_active variable is always available. So, de-conditionalize inclusion of sys/kdb.h in kern_sysctl.c. Reported by: Michael Butler <imb AT protected-networks.net> X-MFC-With: r350713 Sponsored by: Dell EMC Isilon	2019-08-08 01:37:41 +00:00
Conrad Meyer	088c17b46b	ddb(4): Add 'sysctl' command Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the req, we install custom ddb in/out handlers. The out handler prints straight to the debugger, while the in handler ignores all input. This is intended to allow us to print just about any sysctl. There is a known issue when used from ddb(4) entered via 'sysctl debug.kdb.enter=1'. The DDB mode does not quite prevent all lock interactions, and it is possible for the recursive Giant lock to be unlocked when the ddb(4) 'sysctl' command is used. This may result in a panic on return from ddb(4) via 'c' (continue). Obviously, this is not a problem when debugging already-paniced systems. Submitted by: Travis Lane (formerly: <travis.lane AT isilon.com>) Reviewed by: vangyzen (earlier version), Don Morris <dgmorris AT earthlink.net> Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20219	2019-08-08 00:42:29 +00:00
Conrad Meyer	76cb1112da	sbuf(9): Add sbuf_nl_terminate() API The API is used to gracefully terminate text line(s) with a single \n. If the formatted buffer was empty or already ended in \n, it is unmodified. Otherwise, a newline character is appended to it. The API, like other sbuf-modifying routines, is only valid while the sbuf is not FINISHED. Reviewed by: rlibby Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21030	2019-08-07 19:27:14 +00:00
Conrad Meyer	d23813cdb9	sbuf(9): Refactor sbuf_newbuf into sbuf_new Code flow was somewhat difficult to read due to the combination of multiple return sites and the 4x possible dynamic constructions of an sbuf. (Future consideration: do we need all 4?) Refactored slightly to improve legibility. No functional change. Sponsored by: Dell EMC Isilon	2019-08-07 19:25:56 +00:00

1 2 3 4 5 ...

16978 Commits