This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.
Reviewed by: jeff (previous version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21646
SI_CHEAPCLONE was introduced in r66067 for use with cloned bpfs. It was
later also used in tty, tun, tap at points. The rough timeline for being
removed in each of these is as follows:
- r181690: bpf switched to use cdevpriv API by ed@
- r181905: ed@ rewrote the TTY later to be mpsafe
- r204464: kib@ removes it from tun/tap, declaring it unused
I've not yet been able to dig up any other consumers in the intervening 9
years. It is no longer set on any devices in the tree and leaves an
interesting situation in make_dev_sv where we're ok with the device already
being set SI_NAMED.
Attempting to initialize si_drv{1,2} with mda_si_drv{1,2} does not work if
you are operating on cloned devices.
clone_create must be called prior to the make_dev* family to create/return
the device on the clonelist as needed. This device is later returned early
in newdev(), prior to si_drv{0,1,2} initialization.
This patch simply breaks out of the loop if we've found a device and
finishes init.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21904
vn_write already checks for vnode type to see if bwillwrite should be called.
This effectively reverts r244643.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21905
MAXPATHLEN / PATH_MAX includes space for the terminating NUL, and namei
verifies the presence of the NUL. Thus there is no need to increase the
buffer size here.
The sysctl passes the string excluding the NUL, so req->newlen equal to
PATH_MAX is too long.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21876
The mountpoint may not have defined an iosize parameter, so an attempt
to configure readahead on a device file can lead to a divide-by-zero
crash.
The sequential heuristic is not applied to I/O to or from device files,
and posix_fadvise(2) returns an error when v_type != VREG, so perform
the same check here.
Reported by: syzbot+e4b682208761aa5bc53a@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21864
kern_shm_open2(), since conception, completely fails to pass the mode along
to kern_shm_open(). This breaks most uses of it.
Add tests alongside this that actually check the mode of the returned
files.
PR: 240934 [pulseaudio breakage]
Reported by: ler, Andrew Gierth [postgres breakage]
Diagnosed by: Andrew Gierth (great catch)
Tested by: ler, tmunro
Pointy hat to: kevans
- Ensure that the end of the mapping passed to vm_page_wire() is
page-aligned. vm_page_wire() expects this.
- Wire pages before reading data into them.
- Apply protections specified in the segment descriptor using
vm_map_protect() once relocation processing is done.
- On amd64, ensure that we load KLDs above KERNBASE, since they
are compiled with the "kernel" memory model by default.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21756
Software Kernel TLS needs to allocate a new destination crypto
buffer when encrypting data from the page cache, so as to avoid
overwriting shared clear-text file data with encrypted data
specific to a single socket. When the data is anonymous, eg, not
tied to a file, then we can encrypt in place and avoid allocating
a new page. This fixes a bug where the existing code always
assumes the data is private, and never encrypts in place. This
results in unneeded page allocations and potentially more memory
bandwidth consumption when doing socket writes.
When the code was written at Netflix, ktls_encrypt() looked at
private sendfile flags to determine if the pages being encrypted
where part of the page cache (coming from sendfile) or
anonymous (coming from sosend). This was broken internally at
Netflix when the sendfile flags were made private, and the
M_WRITABLE() check was added. Unfortunately, M_WRITABLE() will
always be false for M_NOMAP mbufs, since one cannot just mtod()
them.
This change introduces a new flags field to the mbuf_ext_pgs
struct by stealing a byte from the tls hdr. Note that the current
header is still 2 bytes larger than the largest header we
support: AES-CBC with explicit IV. We set MBUF_PEXT_FLAG_ANON
when creating an unmapped mbuf in m_uiotombuf_nomap() (which is
the path that socket writes take), and we check for that flag in
ktls_encrypt() when looking for anon pages.
Reviewed by: jhb
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21796
TLS 1.3 requires a few changes because 1.3 pretends to be 1.2
with a record type of application data. The "real" record type is
then included at the end of the user-supplied plaintext
data. This required adding a field to the mbuf_ext_pgs struct to
save the record type, and passing the real record type to the
sw_encrypt() ktls backend functions.
Reviewed by: jhb, hselasky
Sponsored by: Netflix
Differential Revision: D21801
The current mechanism is bogus in several ways:
- the limit is a percentage of total entries added, which means negative
entries get evicted all the time even if there are plenty of resources
- evicting code is almost not concurrent, which makes it unable to
remove entries fast enough when doing something as simple as -j 104
buildworld
- there is no support for performing mass removal if necessary
Vast majority of negative entries never get any hits. Only evicting
them when the filesystem demands it results in a significant growth of
the namecache with almost no improvement in the hit ratio.
Sample result about afer 90 minutes of poudriere -j 104:
current no evict % of the original
numneg 219737 2013157 916
numneghits 266711906 263544562 98 [1]
[1] this may look funny but there is a certain dose of variation to the
build
The number was chosen as something which mostly eliminates spurious
evictions during lighter workloads but still keeps the total at bay.
Sponsored by: The FreeBSD Foundation
Continue protecting demotion from the hotlist and selection of the
target list with the ncneg_shrink_lock lock, but drop it before
relocking to zap the node.
While here count how many times we skipped shrinking due to the lock
being already taken.
Sponsored by: The FreeBSD Foundation
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap(). MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.
This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.
Change the delivered signal for accesses to mapped area after the
backing object was truncated. According to POSIX description for
mmap(2):
The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified
portions of the last page of an object which are beyond its
end. References within the address range starting at pa and
continuing for len bytes to whole pages following the end of an
object shall result in delivery of a SIGBUS signal.
An implementation may generate SIGBUS signals when a reference
would cause an error in the mapped object, such as out-of-space
condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.
For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.
vm_fault_hold() is renamed to vm_fault(). Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos. Removed
unneeded truncations of the fault addresses reported by hardware.
PR: 211924
Reviewed by: alc
Discussed with: jilles, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21566
When a VFS option passed to nmount is present but NULL the kernel will
place an empty option in its internal list. This will have a NULL
pointer and a length of 0. When we come to read one of these the kernel
will try to load from the last address of virtual memory. This is
normally invalid so will fault resulting in a kernel panic.
Fix this by checking if the length is valid before dereferencing.
MFC after: 3 days
Sponsored by: DARPA, AFRL
exec_map_first_page() would unconditionally free an unbacked, invalid
page from the executable image. However, it is possible that the page
is wired, in which case it is incorrect to free the page, so check for
additional wirings first.
Reported by: syzkaller
Tested by: pho
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21767
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.
truss support is included; audit support will come later.
This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: jilles (earlier revision)
Reviewed by: brueffer (manpages, earlier revision)
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21423
We have two ways to check if kenv variable exists - either we check return
value from TUNABLE_INT_FETCH, or we pre-initialize the variable and check
if this value did change. In terminal_init() it is more convinient to
use pre-initialized variables.
Problem was revealed by older loader.efi, which did not set teken.* variables.
Reported by: tuexen
Use of CPU_FFS() to implement CPUSET_FOREACH() allows to save up to ~0.5%
of CPU time on 72-thread SMT system doing 80K IOPS to NVMe from one thread.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
I've noticed that I missed intr check at one more SCHED_AFFINITY(),
so instead of adding one more branching I prefer to remove few.
Profiler shows the function CPU time reduction from 0.24% to 0.16%.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets
signal handlers in the child during creation to avoid a point of corruption
of parent state from the child.
This flag will be used by posix_spawn(3) to handle potential signal issues.
Reviewed by: jilles, kib
Differential Revision: https://reviews.freebsd.org/D19058
properly nested and warns about recursive entrances. Unlike with locks,
there is nothing fundamentally wrong with such use, the intent of tracer
is to help to review complex epoch-protected code paths, and we mean the
network stack here.
Reviewed by: hselasky
Sponsored by: Netflix
Pull Request: https://reviews.freebsd.org/D21610
shm_open2 allows a little more flexibility than the original shm_open.
shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate
shmflag argument that can be expanded later. Currently the only shmflag is
to allow file sealing on the returned fd.
shm_open and memfd_create will both be implemented in libc to use this new
syscall.
__FreeBSD_version is bumped to indicate the presence.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D21393
Now that flags may be set on posixshm, add an argument to kern_shm_open()
for the initial seals. To maintain past behavior where callers of
shm_open(2) are guaranteed to not have any seals applied to the fd they're
given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special
flag could be opened later for shm_open(2) to indicate that sealing should
be allowed.
We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if
F_SEAL_SEAL is re-applied, as this would easily break shm_open() twice to a
shmfd that already existed. A note's been added about the assumptions we've
made here as a hint towards anyone wanting to allow other seals to be
applied at creation.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D21392
File sealing applies protections against certain actions
(currently: write, growth, shrink) at the inode level. New fileops are added
to accommodate seals - EINVAL is returned by fcntl(2) if they are not
implemented.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D21391
Check for teken.fg_color and teken.bg_color and prepare the color
attributes accordingly.
When white background is used, make it light to improve visibility.
When black background is used, make kernel messages light.
Doing some tests with very high interrupt rates I've noticed that one of
conditions I added in r232207 to make interrupt threads in most cases
run on local CPU never worked as expected (worked only if previous time
it was executed on some other CPU, that is quite opposite). It caused
additional CPU usage to run full CPU search and could schedule interrupt
threads to some other CPU.
This patch removes that code and instead reuses existing non-interrupt
code path with some tweaks for interrupt case:
- On SMT systems, if current thread is idle, don't look on other threads.
Even if they are busy, it may take more time to do fill search and bounce
the interrupt thread to other core then execute it locally, even sharing
CPU resources. It is other threads should migrate, not bound interrupts.
- Try hard to keep interrupt threads within LLC of their original CPU.
This improves scheduling cost and supposedly cache and memory locality.
On a test system with 72 threads doing 2.2M IOPS to NVMe this saves few
percents of CPU time while adding few percents to IOPS.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
is a completely separate TCP stack (tcp_bbr.ko) that will be built only if
you add the make options WITH_EXTRA_TCP_STACKS=1 and also include the option
TCPHPTS. You can also include the RATELIMIT option if you have a NIC interface that
supports hardware pacing, BBR understands how to use such a feature.
Note that this commit also adds in a general purpose time-filter which
allows you to have a min-filter or max-filter. A filter allows you to
have a low (or high) value for some period of time and degrade slowly
to another value has time passes. You can find out the details of
BBR by looking at the original paper at:
https://queue.acm.org/detail.cfm?id=3022184
or consult many other web resources you can find on the web
referenced by "BBR congestion control". It should be noted that
BBRv1 (which this is) does tend to unfairness in cases of small
buffered paths, and it will usually get less bandwidth in the case
of large BDP paths(when competing with new-reno or cubic flows). BBR
is still an active research area and we do plan on implementing V2
of BBR to see if it is an improvement over V1.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D21582
- track the total count of hot entries
- pre-read the lock when shrinking since it is typically already taken
- place the lock in its own cacheline
- shorten the hold time of hot lock list when zapping
Sponsored by: The FreeBSD Foundation
This is required for DPCPU and VNET data variable definitions to work when
KLDs are linked as DSOs. R_X86_64_RELATIVE relocations should not appear
in object files, so assert this in elf_relocaddr().
Reviewed by: kib
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21755
The two options are
* nocover/cover: Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir: Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.
Neither of these options is intended to be a default, for historical and
compatibility reasons.
Reviewed by: allanjude, kib
Differential Revision: https://reviews.freebsd.org/D21458
by issuing delayed-writes (bdwrite()) until a non-sequential block
is written or the maximum cluster size is reached. At that point
it collects the delayed buffers together (using bread()) to write
them in a single operation. The assumption was that since we just
looked at them they will still be in memory so there is no need to
check for a read error from bread(). Very occationally (apparently
every 10-hours or so when being pounded by Peter Holm's tests)
this assumption is wrong.
The fix is to check for errors from bread() and fail the cluster
write thus falling back to the default individual flushing of any
still dirty buffers.
Reported by: Peter Holm and Chuck Silvers
Reviewed by: kib
MFC after: 3 days
This one is very hard to run into. If the filesystem is being unmounted or
the mount point is freed the vfs_op_thread_enter will fail. For it to
succeed the mount point itself would have to be reallocated in the time
window between the initial read and the attempt to enter.
Sponsored by: The FreeBSD Foundation
The breakage was added after all the testing and the testing which followed
was not sufficient to find it.
Reported by: pho
Sponsored by: The FreeBSD Foundation
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.
Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before: 852393 ops/s
after: 76682077 ops/s
Reviewed by: kib, jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21637
New primitive is introduced to denote sections can operate locklessly
on aspects of struct mount, but which can also be disabled if necessary.
This provides an opportunity to start scaling common case modifications
while providing stable state of the struct when facing unmount, write
suspendion or other events.
mnt_ref is the first counter to start being managed in this manner with
the intent to make it per-cpu.
Reviewed by: kib, jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21425
A future change to posixshm to add file sealing (in DIFF_21391[0] and child)
will move locking out of shm_dotruncate as kern_shm_open() will require the
lock to be held across the dotruncate until the seal is actually applied.
For this, the cookie is passed into shm_dotruncate_locked which asserts
RCA_WLOCKED.
[0] Name changed to protect the innocent, hopefully, from getting autoclosed
due to this reference...
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D21628
1. If we release the last usecount we take ownership of the hold count, which
means the vnode will remain allocated until we vdrop it.
2. If someone else vrefs they will find no usecount and will proceed to add
their own hold count.
3. No code has a problem with v_usecount transitioning to 0 without the
interlock
These facts combined mean we can fetchadd instead of having a cmpset loop.
Reviewed by: kib (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21528
cpuset_getroot() is guaranteed to return a non-NULL pointer.
Reported by: Mark Millard <marklmi@yahoo.com>
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Refcount waiting will set some flag bits in the refcount value.
Make sure these bits get cleared by using the REFCOUNT_COUNT()
macro to obtain the actual refcount.
Differential Revision: https://reviews.freebsd.org/D21620
Reviewed by: kib@, markj@
MFC after: 1 week
Sponsored by: Mellanox Technologies
As I like to forget: static kenv var formatting is actually such that an
empty environment would be double null bytes. We should make sure that a
non-zero buffer has at least enough for this, though most of the current
usage is with a 4k buffer.
Garbage in the passed-in buffer can cause problems if any attempts to read
the kenv are inadvertently made between init_static_kenv and the first
kern_setenv -- assuming there is one.
This is cheap and easy, so do it. This also helps rule out some class of
bugs as one tries to debug; tunables fetch from the static environment up
until SI_SUB_KMEM + 1, and many of these buffers are global ~4k buffers that
rely on BSS clearing while others just grab a page of free memory and use it
(e.g. xen).
Setting the B_INVALONERR flag before a synchronous write causes the buf
cache to forcibly invalidate contents if the write fails (BIO_ERROR).
This is intended to be used to allow layers above the buffer cache to make
more informed decisions about when discarding dirty buffers without
successful write is acceptable.
As a proof of concept, use in msdosfs to handle failures to mark the on-disk
'dirty' bit during rw mount or ro->rw update.
Extending this to other filesystems is left as future work.
PR: 210316
Reviewed by: kib (with objections)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21539
Due to lock ordering issues (bucket lock held, vnode locks wanted) the code
starts with trylocking which in face of contention often fails. Prior to
the change it would loop back with a possible yield.
Instead note we know what locks are needed and can take them in the right
order, avoiding retries. Then we can safely re-lookup and see if the entry
we are looking for is still there.
On a 104-way box poudriere would result in constant retries during an 11h
run as seen in the vfs.cache.zap_and_exit_bucket_fail counter.
before: 408866592
after : 0
However, a new stat reports:
vfs.cache.zap_and_exit_bucket_relock_success: 32638
Note this is only a bandaid over current design issues.
Tested by: pho
Sponsored by: The FreeBSD Foundation
It used to be mp_ncpus * 64, but this gives unnecessarily big values for small
machines and at the same time constraints bigger ones. In particular this helps
on a 104-way box for which the count is now doubled.
While here make cache_purgevfs less likely. Currently it is not efficient in
face of contention due to lock ordering issues. These are fixable but not worth
it at the moment.
Sponsored by: The FreeBSD Foundation
- VM_ALLOC_NOCREAT will grab without creating a page.
- vm_page_grab_valid() will grab and page in if necessary.
- vm_page_busy_acquire() automates some busy acquire loops.
Discussed with: alc, kib, markj
Tested by: pho (part of larger branch)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21546
races with page busy state. The object lock is still used as an interlock
to ensure that the identity stays valid. Most callers should use
vm_page_sleep_if_busy() to handle the locking particulars.
Reviewed by: alc, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21255
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
Otherwise, we might left the boolean set, which would affect cleanup
after an error on interpreter activation.
Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21560
These were fully neutered in r177676 (2008), but not removed at the time for
unclear reasons. They're totally dead code, so go ahead and yank them now.
No functional change.
There are 2 problems:
- it introduces a funny bug where it can end up trylocking the same vnode [1]
- it exposes a pre-existing softdep deadlock [2]
Both are easier to run into that the bug which got fixed, so revert until
a complete solution is worked out.
Reported by: cy [1], pho [2]
Sponsored by: The FreeBSD Foundation
Currently the code only bumps holdcnt and clears the VI_FREE flag, not
performing actual vhold. Since the vnode is still visible elsewhere, a
potential new user can find it and incorrectly assume it is properly held.
Use vholdl instead to correctly hold the vnode. Another place recycling
(vlrureclaim) does this already.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21522
r351650 switched posixshm to using OBJT_SWAP for shm_object
r351795 added support to the swap_pager for tracking writeable mappings
Take advantage of this and start tracking writeable mappings; fd sealing
will use this to reject a seal on writing with EBUSY if any such mapping
exist.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D21456
Currently writemapping accounting is only done for vnode_pager which does
some accounting on the underlying vnode.
Extend this to allow accounting to be possible for any of the pager types.
New pageops are added to update/release writecount that need to be
implemented for any pager wishing to do said accounting, and we implement
these methods now for both vnode_pager (unchanged) and swap_pager.
The primary motivation for this is to allow other systems with OBJT_SWAP
objects to check if their objects have any write mappings and reject
operations with EBUSY if so. posixshm will be the first to do so in order to
reject adding write seals to the shmfd if any writable mappings exist.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D21456
It allows a process to request that stack gap was not applied to its
stacks, retroactively. Also it is possible to control the gaps in the
process after exec.
PR: 239894
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21352
vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting
freed and usecount to denote it is actively used.
Previously all operations bumping usecount would also bump holdcnt, which is
not necessary. We can detect if usecount is already > 1 (in which case holdcnt
is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves
on atomic ops.
Reviewed by: kib
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21471
Previously userspace would issue one syscall to resolve the sysctl and then
another one to actually use it. Do it all in one trip.
Fallback is provided in case newer libc happens to be running on an older
kernel.
Submitted by: Pawel Biernacki
Reported by: kib, brooks
Differential Revision: https://reviews.freebsd.org/D17282
The initially read mount point can already be NULL.
Reported by: markj
Fixes: r351656 ("vfs: stop refing freed mount points in vop_stdgetwritemount")
Sponsored by: The FreeBSD Foundation
There is no correctness change here, but the procid lock is contended in
the fork path and taking it while holding proctree avoidably extends its
hold time.
Note that there are other ids which can end up getting cleared with the
lock.
Sponsored by: The FreeBSD Foundation
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure. This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.
With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets. Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only
items in excess of the estimate are reclaimed. Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.
Now, when under memory pressure, the page daemon will trim zones
rather than draining them. As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.
Reviewed by: jeff
Tested by: pho (an earlier version)
MFC after: 2 months
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16667
To permit larger values of MAXMEMDOM, which is currently 8 on amd64,
cpuset_setdomain(2) accepts a mask of size 256. In the kernel, domain
set masks are 64 bits wide, but can only represent a set of MAXMEMDOM
domains due to the use of the ds_order table.
Domain sets passed to cpuset_setdomain(2) are restricted to a subset
of their parent set, which is typically the root set, but before this
happens we modify the input set to exclude empty domains.
domainset_empty_vm() and other code which manipulates domain sets
expect the mask to be a subset of all_domains, so enforce that when
performing validation of cpuset_setdomain(2) parameters.
Reported and tested by: pho
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21477
The code used blindly ref based on an unsafely red address and then would
backpedal if necessary. This was safe in terms of memory access since
mounts are type-stable, but made for a potential a bug where the mount
was reused and had the count reset to 0 before this code decreased it.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21411
Future changes to posixshm will start tracking writeable mappings in order
to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT
object is complicated as it may be swapped out and converted to an
OBJT_SWAP. One may generically add this tracking for vm_object, but this is
difficult to do without increasing memory footprint of vm_object and blowing
up memory usage by a significant amount.
On the other hand, the swap pager can be expanded to track writeable
mappings without increasing vm_object size. This change is currently in
D21456. Switch over to OBJT_SWAP in advance of the other changes to the
swap pager and posixshm.
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode. Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.
Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM(). This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation. Remove code from
vnode_create_vobject() to handle races with the parallel destruction.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21412
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.
vinactive is very rarely needed and can be tested for without the vnode lock held.
This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:
before: 557563641706 (lockmgr:tmpfs)
after: 46309603301 (lockmgr:tmpfs)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21371
It is not needed by anything in the kernel and it slightly drives up contention
on both proctree and allproc locks.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21447
uiomove_object_page() and exec_map_first_page() would previously wire a
page after having grabbed it. Ask vm_page_grab() to perform the wiring
instead: this removes some redundant code, and is cheaper in the case
where the requested page is not resident since the page allocator can be
asked to initialize the page as wired, whereas a separate vm_page_wire()
call requires the page lock.
In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of
vm_page_unwire(PQ_NONE). The latter ensures that the page is dequeued
before returning, but this is unnecessary since vm_page_free() will
trigger a batched dequeue of the page.
Reviewed by: alc, kib
Tested by: pho (part of a larger patch)
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21440
This field was not initialized in the !KERN_TLS case triggering an
assertion failure when using sendfile(2).
Reported by: pho, asomers
Sponsored by: Netflix
The plan is to drop the flags argument. There is also a temporary bug
now that nullfs ignores the flag.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21252
- Don't add 1 to the result of DOMAINSET_FLS.
- Do not modify domainsets containing only empty domains.
- Always flatten a _PREFER policy to _ROUNDROBIN if the preferred
domain is empty. Previously we were doing this only when ds_cnt > 1.
These bugs could cause hangs during boot if a VM domain is empty.
Tested by: hselasky
Reviewed by: hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21420
The function' interface assumes that the lower vnode is passed and
returned locked always.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports
offload of TLS for transmitted data. Key negotation must still be
performed in userland. Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option. All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.
Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type. Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.
At present, rekeying is not supported though the in-kernel framework
should support rekeying.
KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer. Each TLS frame is described by a single
ext_pgs mbuf. The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.
KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.
Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then
called to schedule TLS frames for encryption. In the case of
sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving
the mbufs marked M_NOTREADY until encryption is completed. For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().
A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue(). Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.
(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)
KTLS supports pluggable software encryption backends. Internally,
Netflix uses proprietary pure-software backends. This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames. As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.
Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready(). At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.
ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session. TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted. The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface. If so, the packet is tagged
with the TLS send tag and sent to the interface. The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation. If the the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped. In addition, a task is scheduled to refresh the TLS send
tag for the TLS session. If a new TLS send tag cannot be allocated,
the connection is dropped. If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag. (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another. As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)
ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp with flowid enabled.
Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option. They can also use this socket
option to toggle between software and ifnet TLS modes.
In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax. However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.
Various sysctls and counters are available under the kern.ipc.tls
sysctl node. The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default). The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
KTLS is enabled via the KERN_TLS kernel option.
This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.
Reviewed by: gallatin, hselasky, rrs
Obtained from: Netflix
Sponsored by: Netflix, Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21277
After all the changes, its dynamic scope is same as for MNTK_UNMOUNT,
but to allow the syncer vnode to be re-installed on unmount failure.
But the case of syncer was already handled by using the VV_FORCEINSMQ
flag for quite some time.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The Linux lockdep API assumes LA_LOCKED semantic in lockdep_assert_held(),
meaning that either a shared lock or write lock is Ok. On the other hand,
the timeout code uses lc_assert() with LA_XLOCKED, and we need both to
work.
For mutexes, because they can not be shared (this is unique among all lock
classes, and it is unlikely that we would add new lock class anytime soon),
it is easier to simply extend mtx_assert to handle LA_LOCKED there, despite
the change itself can be viewed as a slight abstraction violation.
Reviewed by: mjg, cem, jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D21362
Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE)
on a file in a file system that does not have its own VOP_IOCTL(), the
lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since
ENOTTY is not listed as an error return by either the lseek(2) man page
nor the POSIX draft for lseek(2).
This was discussed on freebsd-current@ here:
http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA
This trivial patch maps ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE).
Reviewed by: markj
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D21300
They follow the conventions set by rw and sx lock probes. There is
an additional lockstat:::lockmgr-disown probe.
Update lockstat(1) to report on contention and hold events for
lockmgr locks. Document the new probes in dtrace_lockstat.4, and
deduplicate some of the existing probe descriptions.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21355
The original code came from a desire to minimize the number of updates
to v_wire_count, which prior to r329187 was updated using atomics.
However, there is no significant benefit to batching today, so simply
allocate pages using VM_ALLOC_WIRED and rely on system accounting.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D21323
With r349546, it is a responsibility of the writer to clear PIPE_DIRECTW
after pinned data has been read. In particular, once a reader has
drained this data, there is a small window where the pipe is empty but
PIPE_DIRECTW is set. pipe_poll() was using the presence of PIPE_DIRECTW
to determine whether to return POLLIN, so in this window it would
claim that data was available to read when this was not the case.
Fix this by modifying several checks for PIPE_DIRECTW to instead look
at the number of residual bytes in data pinned by a direct writer. In
some cases we really do want to check for PIPE_DIRECTW, since the
presence of this flag indicates that any attempt to write to the pipe
will block on the existing direct writer.
Bisected and test case provided by: mav
Tested by: pho
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21333
fs-specific part of vfs_statfs routines only fill in small portion of the
structure. Previous code was always copying everything at a higher layer to
acoomodate it and this patch does the same.
'df' (no arguments) worked fine because the caller uses mnt_stat itself as the
target buffer, making all the copying a no-op for its own case.
'df /' and similar use a different consumer which passes its own buffer and
this is where you can run into trouble.
Reported by: cy
Fixes: r351193
Sponsored by: The FreeBSD Foundation
This is similar to checks for td_sx_slocks and td_rw_rlocks.
Although td_lk_slocks is an implementation detail, it still makes sense
to validate it.
MFC after: 1 week
Sponsored by: Panzura
Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE)
on a file in a file system that does not have its own VOP_IOCTL(), the
lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since
ENOTTY is not listed as an error return by either the lseek(2) man page
nor the POSIX draft for lseek(2).
A discussion on freebsd-current@ seemed to indicate that implementing
a trivial algorithm that returns the offset argument for FIOSEEKDATA and
returns the file's size for FIOSEEKHOLE was the preferred fix.
http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA
The Linux kernel appears to implement this trivial algorithm as well.
This patch adds a vop_stdioctl() that implements this trivial algorithm.
It returns errors consistent with vn_bmap_seekhole() and, as such, will
still return ENOTTY for non-regular files.
I have proposed a separate patch that maps errors not described by the
lseek(2) man page nor POSIX draft to EINVAL. This patch is under separate
review.
Reviewed by: kib
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D21299
Suppose that a binary was executed from tmpfs mount, and the text
vnode was reclaimed while the binary was still running. It is
possible during even the normal operations since tmpfs vnode'
vm_object has swap type, and no references on the vnode is held. Also
assume that the text vnode was revived for some reason. Then, on the
process exit or exec, unmapping of the text mapping tries to remove
the text reference from the vnode, but since it went from
recycle/instantiation cycle, there is no reference kept, and assertion
in VOP_UNSET_TEXT_CHECKED() triggers.
Fix this by keeping a use reference on the tmpfs vnode for each exec
reference. This prevents the vnode reclamation while executable map
entry is active.
Do it by adding per-mount flag MNTK_TEXT_REFS that directs
vop_stdset_text() to add use ref on first vnode text use, and
per-vnode VI_TEXT_REF flag, to record the need on unref in
vop_stdunset_text() on last vnode text use going away. Set
MNTK_TEXT_REFS for tmpfs mounts.
Reported by: bdrewery
Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Require the vnode to be locked for the VOP_UNSET_TEXT() call. This
will be used by the following bug fix for a tmpfs issue.
Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The struct is already populated on each mount (and remount). Fields are either
constant or not used by filesystem in the first place.
Some infrequently used functions use it to avoid having to allocate a new buffer
and are left alone.
The current code results in an avoidable copying single-threaded and significant
cache line bouncing multithreaded
While here deduplicate initial filling of the struct.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21317
- move allproc lock into the func, it is of no use prior to it
- the code would lock p1 and p2 while holding allproc to partially
construct it after it gets added to the list. instead we can do the
work prior to adding anything.
- protect lastpid with procid_lock
As a side effect we do less work with allproc held.
Sponsored by: The FreeBSD Foundation
The limit is almost never reached. Do the check only on failure to see if
we can override it.
No change in user-visible behavior.
Sponsored by: The FreeBSD Foundation
Code doing this is commented with a claim that these IDs are occupied by
daemons, but that's demonstrably false. To an extent the range is used by init
and kernel processes (and on sufficiently big machines it indeed is fully
populated).
On a sample box 40-way box the highest id in the range is 63. On a different one
it is 23. Just use the range.
Sponsored by: The FreeBSD Foundation
doing so adds more flexibility with less redundant code.
Reviewed by: jhb, markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21250
When the byte range for copy_file_range(2) doesn't go to EOF on the
output file and there is a hole in the input file, a hole must be
"punched" in the output file. This is done by writing a block of bytes
all set to 0.
Without this patch, the write is done unconditionally which means that,
if the output file already has a hole in that byte range, a unneeded data block
of all 0 bytes would be allocated.
This patch adds code to check for a hole in the output file, so that it can
skip doing the write if there is already a hole in that byte range of
the output file. This avoids unnecessary allocation of blocks to the
output file.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D21155
ensure that the subsequent mbuf contains the remainder of the bytes
the caller sought. If this is not the case, fall through to the code
which gathers the bytes in a new mbuf.
This fixes a bug where m_pulldown() could fail to gather all the desired
bytes into consecutive memory.
PR: 238787
Reported by: A reddit user
Discussed with: emaste
Obtained from: NetBSD
MFC after: 3 days
An earlier version of the patch had code that set "error" between
line#s 2797-2799. When that code was moved, the second check for "error != 0"
could never be true and the check became harmless cruft.
This patch removes the cruft, mainly to make Coverity happy.
Reported by: asomers, cem
Since the VOP_IOCTL(FIOSEEKDATA/FIOSEEKHOLE) calls are done with the
vnode unlocked, it is possible for another thread to do:
- truncate(), lseek(), write()
between the two calls and create a hole where FIOSEEKDATA returned the
start of data.
For this case, VOP_IOCTL(FIOSEEKHOLE) will return the same offset for
the hole location. This could result in an infinite loop in the copy
code, since copylen is set to 0 and the copy doesn't advance.
Usually, this race is avoided because of the use of rangelocks, but the
NFS server does not do range locking and could do a sequence like the
above to create the hole.
This patch checks for this case and makes the hole search fail, to avoid
the infinite loop.
At this time, it is an open question as to whether or not the NFS server
should do range locking to avoid this race.
KDB is standard and the kdb_active variable is always available. So,
de-conditionalize inclusion of sys/kdb.h in kern_sysctl.c.
Reported by: Michael Butler <imb AT protected-networks.net>
X-MFC-With: r350713
Sponsored by: Dell EMC Isilon
Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the
req, we install custom ddb in/out handlers. The out handler prints straight
to the debugger, while the in handler ignores all input. This is intended
to allow us to print just about any sysctl.
There is a known issue when used from ddb(4) entered via 'sysctl
debug.kdb.enter=1'. The DDB mode does not quite prevent all lock
interactions, and it is possible for the recursive Giant lock to be unlocked
when the ddb(4) 'sysctl' command is used. This may result in a panic on
return from ddb(4) via 'c' (continue). Obviously, this is not a problem
when debugging already-paniced systems.
Submitted by: Travis Lane (formerly: <travis.lane AT isilon.com>)
Reviewed by: vangyzen (earlier version), Don Morris <dgmorris AT earthlink.net>
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20219
The API is used to gracefully terminate text line(s) with a single \n. If
the formatted buffer was empty or already ended in \n, it is unmodified.
Otherwise, a newline character is appended to it. The API, like other
sbuf-modifying routines, is only valid while the sbuf is not FINISHED.
Reviewed by: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21030
Code flow was somewhat difficult to read due to the combination of
multiple return sites and the 4x possible dynamic constructions of an
sbuf. (Future consideration: do we need all 4?) Refactored slightly to
improve legibility.
No functional change.
Sponsored by: Dell EMC Isilon