device_printf does multiple calls to printf allowing other console messages to
be inserted between the device name, and the rest of the message. This change
uses sbuf to compose to two into a single buffer, and prints it all at once.
It exposes an sbuf drain function (drain-to-printf) for common use.
Update documentation to match; some unit tests included.
Submitted by: jmg
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D16690
Multiple tools use @generated to identify generated files (for example,
in a review Phabricator will by default hide diffs in generated files).
Use the @generated tag in makesyscalls.sh as we've done for other
generated files.
Reviewed by: cem
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20183
resume where the last search left off. Suppose that there are no free
blocks of size 32, but plenty of size 16. If we repeatedly request
size 32 blocks, fail, and retry with size 16 blocks, then the failures
all reset the cursor to the beginning of memory, making the 16 block
allocation use a first fit, rather than next fit, strategy.
This change has blist_alloc make a copy of the cursor for its own
decision making, and only updates the real blist cursor after a
successful allocation, making those 16 block searches behave like
next-fit searches.
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20177
Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails. E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.
This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now. It also does not implement
IPv6 netdump.
savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.
This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device. Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.
As part of revving the ABI, unify netdump and disk dump configuration ioctl
/ structure, and leave room for ipv6 netdump as a future possibility.
Backwards-compatibility ioctls are added to smooth ABI transition,
especially for developers who may not keep kernel and userspace perfectly
synced.
Reviewed by: markj, scottl (earlier version)
Relnotes: maybe
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19996
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.
The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal
The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.
nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.
On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.
Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923
We unlock the vnode around malloc(M_WAITOK), to make it possible for
pagedaemon to flush vnode pages for us. Instead of doing it
unconditionally, first try M_NOWAIT allocation, which typically
succeed. Only on failure, unlock the vnode and retry with M_WAITOK.
Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19923
IPI_STOP is used after panic or when ddb is entered manually. MONITOR/
MWAIT allows CPUs that support the feature to sleep in a low power way
instead of spinning. Something similar is already used at idle.
It is perhaps especially useful in oversubscribed VM environments, and is
safe to use even if the panic/ddb thread is not the BSP. (Except in the
presence of MWAIT errata, which are detected automatically on platforms with
known wakeup problems.)
It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0
(off). This commit also introduces the tunable
"machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has
known errata, but may be set to "1" in loader.conf to signal that mwait
wakeup is broken on CPUs FreeBSD does not yet know about.
Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't
help bhyve hypervisors running FreeBSD guests.
Submitted by: Anton Rang <rang AT acm.org> (earlier version)
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20135
These predicates are vestigal and cannot be true today. For example,
idle threads are not allowed to acquire locks.
Also cache curthread in breada().
No functional change intended.
Reviewed by: kib, mckusick
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20066
Contrary to the comments, it was never used by core dumps or
debuggers. Instead, it used to hold the signal code of a pending
signal, but that was replaced by the 'ksi_code' member of ksiginfo_t
when signal information was reworked in 7.0.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D20047
Drivers can now pass up numa domain information via the
mbuf numa domain field. This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.
Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.
Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20028
This is a stopgap measure to unbreak installer/VM/embedded boot issues
introduced (or at least exposed by) in r346250.
Add the new tunable, "security.stack_protect.permit_nonrandom_cookies," in
order to continue boot with insecure non-random stack cookies if the random
device is unavailable.
For now, enable it by default. This is NOT safe. It will be disabled by
default in a future revision.
There is follow-on work planned to use fast random sources (e.g., RDRAND on
x86 and DARN on Power) to seed when the early entropy file cannot be
provided, for whatever reason. Please see D19928.
Some better hacks may be used to make the non-random __stack_chk_guard
slightly less predictable (from delphij@ and mjg@); those suggestions are
left for a future revision. I think it may also be plausible to move stack
guard initialization far later in the boot process; potentially it could be
moved all the way to just before userspace is started.
Reported by: many
Reviewed by: delphij, emaste, imp (all w/ caveat: this is a stopgap fix)
Security: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19927
r176215 corrected readlink(2)'s return type and the type of the last
argument. readlink(2) was introduced in r177788 after being developed
as part of Google Summer of Code 2007; it appears to have inherited the
wrong return type.
Man pages and header files were already ssize_t; update syscalls.master
to match.
PR: 197915
Submitted by: Henning Petersen <henning.petersen@t-online.de>
MFC after: 2 weeks
read_random() is/was used, mostly without error checking, in a lot of
very sensitive places in the kernel -- including seeding the widely used
arc4random(9).
Most uses, especially arc4random(9), should block until the device is seeded
rather than proceeding with a bogus or empty seed. I did not spy any
obvious kernel consumers where blocking would be inappropriate (in the
sense that lack of entropy would be ok -- I did not investigate locking
angle thoroughly). In many instances, arc4random_buf(9) or that family
of APIs would be more appropriate anyway; that work was done in r345865.
A minor cleanup was made to the implementation of the READ_RANDOM function:
instead of using a variable-length array on the stack to temporarily store
all full random blocks sufficient to satisfy the requested 'len', only store
a single block on the stack. This has some benefit in terms of reducing
stack usage, reducing memcpy overhead and reducing devrandom output leakage
via the stack. Additionally, the stack block is now safely zeroed if it was
used.
One caveat of this change is that the kern.arandom sysctl no longer returns
zero bytes immediately if the random device is not seeded. This means that
FreeBSD-specific userspace applications which attempted to handle an
unseeded random device may be broken by this change. If such behavior is
needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK
option.
On any typical FreeBSD system, entropy is persisted on read/write media and
used to seed the random device very early in boot, and blocking is never a
problem.
This change primarily impacts the behavior of /dev/random on embedded
systems with read-only media that do not configure "nodevice random". We
toggle the default from 'charge on blindly with no entropy' to 'block
indefinitely.' This default is safer, but may cause frustration. Embedded
system designers using FreeBSD have several options. The most obvious is to
plan to have a small writable NVRAM or NAND to persist entropy, like larger
systems. Early entropy can be fed from any loader, or by writing directly
to /dev/random during boot. Some embedded SoCs now provide a fast hardware
entropy source; this would also work for quickly seeding Fortuna. A 3rd
option would be creating an embedded-specific, more simplistic random
module, like that designed by DJB in [1] (this design still requires a small
rewritable media for forward secrecy). Finally, the least preferred option
might be "nodevice random", although I plan to remove this in a subsequent
revision.
To help developers emulate the behavior of these embedded systems on
ordinary workstations, the tunable kern.random.block_seeded_status was
added. When set to 1, it blocks the random device.
I attempted to document this change in random.4 and random.9 and ran into a
bunch of out-of-date or irrelevant or inaccurate content and ended up
rototilling those documents more than I intended to. Sorry. I think
they're in a better state now.
PR: 230875
Reviewed by: delphij, markm (earlier version)
Approved by: secteam(delphij), devrandom(markm)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D19744
r340744 broke the NFSv4 client, because it replaced pfind_locked() with a
call to pfind(), since pfind() acquires the sx lock for the pid hash and
the NFSv4 already holds a mutex when it does the call.
The patch fixes the problem by recreating a pfind_any_locked() and adding the
functions pidhash_slockall() and pidhash_sunlockall to acquire/release
all of the pid hash locks.
These functions are then used by the NFSv4 client instead of acquiring
the allproc_lock and calling pfind().
Reviewed by: kib, mjg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19887
cache_lookup's documentation got dislocated by r324378. Relocate and expand
it.
Reviewed by: jhb, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Provide a convenience function to avoid the hack with filling fake
struct vop_fsync_args and then calling vop_stdfsync().
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
descriptor, not the file descriptor. The file descriptor is used only for
verification so do not expect any additional capabilities on it.
Reported by: antoine
Tested by: antoine
Discussed with: kib, emaste, bapt
Sponsored by: Fudo Security
Such processes will be reparented to the reaper when the current
parent is done with them (i.e., ptrace detached), so p_oppid must be
updated accordingly.
Add a regression test to exercise this code path. Previously it
would not be possible to reap an orphan with a stale oppid.
Reviewed by: kib, mjg
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19825
'tos' is an index into an array and never holds a negative value. Correct
its signedness to match PCTRIE_LIMIT, which it is compared to in assertions.
No functional change (kills a warning).
the file associated with the given file descriptor.
Reviewed by: kib, asomers
Reviewed by: cem, jilles, brooks (they reviewed previous version)
Discussed with: pjd, and many others
Differential Revision: https://reviews.freebsd.org/D14567
It performs BUS_RESET_CHILD() on the parental bus and the specified
device.
Reviewed by: imp (previous version), jhb (previous version)
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19646
The methods BUS_RESET_PREPARE(), BUS_RESET(), and BUS_RESET_POST()
should be implemented by bus which can provide reset to a device. The
methods are described in inline doxygen comments.
Code only provides BUS_RESET_PREPARE() and BUS_RESET_POST() helpers
instead of default implementation, because actual bus needs to handle
device state around reset, while helpers provide the other half of
typical prepare/post code.
Reviewed by: imp (previous version), jhb (previous version)
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19646
Otherwise we might miss the last iteration where EOF appears below
unaligned noff.
Reported and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19811
In particular, elf32 FreeBSD binaries were not executed on LP64 hosts.
The interp_name_len value should account for the nul terminator. This
is needed for strncmp()s in brand checking code to work.
Reported by: andreast
Sponsored by: The FreeBSD Foundation
MFC after: 12 days (together with r345661)
In most cases kernel.bootfile is populated from the information
provided by loader(8). There are certain scenarios when loader
is not available, for instance when kernel is loaded by u-boot
or some other BootROM directly. In this case the default value
"/kernel" points to invalid location and breaks some functinality,
like using installkernel on self-hosted system or dtrace's CTF
lookup. This can be fixed by setting the value manually but the
default that reflects correct location is better than default that
points to invalid one.
Current default was set around FreeBSD 1, when "/kernel" was the
actual path. Transition to /boot/kernel/kernel happened circa FreeBSD 3.
PR: 221550
Reviewed by: ian, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18902
It makes the code slightly easier to follow, and might make
it easier to fix the resouce accounting to also account for
the interpreter.
The PROC_UNLOCK() is moved earlier - I don't see anything
it should protect; the lim_max() is a wrapper around lim_rlimit(),
and that, differently from lim_rlimit_proc(), doesn't require
the proc lock to be held.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19689
A sysid of 0 denotes the local system, and some handlers for remote
locking commands do not attempt to deal with local locks. Note that
F_SETLK_REMOTE is only available to privileged users as it is intended
to be used as a testing interface.
Reviewed by: kib
Reported by: syzbot+9c457a6ae014a3281eb8@syzkaller.appspotmail.com
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19702
There are only 19 bytes available for the name of an interrupt plus the
name(s) of handlers/drivers using it. There is a mechanism from the days of
shared interrupts that replaces some of the handler names with '+' when they
don't all fit into 19 bytes.
In modern times there is typically only one device on an interrupt, but long
device names are the norm, especially with embedded systems. Also, in systems
with multiple interrupt controllers, the names of the interrupts themselves
can be long. For example, 'gic0,s54: imx6_anatop0' doesn't fit, and
replacing the device driver name with a '+' provides no useful info at all.
When there is only one handler but its name was too long to fit, this
change truncates enough leading chars of the handler name (replacing them
with a '-' char to indicate that some chars are missing) to use all 19
bytes, preserving the unit number typically on the end of the name. Using
the prior example, this results in: 'gic0,s54:-6_anatop0' which provides
plenty of info to figure out which device is involved.
PR: 211946
Reviewed by: gonzo@ (prior version without the '-' char)
Differential Revision: https://reviews.freebsd.org/D19675
r343532 noted the difference between "hw.realmem" and "hw.physmem", which I
was previously unaware of. I discovered that neither sysctl had a
description visible via `sysctl -d', so I found where they were defined and
added suitable descriptions. While in the file, I went ahead and added
descriptions for all the others which lacked them. I also updated sysctl.3
accordingly
Reviewed by: kib, bcr
MFC after: 1 weeks
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D19007
The line was misedited to change tt to st instead of
changing ut to st.
The use of st as the denominator in mul64_by_fraction() will lead
to an integer divide fault in the intr proc (the process holding
ithreads) where st will be 0. This divide by 0 happens after
the total runtime for all ithreads exceeds 76 hours.
Submitted by: bde
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting. PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.
Requested by: jhb
Reviewed by: jhb, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
PTI mode for the process pmap on exec is activated iff P_MD_PTI is set.
On exec, the existing vmspace can be reused only if pti mode of the
pmap matches the P_MD_PTI flag of the process. Add MD
cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can
vetoed reuse of the existing vmspace.
MFC note: md_flags change struct proc KBI.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514