Teach poll(2) to support Linux-style POLLRDHUP events for sockets, if
requested. Triggered when the remote peer shuts down writing or closes
its end.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29757
Currently running `truss -a -e` does not decode any
argument values for freebsd32_* syscalls (open/readlink/etc.)
This change checks whether a syscall starts with freebsd{32,64}_ and if
so strips that prefix when looking up the syscall information. To ensure
that the truss logs include the real syscall name we create a copy of
the syscall information struct with the updated.
The other problem is that when reading string array values, truss
naively iterates over an array of char* and fetches the pointer value.
This will result in arguments not being loaded if the pointer is not
aligned to sizeof(void*), which can happens in the compat32 case. If it
happens to be aligned, we would end up printing every other value.
To fix this problem, this changes adds a pointer_size member to the
procabi struct and uses that to correctly read indirect arguments
as 64/32 bit addresses in the the compat32 case (and also compat64 on
CheriBSD).
The motivating use-case for this change is using truss for 64-bit
programs on a CHERI system, but most of the diff also applies to 32-bit
compat on a 64-bit system, so I'm upstreaming this instead of keeping it
as a local CheriBSD patch.
Output of `truss -aef ldd32 /usr/bin/ldd32` before:
39113: freebsd32_mmap(0x0,0x1000,0x3,0x1002,0xffffffff,0x0,0x0) = 543440896 (0x20644000)
39113: freebsd32_ioctl(0x1,0x402c7413,0xffffd2a0) = 0 (0x0)
/usr/bin/ldd32:
39113: write(1,"/usr/bin/ldd32:\n",16) = 16 (0x10)
39113: fork() = 39114 (0x98ca)
39114: <new process>
39114: freebsd32_execve(0xffffd97e,0xffffd680,0x20634000) EJUSTRETURN
39114: freebsd32_mmap(0x0,0x20000,0x3,0x1002,0xffffffff,0x0,0x0) = 541237248 (0x2042a000)
39114: freebsd32_mprotect(0x20427000,0x1000,0x1) = 0 (0x0)
39114: issetugid() = 0 (0x0)
39114: openat(AT_FDCWD,"/etc/libmap32.conf",O_RDONLY|O_CLOEXEC,00) ERR#2 'No such file or directory'
39114: openat(AT_FDCWD,"/var/run/ld-elf32.so.hints",O_RDONLY|O_CLOEXEC,00) = 3 (0x3)
39114: read(3,"Ehnt\^A\0\0\0\M^@\0\0\0#\0\0\0\0"...,128) = 128 (0x80)
39114: freebsd32_fstat(0x3,0xffffbd98) = 0 (0x0)
39114: freebsd32_pread(0x3,0x2042f000,0x23,0x80,0x0) = 35 (0x23)
39114: close(3) = 0 (0x0)
39114: openat(AT_FDCWD,"/usr/lib32/libc.so.7",O_RDONLY|O_CLOEXEC|O_VERIFY,00) = 3 (0x3)
39114: freebsd32_fstat(0x3,0xffffc7d0) = 0 (0x0)
39114: freebsd32_mmap(0x0,0x1000,0x1,0x40002,0x3,0x0,0x0) = 541368320 (0x2044a000)
After:
783: freebsd32_mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),-1,0x0) = 543543296 (0x2065d000)
783: freebsd32_ioctl(1,TIOCGETA,0xffffd7b0) = 0 (0x0)
/usr/bin/ldd32:
783: write(1,"/usr/bin/ldd32:\n",16) = 16 (0x10)
784: <new process>
783: fork() = 784 (0x310)
784: freebsd32_execve("/usr/bin/ldd32",[ "(null)" ],[ "LD_32_TRACE_LOADED_OBJECTS_PROGNAME=/usr/bin/ldd32", "LD_TRACE_LOADED_OBJECTS_PROGNAME=/usr/bin/ldd32", "LD_32_TRACE_LOADED_OBJECTS=yes", "LD_TRACE_LOADED_OBJECTS=yes", "USER=root", "LOGNAME=root", "HOME=/root", "SHELL=/bin/csh", "BLOCKSIZE=K", "MAIL=/var/mail/root", "MM_CHARSET=UTF-8", "LANG=C.UTF-8", "PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin", "TERM=vt100", "HOSTTYPE=FreeBSD", "VENDOR=amd", "OSTYPE=FreeBSD", "MACHTYPE=x86_64", "SHLVL=1", "PWD=/root", "GROUP=wheel", "HOST=freebsd-amd64", "EDITOR=vi", "PAGER=less" ]) EJUSTRETURN
784: freebsd32_mmap(0x0,135168,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 541212672 (0x20424000)
784: freebsd32_mprotect(0x20421000,4096,PROT_READ) = 0 (0x0)
784: issetugid() = 0 (0x0)
784: sigfastblock(0x1,0x204234fc) = 0 (0x0)
784: open("/etc/libmap32.conf",O_RDONLY|O_CLOEXEC,00) ERR#2 'No such file or directory'
784: open("/var/run/ld-elf32.so.hints",O_RDONLY|O_CLOEXEC,00) = 3 (0x3)
784: read(3,"Ehnt\^A\0\0\0\M^@\0\0\0\v\0\0\0"...,128) = 128 (0x80)
784: freebsd32_fstat(3,{ mode=-r--r--r-- ,inode=18680,size=32768,blksize=0 }) = 0 (0x0)
784: freebsd32_pread(3,"/usr/lib32\0",11,0x80) = 11 (0xb)
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D27625
This change is a refactoring cleanup to improve support for compat32
syscalls (and compat64 on CHERI systems). Each process ABI now has it's
own struct sycall instead of using one global list. The list of all
syscalls is replaced with a list of seen syscalls. Looking up the syscall
argument passing convention now interates over the fixed-size array instead
of using a link-list that's populated on startup so we no longer need the
init_syscall() function.
The actual functional changes are in D27625.
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D27636
In both cases, print the flag bits first followed by the command.
Output now looks something like this:
(ktrace)
_umtx_op(0x8605f7008,0xf<UMTX_OP_WAIT_UINT_PRIVATE>,0,0,0)
_umtx_op(0x9fffdce8,0x80000003<UMTX_OP__32BIT|UMTX_OP_WAKE>,0x1,0,0)
(truss)
_umtx_op(0x7fffffffda50,UMTX_OP_WAKE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x9fffdd08,UMTX_OP__32BIT|UMTX_OP_WAKE,0x1,0x0,0x0) = 0 (0x0)
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D27325
Add an "nextnoskip" sysctl that allows for listing of sysctls intended to be
normally skipped for cost reasons.
This makes it so the names/descriptions of those sysctls can be discovered with
sysctl -aN/sysctl -ad/sysctl -at.
It also makes it so children are visited when a node flagged with CTLFLAG_SKIP
is explicitly requested.
The intended use case is to mark the root "kstat" node with CTLFLAG_SKIP so that
the extensive and expensive stats are skipped by default but may still be easily
obtained without having to know them all (which may not even be possible) and
request each one-by-one.
Reviewed by: jhb
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26560
This is consistent with what we are doing for close(2) and it makes
it a bit easier to follow when debugging file descriptor operations.
i.e. many other syscalls are decoding fds as integers rather than
base 16 numbers.
MFC after: 1 week
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.
This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.
See the review for sample syscall counts.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23574
shm_open2 is similar to shm_open, except it also takes shmflags and optional
name to label the anonymous region for, e.g., debugging purposes.
The appropriate support for decoding shmflags was added to libsysdecode in
r358115.
This is a part of D23733.
Reviewed by: kaktus
In nearly all cases, the caller has a uintptr_t compatible argument so
this eliminates a large number of casts.
Add a print_pointer function to centralize printing pointers.
Reviewed by: jhb
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22212
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.
truss support is included; audit support will come later.
This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: jilles (earlier revision)
Reviewed by: brueffer (manpages, earlier revision)
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21423
This removes all of the architecture-specific functions from truss.
A per-ABI structure is still needed to map syscall numbers to names
and FreeBSD errno values to ABI error values as well as hold syscall
counters. However, the linker set of ABI structures is now replaced
with a simple table mapping ABI names to structures. This approach
permits sharing the same ABI structure among separate names such as
i386 a.out and ELF binaries as well as ELF v1 vs ELF v2 for powerpc64.
A few differences are visible due to using PT_GET_SC_RET to fetch the
error value of a system call. Note that ktrace/kdump have had the
"new" behaviors for a long time already:
- System calls that return with EJUSTRETURN or ERESTART will now be
noticed and logged as such. Previously sigreturn (which uses
EJUSTRETURN) would report whatever random value was in the register
holding errno from the previous system call for example. Now it
reports EJUSTRETURN.
- System calls that return errno as their error value such as
posix_fallocate() and posix_fadvise() now report non-zero return
values as errors instead of success with a non-zero return value.
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20963
The default handling showed the argument as hex. Add explicit handling so
we can show it as decimal, since that's how we show file descriptors
everywhere else.
Approved by: mjg (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19295
Currently truss(1) shows shm_open(SHM_ANON, ...) as shm_open("(null)", ...).
Detect the special value and display it by name.
Reviewed by: jhb, allanjude, tuexen
Approved by: mjg (mentor)
MFC with: r339224
Differential Revision: https://reviews.freebsd.org/D17461
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.
Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.
Bump _FreeBSD_version due to the breaking KPI change.
Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725
The general idea here is to provide userspace programs with well-defined
sources of entropy, in a fashion that doesn't require opening a new file
descriptor (ulimits) or accessing paths (/dev/urandom may be restricted
by chroot or capsicum).
getrandom(2) is the more general API, and comes from the Linux world.
Since our urandom and random devices are identical, the GRND_RANDOM flag
is ignored.
getentropy(3) is added as a compatibility shim for the OpenBSD API.
truss(1) support is included.
Tests for both system calls are provided. Coverage is believed to be at
least as comprehensive as LTP getrandom(2) test coverage. Additionally,
instructions for running the LTP tests directly against FreeBSD are provided
in the "Test Plan" section of the Differential revision linked below. (They
pass, of course.)
PR: 194204
Reported by: David CARLIER <david.carlier AT hardenedbsd.org>
Discussed with: cperciva, delphij, jhb, markj
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D14500
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
structures.
The structure name in the record payload is preceded by a size_t
containing the size of the individual structures. Use this to
replace the previous code that dumped the kevent arrays dumped for
kevent(). kdump is now able to decode the kevent structures rather
than dumping their contents via a hexdump.
One change from before is that the 'changes' and 'events' arrays are
not marked with separate 'read' and 'write' annotations in kdump
output. Instead, the first array is the 'changes' array, and the
second array (only present if kevent doesn't fail with an error) is
the 'events' array. For kevent(), empty arrays are denoted by an
entry with an array containing zero entries rather than no record.
- Move kevent decoding tables from truss to libsysdecode.
This adds three new functions to decode members of struct kevent:
sysdecode_kevent_filter, sysdecode_kevent_flags, and
sysdecode_kevent_fflags.
kdump uses these helper functions to pretty-print kevent fields.
- Move structure definitions for freebsd11 and freebsd32 kevent
structures to <sys/event.h> so that they can be shared with userland.
The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
defined. The 32-bit freebsd11 structure requires both.
- Decode freebsd11 kevent structures in truss for the compat11.kevent()
system call.
- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
system calls.
- While here, constify the 'void *data' argument to ktrstruct().
Reviewed by: kib (earlier version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12470
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Initially, only tag files that use BSD 4-Clause "Original" license.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133
The most important change in this release is the removal of the
poll_fd() system call; CloudABI's equivalent of kevent(). Though I think
that kqueue is a lot saner than many of its alternatives, our
experience is that emulating this system call on other systems
accurately isn't easy. It has become a complex API, even though I'm not
convinced this complexity is needed. This is why we've decided to take a
different approach, by looking one layer up.
We're currently adding an event loop to CloudABI's C library that is API
compatible with libuv (except when incompatible with Capsicum).
Initially, this event loop will be built on top of plain inefficient
poll() calls. Only after this is finished, we'll work our way backwards
and design a new set of system calls to optimize it.
Interesting challenges will include integrating asynchronous I/O into
such a system call API. libuv currently doesn't aio(4) on Linux/BSD, due
to it being unreliable and having undesired semantics.
Obtained from: https://github.com/NuxiNL/cloudabi
Now that CloudABI's sockets API has been changed to be addressless and
only connected socket instances are used (e.g., socket pairs), they have
become fairly similar to pipes. The only differences on CloudABI is that
socket pairs additionally support shutdown(), send() and recv().
To simplify the ABI, we've therefore decided to remove pipes as a
separate file descriptor type and just let pipe() return a socket pair
of type SOCK_STREAM. S_ISFIFO() and S_ISSOCK() are now defined
identically.
Move tables that were previously in truss over to libsysdecode. truss
output is unchanged, but kdump has been updated to decode these fields.
In addition, sysdecode_sysarch_number() should support all platforms
whereas the old table in truss only supported x86.
Specifically, decode the siginfo structure returned by sigtimedwait(),
sigwaitinfo(), and wait6(). While here, also decode the signal number
returned in the second argument to sigwait().
Now that all of the packaged software has been adjusted to either use
Flower (https://github.com/NuxiNL/flower) for making incoming/outgoing
network connections or can have connections injected, there is no longer
need to keep accept() around. It is now a lot easier to write networked
services that are address family independent, dual-stack, testable, etc.
Remove all of the bits related to accept(), but also to
getsockopt(SO_ACCEPTCONN).
With Flower (CloudABI's network connection daemon) becoming more
complete, there is no longer any need for creating any unconnected
sockets. Socket pairs in combination with file descriptor passing is all
that is necessary, as that is what is used by Flower to pass network
connections from the public internet to listening processes.
Remove all of the kernel bits that were used to implement socket(),
listen(), bindat() and connectat(). In principle, accept() and
SO_ACCEPTCONN may also be removed, but there are still some consumers
left.
Obtained from: https://github.com/NuxiNL/cloudabi
MFC after: 1 month
The CloudABI specification has had some minor changes over the last half
year. No substantial features have been added, but some features that
are deemed unnecessary in retrospect have been removed:
- mlock()/munlock():
These calls tend to be used for two different purposes: real-time
support and handling of sensitive (cryptographic) material that
shouldn't end up in swap. The former use case is out of scope for
CloudABI. The latter may also be handled by encrypting swap.
Removing this has the advantage that we no longer need to worry about
having resource limits put in place.
- SOCK_SEQPACKET:
Support for SOCK_SEQPACKET is rather inconsistent across various
operating systems. Some operating systems supported by CloudABI (e.g.,
macOS) don't support it at all. Considering that they are rarely used,
remove support for the time being.
- getsockname(), getpeername(), etc.:
A shortcoming of the sockets API is that it doesn't allow you to
create socket(pair)s, having fake socket addresses associated with
them. This makes it harder to test applications or transparently
forward (proxy) connections to them.
With CloudABI, we're slowly moving networking connectivity into a
separate daemon called Flower. In addition to passing around socket
file descriptors, this daemon provides address information in the form
of arbitrary string labels. There is thus no longer any need for
requesting socket address information from the kernel itself.
This change also updates consumers of the generated code accordingly.
Even though system calls end up getting renumbered, this won't cause any
problems in practice. CloudABI programs always call into the kernel
through a kernel-supplied vDSO that has the numbers updated as well.
Obtained from: https://github.com/NuxiNL/cloudabi
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.
To make this useful, data member of the struct kevent must be extended
to 64bit. Using the opportunity, I also added ext members. This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.
The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).
Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2). Compat shims
are provided for both host native and compat32.
Requested by: bapt
Reviewed by: bapt, brooks, ngie (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D11025
This includes decoding both scheduler policy constants and the sched_param
structure for sched_get_priority_max(), sched_get_priority_min(),
sched_getparam(), sched_getscheduler(), sched_rr_get_interval(),
sched_setparam(), and sched_setscheduler().