The mprotect syscall decleration is not const. I added this one
incorrectly in a944d28d0edf7ceb1bef4d789dfa4e8e18331658.
Reported by: kib
Reviewed by: kib, imp
Rather than combining the declearation of nosys with the registration
of SYS_syscall, declare syscall(2) and __syscall(2) with the new
SYSMUX type in syscalls.master and declare nosys directly. This
eliminates the last use of syscall aliases in the tree.
Reviewed by: kib, imp
This type is for system call multiplexers (syscall(2), __syscall(2))
that don't have a normal handler and instead are handled in the
machine-dependent syscall code.
Reviewed by: kib, imp
Declare the exit system call normally. This results in the
implementation being named sys_exit rather than sys_sys_exit and
being decalred as returning an int. Infact it does not return
at all because exit1 does not, so add an __unreachable() to let the
compiler know that.
Reviewed by: kib, imp
Declare o<foo>_args rather than reusing the equivalent <foo>_args
structs. Avoiding the addition of a new type isn't worth the
gratutious differences.
Reviewed by: kib, imp
This avoids the need to keep a freebsd32-specific syscalls.master
in sync with the default ABI. As evidenced by the number of commits
required to sync the two, it is extremely easy for them to get out
of sync due to misunderstandings and user errors.
Reviewed by: kevans, kib
Add _Contains_ annotations indicating that the data pointed to by a
pointer argument contains types that vary between FreeBSD ABIs. The
supported set is long (including size_t), pointer (including
intptr_t), and time_t. The first two vary between 32- and 64-bit
ABIs. The laste betwen i386 and everything else.
These will be used to detect which syscalls require handling on
particular ABIs.
Reviewed by: kevans, kib
Match the function decleration which takes an int not a signed int.
No functional change as the range of valid values is 0-2.
Obtained from: CheriBSD
Reviewed by: kevans
Rename to match the naming of syscalls and allow 32 to be appended
without making an ugly name like kevent_freebsd1132.
While here, make the kevent changelist argument const.
Reviewed by: kib
for compatibility with Linux.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32901
Switch the main syscall table to use CAPENABLED flags rather than
capabilities.conf. This avoid synchronization issues between
syscalls.master and capabilities.conf (e.g. when renaming a syscall
during development).
For now, move capabilities.conf to sys/compat/freebsd32 and use it
there. Use of sys/compat/freebsd32/syscalls.master should be replaced
by makesyscalls.lua enhancements to allow the main one to be used.
This change results in no changes to generated files after running
`make sysent`.
Reviewed by: kevans, emaste
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D31350
The CAPENABLED flag indicates that the syscall can be used in capsicum
capability mode. It is intended to replace capabilities.conf.
Reviewed by: kevans, emaste
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D31349
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.
The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.
fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.
fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28347
to restore ABI compatibility for pre-10.x binaries.
It restores _umtx_lock() and _umtx_unlock() syscalls, and UMTX_OP_LOCK/
UMTX_OP_UNLOCK umtx_op(2) operations. UMUTEX_ERROR_CHECK flag is left
out for now, I do not think it makes a difference.
PR: 218571
Reviewed by: brooks (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31220
It is best for auditing of syscalls.master if we only append to the
file. Reserving unimplemented system call numbers for local use makes
this policy and provides a large set of syscall numbers FreeBSD
derivatives can use without risk of conflict.
Reviewed by: jhb, kevans, kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
RESERVED syscall number are reserved for local/vendor use. RESERVED is
identical to UNIMPL except that comments are ignored.
Reviewed by: kevans
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
We have not been able to run binaries from other BSDs well over a
decade. There is no need to document their allocation decisions here.
We also don't need to reserve syscall numbers of never-implemented
syscalls.
Reviewed by: jhb, kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
The originally chosen numbers interfere with downstream projects'
syscalls. Move them to the end of the syscall table instead.
Reported by: jrtc27
Reviewed by: brooks
MFC-With: 022ca2fc7fe08d51f33a1d23a9be49e6d132914e
Differential Revision: 022ca2fc7fe08d51f33a1d23a9be49e6d132914e
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.
It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.
Reviewed by: jhb, kib, bcr
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27743
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues
are not passable between processes. And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow. This came up when debugging strange HW video decode
failures in Firefox. A native implementation would avoid these problems
and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.
Submitted by: greg@unrelenting.technology
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26668
The nfs-over-tls daemons need a system call to perform operations such as
associate a file descriptor with a krpc socket.
The daemons will not be in head for some time, but it will make it
easier for testers of nfs-over-tls to do testing if the system call
is in head (basically the stub for libc which will be commited soon).
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D24949
Include a temporarily compatibility shim as well for kernels predating
close_range, since closefrom is used in some critical areas.
Reviewed by: markj (previous version), kib
Differential Revision: https://reviews.freebsd.org/D24399
close_range(min, max, flags) allows for a range of descriptors to be
closed. The Python folk have indicated that they would much prefer this
interface to closefrom(2), as the case may be that they/someone have special
fds dup'd to higher in the range and they can't necessarily closefrom(min)
because they don't want to hit the upper range, but relocating them to lower
isn't necessarily feasible.
sys_closefrom has been rewritten to use kern_close_range() using ~0U to
indicate closing to the end of the range. This was chosen rather than
requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the
call to kern_close_range for simplicity.
The flags argument of close_range(2) is currently unused, so any flags set
is currently EINVAL. It was added to the interface in Linux so that future
flags could be added for, e.g., "halt on first error" and things of this
nature.
This patch is based on a syscall of the same design that is expected to be
merged into Linux.
Reviewed by: kib, markj, vangyzen (all slightly earlier revisions)
Differential Revision: https://reviews.freebsd.org/D21627
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.
This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.
See the review for sample syscall counts.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23574
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
Co-mingling two things here:
* Addressing some feedback from Konstantin and Kyle re: jail,
capability mode, and a few other things
* Adding audit support as promised.
The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: kib
Relnotes: Yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22083
Pointer arguments should be of the form "<type> *..." and not "<type>* ...".
No functional change.
Reviewed by: kevans
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22373
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.
truss support is included; audit support will come later.
This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: jilles (earlier revision)
Reviewed by: brueffer (manpages, earlier revision)
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21423
shm_open2 allows a little more flexibility than the original shm_open.
shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate
shmflag argument that can be expanded later. Currently the only shmflag is
to allow file sealing on the returned fd.
shm_open and memfd_create will both be implemented in libc to use this new
syscall.
__FreeBSD_version is bumped to indicate the presence.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D21393