Commit Graph

2057 Commits

Author SHA1 Message Date
Mateusz Guzik
f3f3e3c44d fd: add close_range(..., CLOSE_RANGE_CLOEXEC)
For compatibility with Linux.

MFC after:	3 days
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D34424
2022-03-03 17:21:58 +00:00
Konstantin Belousov
a1f9326607 libc binuptime(): use the right function to get the most significant bit index
Reported and tested by:	Jaroslaw Pelczar <jarek@jpelczar.com>
PR:	261781
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-02-08 21:44:23 +02:00
Andrew Turner
548a2ec49b Add PT_GETREGSET
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.

The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.

Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19831
2022-01-27 11:40:34 +00:00
Kyle Evans
773fa8cd13 execve: disallow argc == 0
The manpage has contained the following verbiage on the matter for just
under 31 years:

"At least one argument must be present in the array"

Previous to this version, it had been prefaced with the weakening phrase
"By convention."

Carry through and document it the rest of the way.  Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it.  Toss back EINVAL if we ended
up not copying in any args for *execve().

The manpage change can be considered "Obtained from: OpenBSD"

Reviewed by:	emaste, kib, markj (all previous version)
Differential Revision:	https://reviews.freebsd.org/D34045
2022-01-26 13:40:27 -06:00
John Baldwin
b943d31594 Include the correct header for pdfork()'s prototype.
Reviewed by:	kib, markj
Obtained from:	CheriBSD
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33988
2022-01-24 09:52:12 -08:00
Konstantin Belousov
a393644ecb ptrace(2): document policies affecting access to the facility
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33986
2022-01-22 19:36:56 +02:00
Konstantin Belousov
7406ec4ea9 kqueue(2): Add note about format of the data for NOTE_EXIT
Noted by:	Dave Baukus <daveb@spectralogic.com>
PR:	261346
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2022-01-20 00:37:25 +02:00
Kirk McKusick
1fa68dae46 Clarify the description of the EINTEGRITY error in intro(2).
Requested by: pauamma_gundo.com
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18765
2021-12-28 16:39:46 -08:00
Ed Maste
5bc2e6e227 getfh: clarify that it is a privileged operation
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33629
2021-12-23 11:54:43 -05:00
Florian Walpen
a9545eede4 Add idle priority scheduling privilege group to MAC/priority
Add an idletime user group that allows non-root users to run processes
with idle scheduling priority. Privileges are granted by a MAC policy in
the mac_priority module. For this purpose, the kernel privilege
PRIV_SCHED_IDPRIO was added to sys/priv.h (kernel module ABI change).

Deprecate the system wide sysctl(8) knob
security.bsd.unprivileged_idprio which lets any user run idle priority
processes, regardless of context. While the knob is still working, it is
marked as deprecated in the description and in the man pages.

MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D33338
2021-12-10 04:54:48 +02:00
Konstantin Belousov
9f0fea5d03 Document new variant of swapoff(2)
Reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:53 +02:00
Konstantin Belousov
5346570276 swapoff: add one more variant of the syscall
Requested and reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:46 +02:00
Brooks Davis
022ce9617f libc: get rid of NO_P1003_1B make variable
There's no point in a knob to avoid installing a half dozen manpages.
It's undocumented and unused in the tree.  Online, the only metions
I've found are the FreeBSD source tree, a commit in DragonFly BSD
removing it, and some lists of build options for small systems where
it's inevitably redundant due to an accompanying NO_MAN.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D33310
2021-12-07 00:21:44 +00:00
Mark Johnston
cbdec8db18 libc: Add pdfork to the list of interposed system calls
Otherwise the asm stub is used and libthr interposition does not work.

Reviewed by:	kib
Fixes:		21f749da82 ("libthr: wrap pdfork(2), same as fork(2).")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-12-06 18:32:43 -05:00
Konstantin Belousov
97722455cc fcntl(2): be more precise about third arg type
Also use the term operation consistently, over the command.

Reviewed by:	emaste, jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-07 01:27:38 +02:00
Konstantin Belousov
794d3e8e63 fcntl(2): add F_KINFO operation
that returns struct kinfo_file for the given file descriptor.  Among
other data, it also returns kf_path, if file op was able to restore file
path.

Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-06 22:18:09 +02:00
Konstantin Belousov
79d650f262 swapoff(2): document extended syscall arguments
Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
Florian Walpen
bf2fa8d9d1 MAC/priority module for realtime privilege group
This is a MAC policy module that grants scheduling privileges based on
group membership.  Users or processes in the group realtime (gid 47) are
allowed to run threads and processes with realtime scheduling priority.
For timing-sensitive, low-latency software like audio/jack, running with
realtime priority helps to avoid stutter and gaps.

PR:	239125
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D33191
2021-12-04 20:19:25 +02:00
Konstantin Belousov
77b2c2f814 Add sched_getcpu()
for compatibility with Linux.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32901
2021-11-10 21:18:54 +02:00
Konstantin Belousov
be10c0a910 fexecve(2): allow O_PATH file descriptors opened without O_EXEC
This improves compatibility with Linux.

Noted by:	Drew DeVault <sir@cmpwn.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32821
2021-11-03 18:00:42 +02:00
Mark Johnston
426682b05a bpf: Fix the write filter for detached descriptors
A BPF descriptor only has an associated interface descriptor once it is
attached to an interface, e.g., with BIOCSETIF.  Avoid dereferencing a
NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached.

Reviewed by:	ae
Reported by:	syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com
Fixes:	ded77e0237 ("Allow the BPF to be select for write.")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32561
2021-10-26 10:00:39 -04:00
Konstantin Belousov
f5bb6e5a6d procctl: actually require debug privileges over target
for state control over TRACE, TRAPCAP, ASLR, PROTMAX, STACKGAP,
NO_NEWPRIVS, and WXMAP.

Reported by:	emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
f833ab9dd1 procctl(2): add consistent shortcut P_ID:0 as curproc
Reported by:	bdrewery, emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Hartmut Brandt
ded77e0237 Allow the BPF to be select for write. This is needed for boost:asio
which otherwise fails to handle BPFs.
Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D31967
2021-10-10 17:03:51 +02:00
Greg V
98dae405de O_PATH: allow vfs_extattr syscalls
These calls do operate on vnodes only, not file contents.
This is useful for e.g. the xdg-document-portal fuse filesystem.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32438
2021-10-11 20:09:49 +03:00
Piotr Pawel Stefaniak
4f556830de nanosleep.2: use appropriate macros
Reported by:	kib
Fixes:		bf8f6ffcb6
2021-10-11 18:57:58 +02:00
Konstantin Belousov
5fb54d2fc8 readlinkat(2): allow O_PATH fd
PR:	258856
Reported by:	ashish
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32390
2021-10-09 22:31:37 +03:00
Piotr Pawel Stefaniak
bf8f6ffcb6 Mention kern.timecounter.alloweddeviation in nanosleep.1
PR:		224837
Reported by:	Aleksander Derevianko
2021-10-08 17:07:50 +02:00
Konstantin Belousov
8d3cd7767a Fix mistakes in link(2) and shm_open(2)
PR:	258957
Submitted by:	sigsys@gmail.com
MFC after:	1 week
2021-10-06 06:38:26 +03:00
Kyle Evans
0f43c5b55c kqueue: clean up some igor and mandoc -Tlint warnings 2021-09-30 21:31:28 -05:00
Kyle Evans
4b5554cebb kqueue: document how timers with low/past timeouts are handled
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D32237
2021-09-30 21:31:28 -05:00
Nathaniel Wesley Filardo
0321a7990b kqueue: Add EV_KEEPUDATA flag
When this flag is set, operations that update an existing kevent will
not change the udata field.  This can be used to NOTE_TRIGGER or
EV_{EN,DIS}ABLE events without overwriting the stashed pointer.

Reviewed by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
Obtained from:	CheriBSD
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D30286
2021-09-23 17:31:39 -07:00
Konstantin Belousov
796a8e1ad1 procctl(2): Add PROC_WXMAP_CTL/STATUS
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.

Reviewed by:	brooks, emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31779
2021-09-17 15:42:01 +03:00
Jessica Clarke
877175a17a libc: Fix build on case-insensitive file systems
On case-insensitive file systems (most likely to be seen on macOS, where
it is the default), _Fork.o for the new POSIX _Fork function conflicts
with _fork.o for the PSEUDO file. This results in non-determinsitic
behaviour in terms of which ends up being present; if _Fork.o wins then
the build fails to link libc.so due to missing __sys_fork, and if
_fork.o wins then libc silently fails to include the implementation of
_Fork. A similar issue occurred in the past for C99's _Exit conflicting
with exit(2) and was fixed in cb1cb6a2a8, so this adds a fix based on
that.

As a longer-term solution it might be better to instead make the
generated files use a different prefix that's less likely to conflict
with other things (such as __sys_foo.o given they always contain that)
but that's a rather more invasive change.

Fixes:	49ad342cc1 ("Add _Fork()")
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31895
2021-09-10 01:19:38 +01:00
Alex Richardson
395db99f32 Export _mmap and __sys_mmap from libc.so
Unlike the other syscalls these two symbols were missing from the
version script. I noticed this while looking into the compiler-rt
runtime libraries for CHERI.

Reviewed by:	brooks
Obtained from:	https://github.com/CTSRD-CHERI/cheribsd/pull/1063
MFC after:	3 days
2021-09-09 11:46:53 +01:00
Brooks Davis
85bea309f9 mprotect.2: Improve the description of prot
The new wording for standard flags is losely based on the POSIX
description.

Make it clearer that PROT_MAX() is a local extension.

Reviewed by:	alc, mckusick, imp, kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31777
2021-09-07 17:28:50 +01:00
Mark Johnston
f756c91168 kqueue.2: Document the fact that EVFILT_READ can be used on kqueues
Reviewed by:	bcr, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31864
2021-09-07 11:19:29 -04:00
Brooks Davis
e51b29b5a9 mprotect.2: Remove legacy BSD text
This text dates to the BSD 4.4 import and is misleading.  The mprotect
syscall acts on page granularity and breaks up mappings as required to
do so.

Note that with the addition of non-transparent superpages (aka
largepages) the size of a page at a given address may vary.  This
commit does not attempt to address the lack of documentation of this
feature.

Sponsored by:	DARPA

Reviewed by:	alc, mckusick, imp, kib, markj
Differential Revision:	https://reviews.freebsd.org/D31776
2021-09-03 19:30:23 +01:00
Ka Ho Ng
9bb8304c10 Symbol.map: Remove an extra space before _Fork
Make it consistent with all other entries.

Sponsored by:	The FreeBSD Foundation
2021-09-02 21:10:22 +08:00
Ka Ho Ng
9e202d036d fspacectl(2): Changes on rmsr.r_offset's minimum value returned
rmsr.r_offset now is set to rqsr.r_offset plus the number of bytes
zeroed before hitting the end-of-file. After this change rmsr.r_offset
no longer contains the EOF when the requested operation range is
completely beyond the end-of-file. Instead in such case rmsr.r_offset is
equal to rqsr.r_offset.  Callers can obtain the number of bytes zeroed
by subtracting rqsr.r_offset from rmsr.r_offset.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31677
2021-08-26 00:03:37 +08:00
Ka Ho Ng
1eaa36523c fspacectl(2): Clarifies the return values
rmacklem@ spotted two things in the system call:
- Upon returning from a successful operation, vop_stddeallocate can
  update rmsr.r_offset to a value greater than file size. This behavior,
  although being harmless, can be confusing.
- The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is
  undocumented.

This commit has the following changes:
- vop_stddeallocate and shm_deallocate to bound the the affected area
  further by the file size.
- The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is
  documented.
- The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return
  len is explicitly documented the be the value 0, and the return offset
  is restricted to be the smallest of off + len and current file size
  suggested by kib@. This semantic allows callers to interact better
  with potential file size growth after the call.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D31604
2021-08-24 17:08:28 +08:00
Thomas Munro
3904e7966e Fix aio_readv(2), aio_writev(2) with SIGEV_THREAD.
Add missing wrapper code to librt for these new functions so that
SIGEV_THREAD works.  Without machinery to convert it to SIGEV_THREAD_ID,
you got EINVAL.

Reviewed by:    asomers
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D31618
2021-08-22 23:49:23 +12:00
Thomas Munro
f30a1ae8d5 lio_listio(2): Allow LIO_READV and LIO_WRITEV.
Allow multiple vector IOs to be started with one system call.
aio_readv() and aio_writev() already used these opcodes under the
covers.  This commit makes them available to user space.

Being non-standard extensions, they're only visible if __BSD_VISIBLE is
defined, like the functions.

Reviewed by:    asomers, kib
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D31627
2021-08-22 23:00:42 +12:00
Konstantin Belousov
2a51e8823a fork(2): comment about doubtful use of stdio and exit(3) in example
Add fflush(stdout) as the common idiom.  Explain the need to use exit()
but advise against it.

Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D31425
2021-08-08 22:38:59 +03:00
Ka Ho Ng
fd0ffba3b4 Fix pathconf.2 documentation error
_PC_MIN_HOLE_SIZE and _PC_DEALLOC_PRESENT were mixed somehow before this
fix.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	delphij
Differential Revision:	https://reviews.freebsd.org/D31436
2021-08-07 07:09:57 +08:00
Ceri Davies
383dbdb2eb fork.2: correct minor typo in manpage. 2021-08-05 19:36:33 +01:00
Ka Ho Ng
0dc332bff2 Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28347
2021-08-05 23:20:42 +08:00
Konstantin Belousov
49ad342cc1 Add _Fork()
Current POSIX standard requires fork() to be async-signal safe.  Neither
our implementation, nor implementations in other operating systems are,
and practically it is impossible to make fork() async-signal safe without
too much efforts.  Also, that would put undue requirement that all atfork
handlers should be async-signal safe as well, which contradicts its main
use.

As result, Austin Group dropped the requirement, and added a new function
_Fork() that should be async-signal safe, but it does not call atfork
handlers.  Basically, _Fork() can be implemented as a raw syscall.

Release of glibc 2.34 added _Fork(), do the same for FreeBSD.
Clarify threading behavior for fork() in the manpage.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31378
2021-08-03 21:19:32 +03:00
Warner Losh
155f15118a clock_gettime: Add Linux aliases for CLOCK_*
Linux standardized what we call CLOCK_{REALTIME,MONOTONIC}_FAST as
CLOCK_{REALTIME,MONOTONIC}_COARSE. In addition, Linux spells
CLOCK_UPTIME as CLOCK_BOOTTIME.

Add aliases to time.h and document these new aliases in
clock_gettime(2).

Reviewed by:		vangyzen, kib (prior), dchagin (prior)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30988
2021-07-30 17:20:22 -06:00
Roy Marples
7045b1603b socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652
2021-07-28 09:35:09 -07:00