Commit Graph

2112 Commits

Author SHA1 Message Date
Dmitry Chagin
45a4c44299 Bump Dd in getdirentries.2 after c6487446.
MFC after:	1 week
2022-04-20 17:55:32 +03:00
Konstantin Belousov
bf13db086b Mostly revert a5970a529c: Make files opened with O_PATH to not block non-forced unmount
Problem is that open(O_PATH) on nullfs -o nocache is broken then,
because there is no reference on the vnode after the open syscall exits.

Reported and tested by:	ambrisko
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-04-14 02:47:04 +03:00
Fernando Apesteguía
e07b0c12ba [patch][doc] Fix EXAMPLE in kqueue(2)
The error control was not properly implemented. "changelist" is const, hence
event.flags is never changed by the syscall.

PR:	196844
Reported by:	eugen@
Reviewed by:	PauAmma <pauamma@gundo.com>
Approved by:	eugen@
Fixes:	8c231786f0
2022-04-13 08:01:58 +02:00
Dmitry Chagin
c6487446d7 getdirentries: return ENOENT for unlinked but still open directory.
To be more compatible to IEEE Std 1003.1-2008 (“POSIX.1”).

Reviewed by:		mjg, Pau Amma (doc)
Differential revision:  https://reviews.freebsd.org/D34680
MFC after:		2 weeks
2022-04-11 23:30:16 +03:00
Greg Lehey
4044083079 chroot.2: Correct grammar errors.
No functional change.

MFC after:	1 week
2022-03-31 13:05:49 +11:00
Mateusz Guzik
f3f3e3c44d fd: add close_range(..., CLOSE_RANGE_CLOEXEC)
For compatibility with Linux.

MFC after:	3 days
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D34424
2022-03-03 17:21:58 +00:00
Konstantin Belousov
a1f9326607 libc binuptime(): use the right function to get the most significant bit index
Reported and tested by:	Jaroslaw Pelczar <jarek@jpelczar.com>
PR:	261781
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-02-08 21:44:23 +02:00
Andrew Turner
548a2ec49b Add PT_GETREGSET
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.

The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.

Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19831
2022-01-27 11:40:34 +00:00
Kyle Evans
773fa8cd13 execve: disallow argc == 0
The manpage has contained the following verbiage on the matter for just
under 31 years:

"At least one argument must be present in the array"

Previous to this version, it had been prefaced with the weakening phrase
"By convention."

Carry through and document it the rest of the way.  Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it.  Toss back EINVAL if we ended
up not copying in any args for *execve().

The manpage change can be considered "Obtained from: OpenBSD"

Reviewed by:	emaste, kib, markj (all previous version)
Differential Revision:	https://reviews.freebsd.org/D34045
2022-01-26 13:40:27 -06:00
John Baldwin
b943d31594 Include the correct header for pdfork()'s prototype.
Reviewed by:	kib, markj
Obtained from:	CheriBSD
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33988
2022-01-24 09:52:12 -08:00
Konstantin Belousov
a393644ecb ptrace(2): document policies affecting access to the facility
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33986
2022-01-22 19:36:56 +02:00
Konstantin Belousov
7406ec4ea9 kqueue(2): Add note about format of the data for NOTE_EXIT
Noted by:	Dave Baukus <daveb@spectralogic.com>
PR:	261346
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2022-01-20 00:37:25 +02:00
Kirk McKusick
1fa68dae46 Clarify the description of the EINTEGRITY error in intro(2).
Requested by: pauamma_gundo.com
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18765
2021-12-28 16:39:46 -08:00
Ed Maste
5bc2e6e227 getfh: clarify that it is a privileged operation
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33629
2021-12-23 11:54:43 -05:00
Florian Walpen
a9545eede4 Add idle priority scheduling privilege group to MAC/priority
Add an idletime user group that allows non-root users to run processes
with idle scheduling priority. Privileges are granted by a MAC policy in
the mac_priority module. For this purpose, the kernel privilege
PRIV_SCHED_IDPRIO was added to sys/priv.h (kernel module ABI change).

Deprecate the system wide sysctl(8) knob
security.bsd.unprivileged_idprio which lets any user run idle priority
processes, regardless of context. While the knob is still working, it is
marked as deprecated in the description and in the man pages.

MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D33338
2021-12-10 04:54:48 +02:00
Konstantin Belousov
9f0fea5d03 Document new variant of swapoff(2)
Reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:53 +02:00
Konstantin Belousov
5346570276 swapoff: add one more variant of the syscall
Requested and reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:46 +02:00
Brooks Davis
022ce9617f libc: get rid of NO_P1003_1B make variable
There's no point in a knob to avoid installing a half dozen manpages.
It's undocumented and unused in the tree.  Online, the only metions
I've found are the FreeBSD source tree, a commit in DragonFly BSD
removing it, and some lists of build options for small systems where
it's inevitably redundant due to an accompanying NO_MAN.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D33310
2021-12-07 00:21:44 +00:00
Mark Johnston
cbdec8db18 libc: Add pdfork to the list of interposed system calls
Otherwise the asm stub is used and libthr interposition does not work.

Reviewed by:	kib
Fixes:		21f749da82 ("libthr: wrap pdfork(2), same as fork(2).")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-12-06 18:32:43 -05:00
Konstantin Belousov
97722455cc fcntl(2): be more precise about third arg type
Also use the term operation consistently, over the command.

Reviewed by:	emaste, jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-07 01:27:38 +02:00
Konstantin Belousov
794d3e8e63 fcntl(2): add F_KINFO operation
that returns struct kinfo_file for the given file descriptor.  Among
other data, it also returns kf_path, if file op was able to restore file
path.

Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-06 22:18:09 +02:00
Konstantin Belousov
79d650f262 swapoff(2): document extended syscall arguments
Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33165
2021-12-05 00:20:58 +02:00
Florian Walpen
bf2fa8d9d1 MAC/priority module for realtime privilege group
This is a MAC policy module that grants scheduling privileges based on
group membership.  Users or processes in the group realtime (gid 47) are
allowed to run threads and processes with realtime scheduling priority.
For timing-sensitive, low-latency software like audio/jack, running with
realtime priority helps to avoid stutter and gaps.

PR:	239125
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D33191
2021-12-04 20:19:25 +02:00
Konstantin Belousov
77b2c2f814 Add sched_getcpu()
for compatibility with Linux.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32901
2021-11-10 21:18:54 +02:00
Konstantin Belousov
be10c0a910 fexecve(2): allow O_PATH file descriptors opened without O_EXEC
This improves compatibility with Linux.

Noted by:	Drew DeVault <sir@cmpwn.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32821
2021-11-03 18:00:42 +02:00
Mark Johnston
426682b05a bpf: Fix the write filter for detached descriptors
A BPF descriptor only has an associated interface descriptor once it is
attached to an interface, e.g., with BIOCSETIF.  Avoid dereferencing a
NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached.

Reviewed by:	ae
Reported by:	syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com
Fixes:	ded77e0237 ("Allow the BPF to be select for write.")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32561
2021-10-26 10:00:39 -04:00
Konstantin Belousov
f5bb6e5a6d procctl: actually require debug privileges over target
for state control over TRACE, TRAPCAP, ASLR, PROTMAX, STACKGAP,
NO_NEWPRIVS, and WXMAP.

Reported by:	emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
f833ab9dd1 procctl(2): add consistent shortcut P_ID:0 as curproc
Reported by:	bdrewery, emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Hartmut Brandt
ded77e0237 Allow the BPF to be select for write. This is needed for boost:asio
which otherwise fails to handle BPFs.
Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D31967
2021-10-10 17:03:51 +02:00
Greg V
98dae405de O_PATH: allow vfs_extattr syscalls
These calls do operate on vnodes only, not file contents.
This is useful for e.g. the xdg-document-portal fuse filesystem.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32438
2021-10-11 20:09:49 +03:00
Piotr Pawel Stefaniak
4f556830de nanosleep.2: use appropriate macros
Reported by:	kib
Fixes:		bf8f6ffcb6
2021-10-11 18:57:58 +02:00
Konstantin Belousov
5fb54d2fc8 readlinkat(2): allow O_PATH fd
PR:	258856
Reported by:	ashish
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32390
2021-10-09 22:31:37 +03:00
Piotr Pawel Stefaniak
bf8f6ffcb6 Mention kern.timecounter.alloweddeviation in nanosleep.1
PR:		224837
Reported by:	Aleksander Derevianko
2021-10-08 17:07:50 +02:00
Konstantin Belousov
8d3cd7767a Fix mistakes in link(2) and shm_open(2)
PR:	258957
Submitted by:	sigsys@gmail.com
MFC after:	1 week
2021-10-06 06:38:26 +03:00
Kyle Evans
0f43c5b55c kqueue: clean up some igor and mandoc -Tlint warnings 2021-09-30 21:31:28 -05:00
Kyle Evans
4b5554cebb kqueue: document how timers with low/past timeouts are handled
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D32237
2021-09-30 21:31:28 -05:00
Nathaniel Wesley Filardo
0321a7990b kqueue: Add EV_KEEPUDATA flag
When this flag is set, operations that update an existing kevent will
not change the udata field.  This can be used to NOTE_TRIGGER or
EV_{EN,DIS}ABLE events without overwriting the stashed pointer.

Reviewed by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
Obtained from:	CheriBSD
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D30286
2021-09-23 17:31:39 -07:00
Konstantin Belousov
796a8e1ad1 procctl(2): Add PROC_WXMAP_CTL/STATUS
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.

Reviewed by:	brooks, emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31779
2021-09-17 15:42:01 +03:00
Jessica Clarke
877175a17a libc: Fix build on case-insensitive file systems
On case-insensitive file systems (most likely to be seen on macOS, where
it is the default), _Fork.o for the new POSIX _Fork function conflicts
with _fork.o for the PSEUDO file. This results in non-determinsitic
behaviour in terms of which ends up being present; if _Fork.o wins then
the build fails to link libc.so due to missing __sys_fork, and if
_fork.o wins then libc silently fails to include the implementation of
_Fork. A similar issue occurred in the past for C99's _Exit conflicting
with exit(2) and was fixed in cb1cb6a2a8, so this adds a fix based on
that.

As a longer-term solution it might be better to instead make the
generated files use a different prefix that's less likely to conflict
with other things (such as __sys_foo.o given they always contain that)
but that's a rather more invasive change.

Fixes:	49ad342cc1 ("Add _Fork()")
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31895
2021-09-10 01:19:38 +01:00
Alex Richardson
395db99f32 Export _mmap and __sys_mmap from libc.so
Unlike the other syscalls these two symbols were missing from the
version script. I noticed this while looking into the compiler-rt
runtime libraries for CHERI.

Reviewed by:	brooks
Obtained from:	https://github.com/CTSRD-CHERI/cheribsd/pull/1063
MFC after:	3 days
2021-09-09 11:46:53 +01:00
Brooks Davis
85bea309f9 mprotect.2: Improve the description of prot
The new wording for standard flags is losely based on the POSIX
description.

Make it clearer that PROT_MAX() is a local extension.

Reviewed by:	alc, mckusick, imp, kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31777
2021-09-07 17:28:50 +01:00
Mark Johnston
f756c91168 kqueue.2: Document the fact that EVFILT_READ can be used on kqueues
Reviewed by:	bcr, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31864
2021-09-07 11:19:29 -04:00
Brooks Davis
e51b29b5a9 mprotect.2: Remove legacy BSD text
This text dates to the BSD 4.4 import and is misleading.  The mprotect
syscall acts on page granularity and breaks up mappings as required to
do so.

Note that with the addition of non-transparent superpages (aka
largepages) the size of a page at a given address may vary.  This
commit does not attempt to address the lack of documentation of this
feature.

Sponsored by:	DARPA

Reviewed by:	alc, mckusick, imp, kib, markj
Differential Revision:	https://reviews.freebsd.org/D31776
2021-09-03 19:30:23 +01:00
Ka Ho Ng
9bb8304c10 Symbol.map: Remove an extra space before _Fork
Make it consistent with all other entries.

Sponsored by:	The FreeBSD Foundation
2021-09-02 21:10:22 +08:00
Ka Ho Ng
9e202d036d fspacectl(2): Changes on rmsr.r_offset's minimum value returned
rmsr.r_offset now is set to rqsr.r_offset plus the number of bytes
zeroed before hitting the end-of-file. After this change rmsr.r_offset
no longer contains the EOF when the requested operation range is
completely beyond the end-of-file. Instead in such case rmsr.r_offset is
equal to rqsr.r_offset.  Callers can obtain the number of bytes zeroed
by subtracting rqsr.r_offset from rmsr.r_offset.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31677
2021-08-26 00:03:37 +08:00
Ka Ho Ng
1eaa36523c fspacectl(2): Clarifies the return values
rmacklem@ spotted two things in the system call:
- Upon returning from a successful operation, vop_stddeallocate can
  update rmsr.r_offset to a value greater than file size. This behavior,
  although being harmless, can be confusing.
- The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is
  undocumented.

This commit has the following changes:
- vop_stddeallocate and shm_deallocate to bound the the affected area
  further by the file size.
- The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is
  documented.
- The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return
  len is explicitly documented the be the value 0, and the return offset
  is restricted to be the smallest of off + len and current file size
  suggested by kib@. This semantic allows callers to interact better
  with potential file size growth after the call.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D31604
2021-08-24 17:08:28 +08:00
Thomas Munro
3904e7966e Fix aio_readv(2), aio_writev(2) with SIGEV_THREAD.
Add missing wrapper code to librt for these new functions so that
SIGEV_THREAD works.  Without machinery to convert it to SIGEV_THREAD_ID,
you got EINVAL.

Reviewed by:    asomers
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D31618
2021-08-22 23:49:23 +12:00
Thomas Munro
f30a1ae8d5 lio_listio(2): Allow LIO_READV and LIO_WRITEV.
Allow multiple vector IOs to be started with one system call.
aio_readv() and aio_writev() already used these opcodes under the
covers.  This commit makes them available to user space.

Being non-standard extensions, they're only visible if __BSD_VISIBLE is
defined, like the functions.

Reviewed by:    asomers, kib
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D31627
2021-08-22 23:00:42 +12:00
Konstantin Belousov
2a51e8823a fork(2): comment about doubtful use of stdio and exit(3) in example
Add fflush(stdout) as the common idiom.  Explain the need to use exit()
but advise against it.

Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D31425
2021-08-08 22:38:59 +03:00
Ka Ho Ng
fd0ffba3b4 Fix pathconf.2 documentation error
_PC_MIN_HOLE_SIZE and _PC_DEALLOC_PRESENT were mixed somehow before this
fix.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	delphij
Differential Revision:	https://reviews.freebsd.org/D31436
2021-08-07 07:09:57 +08:00
Ceri Davies
383dbdb2eb fork.2: correct minor typo in manpage. 2021-08-05 19:36:33 +01:00
Ka Ho Ng
0dc332bff2 Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28347
2021-08-05 23:20:42 +08:00
Konstantin Belousov
49ad342cc1 Add _Fork()
Current POSIX standard requires fork() to be async-signal safe.  Neither
our implementation, nor implementations in other operating systems are,
and practically it is impossible to make fork() async-signal safe without
too much efforts.  Also, that would put undue requirement that all atfork
handlers should be async-signal safe as well, which contradicts its main
use.

As result, Austin Group dropped the requirement, and added a new function
_Fork() that should be async-signal safe, but it does not call atfork
handlers.  Basically, _Fork() can be implemented as a raw syscall.

Release of glibc 2.34 added _Fork(), do the same for FreeBSD.
Clarify threading behavior for fork() in the manpage.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31378
2021-08-03 21:19:32 +03:00
Warner Losh
155f15118a clock_gettime: Add Linux aliases for CLOCK_*
Linux standardized what we call CLOCK_{REALTIME,MONOTONIC}_FAST as
CLOCK_{REALTIME,MONOTONIC}_COARSE. In addition, Linux spells
CLOCK_UPTIME as CLOCK_BOOTTIME.

Add aliases to time.h and document these new aliases in
clock_gettime(2).

Reviewed by:		vangyzen, kib (prior), dchagin (prior)
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D30988
2021-07-30 17:20:22 -06:00
Roy Marples
7045b1603b socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652
2021-07-28 09:35:09 -07:00
Kyle Evans
db0f264393 kenv: allow listing of static kernel environments
The early environment is typically cleared, so these new options
need the PRESERVE_EARLY_KENV kernel config(8) option. These environments
are reported as missing by kenv(1) if the option is not present in the
running kernel.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D30835
2021-07-18 23:06:19 -05:00
David Chisnall
cf98bc28d3 Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-16 18:06:44 +01:00
David Chisnall
d2b558281a Revert "Pass the syscall number to capsicum permission-denied signals"
This broke the i386 build.

This reverts commit 3a522ba1bc.
2021-07-10 20:26:01 +01:00
David Chisnall
3a522ba1bc Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-10 17:19:52 +01:00
Edward Tomasz Napierala
db8d680ebe procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared.  The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they not set.

The main purpose of the flag is implementation of Linux
PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged
chroot.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30939
2021-07-01 09:42:07 +01:00
Michael Gmelin
e349cc19cf shm_open(2): Cross-reference posixshmcontrol(1)
When debugging POSIX shared memory issues, it's really
useful to learn that there is a command line tool now
to manipulate shared memory segments.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D30896
2021-06-25 18:12:05 +02:00
Konstantin Belousov
60b0ad10dd vdso: lower precision of vdso implementation of CLOCK_MONOTONIC_FAST and CLOCK_UPTIME_FAST
so that libc vdso and kernel syscall give closer results.

Reported by:	dchagin
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30873
2021-06-24 00:36:33 +03:00
Konstantin Belousov
e912fbe167 vdso gettimeofday: minor restructuring
Call binuptime inside switch statement, instead of pre-calculating
the abs argument.
Change the type of the abs argument to bool.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30873
2021-06-24 00:36:33 +03:00
Mark Johnston
e00bae5c18 kevent: Prohibit negative change and event list lengths
Previously, a negative change list length would be treated the same as
an empty change list.  A negative event list length would result in
bogus copyouts.  Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.

Reviewed by:	imp, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30480
2021-05-27 15:52:20 -04:00
Konstantin Belousov
fd3ac06f45 ptrace: add an option to not kill debuggees on debugger exit
Requested by:	markj
Reviewed by:	jhb (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differrential revision:	https://reviews.freebsd.org/D30351
2021-05-25 18:22:34 +03:00
Ceri Davies
1760799b4c Remove references to timed(8)
There are still references to timed(8) and timedc(8) in the base system,
which were removed in 2018.

PR: 255425
Reported by:	Ceri Davies <ceri at submonkey dot net>
Reviewed by:	ygy, gbe
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30232
2021-05-13 09:53:08 +02:00
Konstantin Belousov
5e7cdf1817 openat(2): add O_EMPTY_PATH
It reopens the passed file descriptor, checking the file backing vnode'
current access rights against open mode. In particular, this flag allows
to convert file descriptor opened with O_PATH, into operable file
descriptor, assuming permissions allow that.

Reviewed by:	markj
Tested by:	Andrew Walker <awalker@ixsystems.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30148
2021-05-11 02:39:24 +03:00
Edward Tomasz Napierala
1bffa44166 ptrace: document ENOMEM
Reviewed By:	emaste, markj
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29960
2021-05-04 15:22:42 +01:00
Konstantin Belousov
87a64872cd Add ptrace(PT_COREDUMP)
It writes the core of live stopped process to the file descriptor
provided as an argument.

Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:18:26 +03:00
Konstantin Belousov
07f229d20c connectat(2): clarify that the s argument is socket
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-30 17:43:45 +03:00
Thomas Munro
3aaaa2efde poll(2): Add POLLRDHUP.
Teach poll(2) to support Linux-style POLLRDHUP events for sockets, if
requested.  Triggered when the remote peer shuts down writing or closes
its end.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D29757
2021-04-28 23:00:31 +12:00
Robert Watson
8e491aaeac Add code examples to cpuset(2), and improve cross referencing.
MFC after:	1 week
Reviewed by:	jeff, jrtc27, kevans, bcr (manpages)
Differential revision:	https://reviews.freebsd.org/D27803
2021-04-25 15:22:00 +01:00
Mateusz Piotrowski
ca904beafd fork.2: Fix a typo in an example
Reported by:	rpokala
MFC with:	c4207d867c
2021-04-20 10:24:21 +02:00
Mateusz Piotrowski
c4207d867c fork.2: Add a simple use pattern
It seems to be a nice idea to show how fork() is usually used in
practice. This may act as a guide to developers who want to quickly
recall how to use the fork() function.

Reviewed by:	bcr, yuripv
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27626
2021-04-17 23:12:06 +02:00
Konstantin Belousov
bbf7a4e878 O_PATH: allow vnode kevent filter on such files
if VREAD access is checked as allowed during open

Requested by:	wulf
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:49:18 +03:00
Konstantin Belousov
a5970a529c Make files opened with O_PATH to not block non-forced unmount
by only keeping hold count on the vnode, instead of the use count.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:48:27 +03:00
Konstantin Belousov
8d9ed174f3 open(2): Implement O_PATH
Reviewed by:	markj
Tested by:	pho
Discussed with:	walker.aj325_gmail.com, wulf
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:48:24 +03:00
Konstantin Belousov
509124b626 Add AT_EMPTY_PATH for several *at(2) syscalls
It is currently allowed to fchownat(2), fchmodat(2), fchflagsat(2),
utimensat(2), fstatat(2), and linkat(2).

For linkat(2), PRIV_VFS_FHOPEN privilege is required to exercise the flag.
It allows to link any open file.

Requested by:	trasz
Tested by:	pho, trasz
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29111
2021-04-15 12:48:11 +03:00
Konstantin Belousov
c78e124535 link(2): correct descriptor name in AT_RESOLVE_BENEATH description
Noted and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D29111
2021-04-15 12:47:40 +03:00
Piotr Pawel Stefaniak
1fdd6934d5 getdirentries.2: remove unnecessary space 2021-04-11 11:17:01 +02:00
Fernando Apesteguía
3457dbd52b mq_open(2): Fix xref to mq_unlink(2)
mq_unlink(2) was added in acab1d58be

PR: 215611
Reported by: rwatson@FreeBSD.org
Approved by: gbe@ (mentor)
Differential Revision:	https://reviews.freebsd.org/D28913
2021-03-04 13:32:42 +01:00
Konstantin Belousov
20e91ca36a open(2): Remove O_BENEATH and AT_BENEATH
with the reasoning that the flags did not worked properly, and were not
shipped in a release.

O_RESOLVE_BENEATH is kept as useful.

Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:16:55 +02:00
Konstantin Belousov
600756afb5 fhlink(2): the syscalls do not take flag
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:16:55 +02:00
Guangyuan Yang
504e64af32 pwrite(2): add a BUGS section
Add a BUGS section about using pwrite(2) when O_APPEND is set on the fd.

MFC after:	3 days
Submitted by:	Ka Ho Ng <khng300@gmail.com>
Reviewed by:	gbe, yuripv
Differential Revision:	https://reviews.freebsd.org/D28372
2021-02-20 08:05:43 +00:00
Jamie Gritton
d4380c0cdd jail: Change both root and working directories in jail_attach(2)
jail_attach(2) performs an internal chroot operation, leaving it up to
the calling process to assure the working directory is inside the jail.

Add a matching internal chdir operation to the jail's root.  Also
ignore kern.chroot_allow_open_directories, and always disallow the
operation if there are any directory descriptors open.

Reported by:    mjg
Approved by:    markj, kib
MFC after:      3 days
2021-02-19 14:13:35 -08:00
Fernando Apesteguía
acab1d58be mq_unlink(3): Add manual page
Summary: Add a succinct manual page for mq_unlink

Mostly borrowed from https://pubs.opengroup.org/onlinepubs/9699959099/ and
hence, the disclaimer note at the bottom.

PR: 243174
Reported by: rfg-freebsd@tristatelogic.com
Reviewed by: gbe@, yuripv@
Approved by: gbe@ (mentor), yuripv@
Differential Revision: https://reviews.freebsd.org/D28593
2021-02-18 18:56:52 +01:00
Rick Macklem
a0698341cd getdirentries.2: fix for NFS mounts
It was reported that getdirentries(2) was
returning dirents with d_off set to 0 for an NFS
mount.

This is believed to be correct behaviour at
this time (it may change for some NFS mounts
in the future), but is inconsistent with what the
getdirentries(2) man page says.

This patch fixes the man page.

This is a content change.

PR:	253428
Reviewed by:	asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28664
2021-02-14 18:16:58 -08:00
Alexander V. Chernikov
924d1c9a05 Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca.
2021-02-08 22:32:32 +00:00
Alexander V. Chernikov
4a01b854ca SO_RERROR indicates that receive buffer overflows should be handled as errors.
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.

This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
2021-02-08 21:42:20 +00:00
Alan Somers
ff1a307801 lio_listio: validate aio_lio_opcode
Previously, we would accept any kind of LIO_* opcode, including ones
that were intended for in-kernel use only like LIO_SYNC (which is not
defined in userland).  The situation became more serious with
022ca2fc7f.  After that revision, setting
aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.

Note that POSIX does not specify what should happen if aio_lio_opcode is
invalid.

MFC-with:	022ca2fc7f
Reviewed by:	jhb, tmunro, 0mp
Differential Revision:	<https://reviews.freebsd.org/D28078
2021-01-11 19:53:01 -07:00
Konstantin Belousov
21f749da82 libthr: wrap pdfork(2), same as fork(2).
Without wrapping, rtld services and malloc(3) are not guaranteed
to operate correctly in the forked child.

Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28088
2021-01-11 22:59:52 +02:00
Thomas Munro
801ac943ea aio_fsync(2): Support O_DSYNC.
aio_fsync(O_DSYNC, ...) is the asynchronous version of fdatasync(2).

Reviewed by: kib, asomers, jhb
Differential Review: https://reviews.freebsd.org/D25071
2021-01-08 13:15:56 +13:00
Thomas Munro
a5e284038e open(2): Add O_DSYNC flag.
POSIX O_DSYNC means that writes include an implicit fdatasync(2), just
as O_SYNC implies fsync(2).

VOP_WRITE() functions that understand the new IO_DATASYNC flag can act
accordingly, but we'll still pass down IO_SYNC so that file systems that
don't understand it will continue to provide the stronger O_SYNC
behaviour.

Flag also applies to fcntl(2).

Reviewed by: kib, delphij
Differential Revision: https://reviews.freebsd.org/D25090
2021-01-08 13:15:56 +13:00
Alan Somers
022ca2fc7f Add aio_writev and aio_readv
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.

It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.

Reviewed by:    jhb, kib, bcr
Relnotes:       yes
Differential Revision: https://reviews.freebsd.org/D27743
2021-01-02 19:57:58 -07:00
Rick Macklem
d189a74dfd copy_file_range(2): add recommendation to use large "len"
PR#252358 reported a serious performance problem w.r.t.
cp(1) when copying large non-sparse files.
This problem appears to have been caused by cp(1)
calling copy_file_range(2) with a small "len" argument.

This patch adds a recommendation to use a large "len"
value where possible, for performance reasons.

Reviewed by:	asomers
Differential Revision:	https://reviews.freebsd.org/D27935
2021-01-02 17:21:21 -08:00
Konstantin Belousov
58b2ed4672 eventfd.2: Add the mail address of the submitter into copyright.
Requested by:	rgrimes
MFC after:	13 days
2020-12-28 21:03:16 +02:00
Konstantin Belousov
6d075fd9a5 Document eventfd().
Submitted by:   greg@unrelenting.technology
Reviewed by:    bcr, markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
2020-12-27 12:57:26 +02:00
Konstantin Belousov
0ef405eee9 kqueue(2): Use .Fo instead .Ft
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-12-27 12:57:26 +02:00
Konstantin Belousov
44c5db52e2 Add eventfd(3) wrappers to libc.
eventfd_read/write one-liners are from musl libc.

Submitted by:   greg@unrelenting.technology
Reviewed by:    markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
2020-12-27 12:57:26 +02:00
Guangyuan Yang
a1d7836752 mmap(2): Update .Dd missed in the last commit
PR:		252097
MFC after:	1 week
2020-12-24 14:14:56 +00:00