Linux has more tolerant checks of the user supplied cpuset_t's.
Minimum cpuset_t size that the Linux kernel permits in case of
getaffinity() is the maximum CPU id, present in the system / NBBY,
the maximum size is not limited.
For setaffinity(), Linux does not limit the size of the user-provided
cpuset_t, internally using only the meaningful part of the set, where
the upper bound is the maximum CPU id, present in the system, no larger
than the size of the kernel cpuset_t.
Unlike FreeBSD, Linux ignores high bits if set in the setaffinity(),
so clear it in the sched_setaffinity() and Linuxulator itself.
Reviewed by: Pau Amma (man pages)
In collaboration with: jhb
Differential revision: https://reviews.freebsd.org/D34849
MFC after: 2 weeks
There are some sections which could be improved
and work to do so is on going. The work will be
covered via 'X-MFC-WITH' commits.
Obtained from: OpenBSD
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D34759
Problem is that open(O_PATH) on nullfs -o nocache is broken then,
because there is no reference on the vnode after the open syscall exits.
Reported and tested by: ambrisko
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The error control was not properly implemented. "changelist" is const, hence
event.flags is never changed by the syscall.
PR: 196844
Reported by: eugen@
Reviewed by: PauAmma <pauamma@gundo.com>
Approved by: eugen@
Fixes: 8c231786f01b9f8614e2fe5b47196db1caa7a772
To be more compatible to IEEE Std 1003.1-2008 (“POSIX.1”).
Reviewed by: mjg, Pau Amma (doc)
Differential revision: https://reviews.freebsd.org/D34680
MFC after: 2 weeks
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
The manpage has contained the following verbiage on the matter for just
under 31 years:
"At least one argument must be present in the array"
Previous to this version, it had been prefaced with the weakening phrase
"By convention."
Carry through and document it the rest of the way. Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it. Toss back EINVAL if we ended
up not copying in any args for *execve().
The manpage change can be considered "Obtained from: OpenBSD"
Reviewed by: emaste, kib, markj (all previous version)
Differential Revision: https://reviews.freebsd.org/D34045
Reviewed by: kib, markj
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33988
Add an idletime user group that allows non-root users to run processes
with idle scheduling priority. Privileges are granted by a MAC policy in
the mac_priority module. For this purpose, the kernel privilege
PRIV_SCHED_IDPRIO was added to sys/priv.h (kernel module ABI change).
Deprecate the system wide sysctl(8) knob
security.bsd.unprivileged_idprio which lets any user run idle priority
processes, regardless of context. While the knob is still working, it is
marked as deprecated in the description and in the man pages.
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D33338
There's no point in a knob to avoid installing a half dozen manpages.
It's undocumented and unused in the tree. Online, the only metions
I've found are the FreeBSD source tree, a commit in DragonFly BSD
removing it, and some lists of build options for small systems where
it's inevitably redundant due to an accompanying NO_MAN.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D33310
Otherwise the asm stub is used and libthr interposition does not work.
Reviewed by: kib
Fixes: 21f749da82e7 ("libthr: wrap pdfork(2), same as fork(2).")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Also use the term operation consistently, over the command.
Reviewed by: emaste, jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33277
that returns struct kinfo_file for the given file descriptor. Among
other data, it also returns kf_path, if file op was able to restore file
path.
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33277
This is a MAC policy module that grants scheduling privileges based on
group membership. Users or processes in the group realtime (gid 47) are
allowed to run threads and processes with realtime scheduling priority.
For timing-sensitive, low-latency software like audio/jack, running with
realtime priority helps to avoid stutter and gaps.
PR: 239125
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D33191
for compatibility with Linux.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32901
A BPF descriptor only has an associated interface descriptor once it is
attached to an interface, e.g., with BIOCSETIF. Avoid dereferencing a
NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached.
Reviewed by: ae
Reported by: syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com
Fixes: ded77e0237a8 ("Allow the BPF to be select for write.")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32561
for state control over TRACE, TRAPCAP, ASLR, PROTMAX, STACKGAP,
NO_NEWPRIVS, and WXMAP.
Reported by: emaste
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32513
These calls do operate on vnodes only, not file contents.
This is useful for e.g. the xdg-document-portal fuse filesystem.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32438
When this flag is set, operations that update an existing kevent will
not change the udata field. This can be used to NOTE_TRIGGER or
EV_{EN,DIS}ABLE events without overwriting the stashed pointer.
Reviewed by: Domagoj Stolfa <domagoj.stolfa@gmail.com>
Obtained from: CheriBSD
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D30286
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.
Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779
On case-insensitive file systems (most likely to be seen on macOS, where
it is the default), _Fork.o for the new POSIX _Fork function conflicts
with _fork.o for the PSEUDO file. This results in non-determinsitic
behaviour in terms of which ends up being present; if _Fork.o wins then
the build fails to link libc.so due to missing __sys_fork, and if
_fork.o wins then libc silently fails to include the implementation of
_Fork. A similar issue occurred in the past for C99's _Exit conflicting
with exit(2) and was fixed in cb1cb6a2a83f, so this adds a fix based on
that.
As a longer-term solution it might be better to instead make the
generated files use a different prefix that's less likely to conflict
with other things (such as __sys_foo.o given they always contain that)
but that's a rather more invasive change.
Fixes: 49ad342cc10c ("Add _Fork()")
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31895
Unlike the other syscalls these two symbols were missing from the
version script. I noticed this while looking into the compiler-rt
runtime libraries for CHERI.
Reviewed by: brooks
Obtained from: https://github.com/CTSRD-CHERI/cheribsd/pull/1063
MFC after: 3 days
The new wording for standard flags is losely based on the POSIX
description.
Make it clearer that PROT_MAX() is a local extension.
Reviewed by: alc, mckusick, imp, kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D31777
This text dates to the BSD 4.4 import and is misleading. The mprotect
syscall acts on page granularity and breaks up mappings as required to
do so.
Note that with the addition of non-transparent superpages (aka
largepages) the size of a page at a given address may vary. This
commit does not attempt to address the lack of documentation of this
feature.
Sponsored by: DARPA
Reviewed by: alc, mckusick, imp, kib, markj
Differential Revision: https://reviews.freebsd.org/D31776
rmsr.r_offset now is set to rqsr.r_offset plus the number of bytes
zeroed before hitting the end-of-file. After this change rmsr.r_offset
no longer contains the EOF when the requested operation range is
completely beyond the end-of-file. Instead in such case rmsr.r_offset is
equal to rqsr.r_offset. Callers can obtain the number of bytes zeroed
by subtracting rqsr.r_offset from rmsr.r_offset.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31677
rmacklem@ spotted two things in the system call:
- Upon returning from a successful operation, vop_stddeallocate can
update rmsr.r_offset to a value greater than file size. This behavior,
although being harmless, can be confusing.
- The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is
undocumented.
This commit has the following changes:
- vop_stddeallocate and shm_deallocate to bound the the affected area
further by the file size.
- The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is
documented.
- The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return
len is explicitly documented the be the value 0, and the return offset
is restricted to be the smallest of off + len and current file size
suggested by kib@. This semantic allows callers to interact better
with potential file size growth after the call.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D31604