Commit Graph

2112 Commits

Author SHA1 Message Date
Guangyuan Yang
81720dbab2 mmap(2): Fix a typo
PR:             252097
MFC after:      1 week
Reported by:    Nick Frampton <nick.frampton@akips.com>
2020-12-24 14:08:34 +00:00
Gordon Bergling
f6d234d870 libc: Fix most issues reported by mandoc
- varios "new sentence, new line" warnings
- varios "sections out of conventional order" warnings
- varios "unusual Xr order" warnings
- varios "missing section argument" warnings
- varios "no blank before trailing delimiter" warnings
- varios "normalizing date format" warnings

MFC after:	1 month
2020-12-19 14:54:28 +00:00
Enji Cooper
0c424c64f8 cpuset{,_getaffinity,_getdomain}.2: fix SEE ALSO
Sort by manpage section, then sort entries alphabetically.

This makes the manpages `make manlint` clean.

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-11 01:52:27 +00:00
Enji Cooper
92d4164179 aio_suspend.2: properly canonicalize .Dd
Months should be fully spelled as their local-specific equivalents: in this
case `Oct` should have been spelled like `October`.

Reported by:	make manlint
MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-11 00:28:28 +00:00
Enji Cooper
20daf0ca6e cap_enter(2): fix CAVEATS section
The CAVEATS section was misspelled as "CAVEAT" before this change. Fix the
spelling to identify issues related to the section.

Furthermore, given that the section order was incorrect, move the CAVEATS
section down to the bottom of the manpage, per the conventional section
order.

MFC after:	1 week
Reported by:	make manlint
Sponsored by:	DellEMC Isilon
2020-12-11 00:26:49 +00:00
Kyle Evans
e04a83a3e1 _umtx_op(2): document recent addition of 32bit compat flags
This was part of D27325.

Reviewed by:	kib
2020-12-09 03:20:51 +00:00
Enji Cooper
00107a56e5 extattr_get_file(20: bump .Dd
This is being done for the formatting and context changes. While the net content
hasn't been changed, the content/context changes were sufficient to warrant the
date bump.

MFC after:	1 week
MFC with:	r368431, r368433, r368434, r368435
Sponsored by:	DellEMC Isilon
2020-12-08 04:18:16 +00:00
Enji Cooper
cf681016d4 extattr_get_file(2): clarify RETURN VALUES
While some of the syscalls' behavior were documented and implied in the
RETURN VALUES section by earlier, e.g., the DESCRIPTION sections, as having
behavior of the other calls (`*_fd` vs `*_file` vs `*_link`), there was a lot
of implied return value behavior in the section prior to this change.

Explicitly document the syscall behavior per the current implementation in
sys/kern/vfs_extattr.c so others can better develop based on its explicit
documented behavior instead of having to digest the context of the manpage to
understand the appropriate behavior.

MFC after:	1 week
MFC with:	r368431, r368433, r368434
Sponsored by:	DellEMC Isilon
2020-12-08 04:16:05 +00:00
Enji Cooper
f705523939 extattr_get_file(2): fix more formatting
- Remove an unnecessary trailing comma separating a two-item clause.
- Sort more function calls alphabetically (in the same vein as r368433).

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-08 04:05:19 +00:00
Enji Cooper
e8e0f91b8b extattr_get_file(2): sort syscalls alphabetically
Although some sections of the manpage sort the syscalls alphabetically, many
core areas of the manpage do not. Sort the syscalls so it is easier to pick out
functional changes and to improve manpage readability.

This formatting change is also being done to make future functional changes
easier to spot.

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-08 04:01:03 +00:00
Enji Cooper
403b2124d4 lio_listio(2): fix manlint error
The date with .Dd prior to this change isn't canonically spelled out: it
should have been "December", not "Dec".

MFC after:	1 week
Sponsored by:	DellEMC Isilon
2020-12-08 03:48:05 +00:00
Enji Cooper
9d610b1516 extattr_get_fd(2): fix manlint errors
- The CAVEATS section was misspelled as "CAVEAT".
- The CAVEATS section should come before the "BUGS" section and after
  other existing sections by convention.

MFC after:	1 week
Reported by:	make manlint
Sponsored by:	DellEMC Isilon
2020-12-08 03:43:00 +00:00
Kyle Evans
231f59920a _umtx_op: document UMTX_OP_SEM2_WAIT copyout behavior
This clever technique to get a time remaining back was added to support sem_clockwait_np.

Reviewed by:	kib, vangyzen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27160
2020-11-17 03:26:56 +00:00
Edward Tomasz Napierala
bce7ee9d41 Drop "All rights reserved" from all my stuff. This includes
Foundation copyrights, approved by emaste@.  It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.

Reviewed by:	emaste, imp, gbe (manpages)
Differential Revision:	https://reviews.freebsd.org/D26980
2020-10-28 13:46:11 +00:00
Alan Cox
76c7af51ab Revise the description of MAP_STACK. In particular, describe the guard
in more detail.

Reviewed by:	bcr, kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D26908
2020-10-27 18:08:33 +00:00
John-Mark Gurney
0fda26dbb3 update write(2)'s iovec limit w/ info about the iosize_max_clamp sysctl... 2020-10-26 00:37:31 +00:00
Konstantin Belousov
69c09181d4 mmap(2): Document guard size for MAP_STACK and related EINVAL.
Based on submission by:	emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26894
2020-10-21 21:40:33 +00:00
Gordon Bergling
3d265fce43 Fix a few mandoc issues
- skipping paragraph macro: Pp after Sh
- sections out of conventional order: Sh EXAMPLES
- whitespace at end of input line
- normalizing date format
2020-10-09 19:12:44 +00:00
Warner Losh
61c4a6f317 Updates to chroot(2) docs
1. Note what settings give historic behavior
2. Recommend jail under security considerations.
2020-09-29 18:13:54 +00:00
Konstantin Belousov
1f305be431 Document {O,AT}_RESOLVE_BENEATH and new O_BENEATH behavior for relative paths.
PR:	248335
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:54:54 +00:00
Mark Johnston
3d1098617b Fix error checking in shm_create_largepage().
Reviewed by:	alc, kib
Reported by:	Coverity
MFC with:	r365524
Differential Revision:	https://reviews.freebsd.org/D26464
2020-09-18 12:30:15 +00:00
Kyle Evans
8b8cf4ece6 memfd_create: simplify HUGETLB support a little bit
This also fixes a minor issue that was missed in the initial review; the
layout of the MFD_HUGE_* flags is actually not 1:1 bit:flag -- it instead
borrowed the Linux convention of how this is laid out since it was
originally implemented on Linux, the top 6 bits represent the shift required
for the requested page size.

This allows us to remove the flag <-> pgsize mapping table and simplify the
logic just prior to validation of the requested page size.

While we're here, fix two small nits:

- HUGETLB memfd shouldn't exhibit the SHM_GROW_ON_WRITE behavior. We can
  only grow largepage shm by appropriately aligned (i.e. requested pagesize)
  sizes, so it can't work in the typical/sane fashion. Furthermore, Linux
  does the same, so let's be compatible.

- We don't allow MFD_HUGETLB without specifying a pagesize, so no need to
  check for that later.

Reviewed by:	kib (slightly earlier version)
2020-09-11 02:02:15 +00:00
Kyle Evans
9bf2b80ca6 memfd_create: fix return values
Literally returning EINVAL from a function designed to return an fd makes
for interesting scenarios.

I cannot assign enough pointy hats to cover this one.
2020-09-10 21:25:16 +00:00
Kyle Evans
944174e7bf Fix memfd_create tests after r365524
r365524 did accidentally invert this check that sets SHM_LARGEPAGE, leading
non-hugetlb memfd as unconfigured largepage shm and thus test failures when
we try to ftruncate or write to them.

PR:		249236
Discussed with:	kib
2020-09-10 17:23:30 +00:00
Konstantin Belousov
3ef55e8f25 Add shm_create_largepage(3) helper for creation and configuration of
largepage shm objects.

And since we can, add memfd_create(MFD_HUGETLB) support, hopefully
close enough to the Linux feature.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 22:20:36 +00:00
Kyle Evans
69112cca60 getlogin_r: fix the type of len
getlogin_r is specified by POSIX to to take a size_t len, not int. Fix our
version to do the same, bump the symbol version due to ABI change and
provide compat.

This was reported to break compilation of Ruby 2.8.

Some discussion about the necessity of the ABI compat did take place in the
review. While many 64-bit platforms would likely be passing it in a 64-bit
register and zero-extended and thus, not notice ABI breakage, some do
sign-extend (e.g. mips).

PR:		247102
Submitted by:	Bertram Scharpf <software@bertram-scharpf.de> (original)
Submitted by:	cem (ABI compat)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D26335
2020-09-09 18:07:13 +00:00
Mark Johnston
847ab36bf2 Include the psind in data returned by mincore(2).
Currently we use a single bit to indicate whether the virtual page is
part of a superpage.  To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.

The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl.  MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.

For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26238
2020-09-02 18:16:43 +00:00
Alan Somers
e3f1731aec [skip ci] document close_range(2) as async-signal-safe
Reviewed by:	bcr (manpages)
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25513
2020-07-21 16:46:40 +00:00
Gordon Bergling
e7677232d6 lseek(2): Document the seek behavior better and update the POSIX compliance
In certain situations lseek(2) will return successful although if no seek
was performed. This can happen when operating on devices that don't support
seeking (older tape drives) or when operating on changeable media devices
(such as DVD or Blu-ray devices) without a medium inserted.

Document this within the man page and update the POSIX compliance while here.

PR:		162765
Submitted by:	arundel@
Reported by:	arundel@
Reviewed by:	bcr (mentor)
Approved by:	bcr (mentor)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25646
2020-07-13 15:52:57 +00:00
Allan Jude
0e3972bc19 procctl(2): consistently refer to the last agrument as 'data'
Some older references called it 'arg'

Also fix a syntax error that was underlining an entire sentence.

PR:		247386
Reported by:	Paul Floyd <paulf@free.fr>, PauAmma (research)
MFC after:	2 weeks
Sponsored by:	Klara Inc.
2020-07-11 18:04:09 +00:00
Kyle Evans
423a033ba7 memfd_create: turn on SHM_GROW_ON_WRITE
memfd_create fds will no longer require an ftruncate(2) to set the size;
they'll grow (to the extent that it's possible) upon write(2)-like syscalls.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25502
2020-07-10 00:45:16 +00:00
Mark Johnston
fe59cb6ba2 Apply the logic from r363051 to semctl(2) and __sem_base field.
Reported by:	Jeffball <jeffball@grimm-co.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 18:34:54 +00:00
Mark Johnston
f4f16af1d3 Avoid copying out kernel pointers from msgctl(IPC_STAT).
While this behaviour is harmless, it is really just an artifact of the
fact that the msgctl(2) implementation uses a user-visible structure as
part of the internal implementation, so it is not deliberate and these
pointers are not useful to userspace.  Thus, NULL them out before
copying out, and remove references to them from the manual page.

Reported by:	Jeffball <jeffball@grimm-co.com>
Reviewed by:	emaste, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 17:26:49 +00:00
Warner Losh
f045cfb816 Chroot actually appeared in 7th Edition Unix.
Chroot appeared during the development of 7th edition Unix. The FreeBSD jail
documents, incorrectly, that Bill Joy added this to 4.2BSD on 18 March
1982. That was when Bill Joy converted from a statically coded system call glue
to dynamically generated assembler. Chroot was present in 32V, 3BSD, 4.0BSD, 4.1BSD
and 4.1cBSD well in advance of this. Kirk McKusick agrees with this analysis.

See also:
	V7: https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/sys/chroot.s
	32V: https://minnie.tuhs.org/cgi-bin/utree.pl?file=32V/usr/src/libc/sys/chroot.s
	3BSD: https://minnie.tuhs.org/cgi-bin/utree.pl?file=3BSD/usr/src/libc/sys/chroot.s
	4BSD: https://minnie.tuhs.org/cgi-bin/utree.pl?file=4BSD/usr/src/libc/sys/chroot.s
	4.1cBSD: https://minnie.tuhs.org/cgi-bin/utree.pl?file=4.1cBSD/usr/src/libc/sys/chroot.s

The 6th and earlier editions do not have this system call, nor do they have
anything named chroot in the trees available from TUHS.

Reviewed by: allanjude@
Differential Revision: https://reviews.freebsd.org/D25475
2020-06-26 22:05:23 +00:00
Pawel Biernacki
eb8a06388c man page of select(2) should mention pselect(2)
Reviewed by:	bcr (manpages), kib, trasz
Approved by:	kib (mentor)
MFC after:	7 days
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25169
2020-06-25 12:31:05 +00:00
Mateusz Piotrowski
3b3e9cfb1b Fix a typo in cpuset_getdomain.2
PR:		247385
Reported by:	Paul Floyd <paulf free.fr>
MFC after:	1 week
2020-06-18 19:03:20 +00:00
Gordon Bergling
421f325efc libcasper(3): Document HISTORY within the manpages
Reviewed by:	bcr (mentor)
Approved by:	bcr (mentor)
MFC after:		7 days
Differential Revision:	https://reviews.freebsd.org/D24695
2020-06-16 16:48:52 +00:00
Gordon Bergling
e0f7c06de2 libc manpages: various improvements from NetBSD
- Add STANDARDS and HISTORY sections within the appropriate manpages
- Mention two USENIX papers within kqueue(2) and strlcpy(3)

Reviewed by:	bcr (mentor)
Approved by:	bcr (mentor)
Obtained from:	NetBSD
MFC after:	7 days
Differential Revision: https://reviews.freebsd.org/D24650
2020-06-14 05:59:30 +00:00
Konstantin Belousov
6cf8fba381 procctl(2): document PROC_KPTI
Reviewed by:	bcr
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25258
2020-06-13 18:19:42 +00:00
Konstantin Belousov
7e54fea1d1 procctl(2): consistently refer to the data pointer as 'data'.
Reviewed by:	bcr
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25258
2020-06-13 18:18:34 +00:00
Kyle Evans
63619b6dba vfs: add restrictions to read(2) of a directory [2/2]
This commit adds the priv(9) that waters down the sysctl to make it only
allow read(2) of a dirfd by the system root. Jailed root is not allowed, but
jail policy and superuser policy will abstain from allowing/denying it so
that a MAC module can fully control the policy.

Such a MAC module has been written, and can be found at:
https://people.freebsd.org/~kevans/mac_read_dir-0.1.0.tar.gz

It is expected that the MAC module won't be needed by many, as most only
need to do such diagnostics that require this behavior as system root
anyways. Interested parties are welcome to grab the MAC module above and
create a port or locally integrate it, and with enough support it could see
introduction to base. As noted in mac_read_dir.c, it is released under the
BSD 2 clause license and allows the restrictions to be lifted for only
jailed root or for all unprivileged users.

PR:		246412
Reviewed by:	mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by:	rgrimes (latest version)
Differential Revision:	https://reviews.freebsd.org/D24596
2020-06-04 18:17:25 +00:00
Kyle Evans
dcef4f65ae vfs: add restrictions to read(2) of a directory [1/2]
Historically, we've allowed read() of a directory and some filesystems will
accommodate (e.g. ufs/ffs, msdosfs). From the history department staffed by
Warner: <<EOF

pdp-7 unix seemed to allow reading directories, but they were weird, special
things there so I'm unsure (my pdp-7 assembler sucks).

1st Edition's sources are lost, mostly. The kernel allows it. The
reconstructed sources from 2nd or 3rd edition read it though.

V6 to V7 changed the filesystem format, and should have been a warning, but
reading directories weren't materially changed.

4.1b BSD introduced readdir because of UFS. UFS broke all directory reading
programs in 1983. ls, du, find, etc all had to be rewritten. readdir() and
friends were introduced here.

SysVr3 picked up readdir() in 1987 for the AT&T fork of Unix. SysVr4 updated
all the directory reading programs in 1988 because different filesystem
types were introduced.

In the 90s, these interfaces became completely ubiquitous as PDP-11s running
V7 faded from view and all the folks that initially started on V7 upgraded
to SysV. Linux never supported this (though I've not done the software
archeology to check) because it has always had a pathological diversity of
filesystems.
EOF

Disallowing read(2) on a directory has the side-effect of masking
application bugs from relying on other implementation's behavior
(e.g. Linux) of rejecting these with EISDIR across the board, but allowing
it has been a vector for at least one stack disclosure bug in the past[0].

By POSIX, this is implementation-defined whether read() handles directories
or not. Popular implementations have chosen to reject them, and this seems
sensible: the data you're reading from a directory is not structured in some
unified way across filesystem implementations like with readdir(2), so it is
impossible for applications to portably rely on this.

With this patch, we will reject most read(2) of a dirfd with EISDIR. Users
that know what they're doing can conscientiously set
bsd.security.allow_read_dir=1 to allow read(2) of directories, as it has
proven useful for debugging or recovery. A future commit will further limit
the sysctl to allow only the system root to read(2) directories, to make it
at least relatively safe to leave on for longer periods of time.

While we're adding logic pertaining to directory vnodes to vn_io_fault, an
additional assertion has also been added to ensure that we're not reaching
vn_io_fault with any write request on a directory vnode. Such request would
be a logical error in the kernel, and must be debugged rather than allowing
it to potentially silently error out.

Commented out shell aliases have been placed in root's chsrc/shrc to promote
awareness that grep may become noisy after this change, depending on your
usage.

A tentative MFC plan has been put together to try and make it as trivial as
possible to identify issues and collect reports; note that this will be
strongly re-evaluated. Tentatively, I will MFC this knob with the default as
it is in HEAD to improve our odds of actually getting reports. The future
priv(9) to further restrict the sysctl WILL NOT BE MERGED BACK, so the knob
will be a faithful reversion on stable/12. We will go into the merge
acknowledging that the sysctl default may be flipped back to restore
historical behavior at *any* point if it's warranted.

[0] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:10.ufs.asc

PR:		246412
Reviewed by:	mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by:	rgrimes (latest version)
MFC after:	1 month (note the MFC plan mentioned above)
Relnotes:	absolutely, but will amend previous RELNOTES entry
Differential Revision:	https://reviews.freebsd.org/D24596
2020-06-04 18:09:55 +00:00
Ed Maste
3f65edb369 mmap.2: correct prot argument terminology
One of the error descriptions referred to permissions; in context the
meaning was probably clear, but the prot values are properly called
protections.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-06-03 20:42:52 +00:00
John Baldwin
ae84ff9c47 Document SO_NO_OFFLOADS and SO_NO_DDP.
Reviewed by:	bcr, np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25043
2020-06-03 18:59:31 +00:00
Rick Macklem
6f05ed08c2 Add an entry to Symbol.map for the rpctls_syscall added by r361599.
Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D24949
2020-05-28 21:26:26 +00:00
Kyle Evans
880ff10ba9 procctl(2): correct a minor cut-n-pasto
This is clearly describing PROC_PROTMAX_FORCE_DISABLE, rather than
PROC_ASL_FORCE_DISABLE.

Submitted by:	sigsys@gmail.com
2020-05-16 04:52:29 +00:00
Benedict Reuschling
6a7016194d Add HISTORY sections to document when this
functionality first appeared in FreeBSD.

Submitted by:	Gordon Bergling gbergling_gmail.com
Approved by:	bcr
Differential Revision:	https://reviews.freebsd.org/D24677
2020-05-05 19:31:47 +00:00
Mark Johnston
bea2668321 Document handling of connection-mode sockets by sendto(2).
sendto(2), sendmsg(2) and sendmmsg(2) return ENOTCONN if a destination
address is specified and the socket is not connected and the socket
protocol does not automatically connect ("implied connect").  Document
that.  Also document the fact that the destination address is ignored
for connection-mode sockets if the socket is already connected.

PR:		245817
Submitted by:	Erik Inge Bolsø <knan-bfo@modirum.com>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24530
2020-04-27 16:12:32 +00:00
Mark Johnston
569eb766c5 Fix handling of EV_EOF for named pipes.
Contrary to the kevent man page, EV_EOF on a fifo is not cleared by
EV_CLEAR.  Modify the read and write filters to clear EV_EOF when the
fifo's PIPE_EOF flag is clear, and update the man page to document the
new behaviour.

Modify the write filter to return the amount of buffer space available
even if no readers are present.  This matches the behaviour for sockets.

When reading from a pipe, only call pipeselwakeup() if some data was
actually read.  This prevents the continuous re-triggering of a
EVFILT_READ event on EOF when in edge-triggered mode.

PR:		203366, 224615
Submitted by:	Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24528
2020-04-27 15:59:19 +00:00
Conrad Meyer
1e72c52e23 libc: partially revert r326576
In r326576 ("use @@@ instead of @@ in __sym_default"), an earlier version of
the phabricator-discussed patch was inadvertently committed.  The commit
message claims that @@@ means that weak is not needed, but that was due to a
misunderstanding of the use of weak symbols in this context by the submitted
in the first draft of the patch; the description text was not updated to
match the discussion.  As discussed in phabricator, weak is needed for
symbol interposing because of the behavior of our rtld, and is widely used
elsewhere in libc.

This partial revert restores the approved version of the patch and permits
symbol interposing for openat.

Reported by:	Raymond Ramsden <rramsden AT isilon.com>
Reviewed by:	dim, emaste, kib (2017)
Discussed with:	kib (2020)
Differential Revision:	https://reviews.freebsd.org/D11653
2020-04-25 14:24:54 +00:00
Mateusz Piotrowski
5dcf0083fc Fix a typo
Reported by:	pstef
MFC after:	2 days
2020-04-24 22:04:14 +00:00
Kyle Evans
a269a14ff0 kqueue(2): de-vandalize the random sentence in the middle
A last minute change appears to have inadvertently vandalized unrelated
parts of the manpage with the date. =-(

Reported by:	rpokala
2020-04-22 04:05:02 +00:00
Kyle Evans
00b0f94c58 kqueue(2): add a note about EV_RECEIPT
In the below-referenced PR, a case is attached of a simple reproducer that
exhibits suboptimal behavior: EVFILT_READ and EVFILT_WRITE being set in the
same kevent(2) call will only honor the first one. This is, in-fact, how
it's supposed to work.

A read of the manpage leads me to believe we could be more clear about this;
right now there's a logical leap to make in the relevant statement: "When
passed as input, it forces EV_ERROR to always be returned." -- the logical
leap being that this indicates the caller should have allocated space for
the change to be returned with EV_ERROR indicated in the events, or
subsequent filters will get dropped on the floor.

Another possible workaround that accomplishes similar effect without needing
space for all events is just setting EV_RECEIPT on the final change being
passed in; if any errored before it, the kqueue would not be drained. If we
made it to the final change with EV_RECEIPT set, then we would return that
one with EV_ERROR and still not drain the kqueue. This would seem to not be
all that advisable.

PR:		229741
MFC after:	1 week
2020-04-22 03:45:52 +00:00
Kyle Evans
7851fb8ecb closefrom: clamp lowfd to >= 0; close_range's parameters are unsigned.
Pointy hat:	kevans
Reported by:	CI (lwhsu)
2020-04-14 23:24:24 +00:00
Kyle Evans
7d03e08112 Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range
Include a temporarily compatibility shim as well for kernels predating
close_range, since closefrom is used in some critical areas.

Reviewed by:	markj (previous version), kib
Differential Revision:	https://reviews.freebsd.org/D24399
2020-04-14 18:07:42 +00:00
Jonathan T. Looney
fb401f1bba Make sonewconn() overflow messages have per-socket rate-limits and values.
sonewconn() emits debug-level messages when a listen socket's queue
overflows. Currently, sonewconn() tracks overflows on a global basis. It
will only log one message every 60 seconds, regardless of how many sockets
experience overflows. And, when it next logs at the end of the 60 seconds,
it records a single message referencing a single PCB with the total number
of overflows across all sockets.

This commit changes to per-socket overflow tracking. The code will now
log one message every 60 seconds per socket. And, the code will provide
per-socket queue length and overflow counts. It also provides a way to
change the period between log messages using a sysctl.

Reviewed by:	jhb (previous version), bcr (manpages)
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24316
2020-04-14 15:38:18 +00:00
Kyle Evans
7c5e60c72e libc: remove shm_open(2)'s compat fallback
This had been introduced to ease any pain for using slightly older kernels
with a newer libc, e.g., for bisecting a kernel across the introduction of
shm_open2(2). 6 months has passed, retire the fallback and let shm_open()
unconditionally call shm_open2().

Stale includes are removed as well.
2020-04-13 15:59:15 +00:00
Kyle Evans
472ced39ef Implement a close_range(2) syscall
close_range(min, max, flags) allows for a range of descriptors to be
closed. The Python folk have indicated that they would much prefer this
interface to closefrom(2), as the case may be that they/someone have special
fds dup'd to higher in the range and they can't necessarily closefrom(min)
because they don't want to hit the upper range, but relocating them to lower
isn't necessarily feasible.

sys_closefrom has been rewritten to use kern_close_range() using ~0U to
indicate closing to the end of the range. This was chosen rather than
requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the
call to kern_close_range for simplicity.

The flags argument of close_range(2) is currently unused, so any flags set
is currently EINVAL. It was added to the interface in Linux so that future
flags could be added for, e.g., "halt on first error" and things of this
nature.

This patch is based on a syscall of the same design that is expected to be
merged into Linux.

Reviewed by:	kib, markj, vangyzen (all slightly earlier revisions)
Differential Revision:	https://reviews.freebsd.org/D21627
2020-04-12 21:23:19 +00:00
Konstantin Belousov
09bae0a023 libc: Fix possible overflow in binuptime().
This is an application of the kernel overflow fix from r357948 to
userspace, based on the algorithm developed by Bruce Evans. To keep
the ABI of the vds_timekeep stable, instead of adding the large_delta
member, MSB of both multipliers are added to quickly estimate the overflow.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2020-04-09 23:22:35 +00:00
John Baldwin
da8c654e99 Trim some duplicate EIO descriptions.
While here, drop an extra conjunction from the list of error
conditions for the remaining EIO description in symlink(2).

Discussed with:	mckusick (trimming duplicates)
MFC after:	2 weeks
2020-03-30 21:48:47 +00:00
John Baldwin
e42b096439 Document EINTEGRITY errors for many system calls.
EINTEGRITY was previously documented as a UFS-specific error for
mount(2).  This documents EINTEGRITY as a filesystem-independent error
that may be reported by the backing store of a filesystem.

While here, document EIO as a filesystem-independent error for both
mount(2) and posix_fadvise(2).  EIO was previously only documented for
UFS for mount(2).

Reviewed by:	mckusick
Suggested by:	mckusick
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24168
2020-03-30 21:44:00 +00:00
Warner Losh
56c995d658 exec{l,v}{e,p} arrived in 7th Edition research Unix to support the Bourne Shell
which introduced environment variables. Document that here. Verified by
consulting the TUHS archive.
2020-03-24 19:33:21 +00:00
Michael Tuexen
db4493f7b6 sendfile() does currently not support SCTP sockets.
Therefore, fail the call.

Reviewed by:		markj@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D24059
2020-03-13 18:38:28 +00:00
Kirk McKusick
95ca762da8 When mounting a UFS filesystem, return EINTEGRITY rather than EIO
when a superblock check-hash error is detected. This change clarifies
a mount that failed due to media hardware failures (EIO) from a mount
that failed due to media errors (EINTEGRITY) that can be corrected by
running fsck(8).

Sponsored by: Netflix
2020-03-11 21:00:40 +00:00
Ed Maste
0a052459e6 umtx_op.2: correct typo
PR:		244611
Submitted by:	John F. Carr <jfc@mit.edu>
MFC after:	3 days
2020-03-05 15:51:44 +00:00
Mateusz Piotrowski
a81c96922d thr_self.2: Fix some typos in the thread identifier range
Reported by:	kaktus
Approved by:	bcr (mentor)
Differential Revision:	https://reviews.freebsd.org/D23936
2020-03-03 09:51:53 +00:00
Ed Maste
acb8858f05 Return ENOTSUP for mmap/mprotect if prot not subset of prot_max
From POSIX,

[ENOTSUP]
    The implementation does not support the combination of accesses
    requested in the prot argument.

This fits the case that prot contains permissions which are not a subset
of prot_max.

Reviewed by:	brooks, cem
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23843
2020-02-26 20:03:43 +00:00
Warner Losh
a5b6c2960d Remove sparc64 specific parts of libc.
Also update comments for which architectures use 128 bit long doubles,
as appropriate.

The softfloat specialization routines weren't updated since they
appear to be from an upstream source which we may want to update in
the future to get a more favorable license.

Reviewed by: emaste@
Differential Revision:  https://reviews.freebsd.org/D23658
2020-02-26 18:55:09 +00:00
Ed Maste
5c6d07fb9c mprotect.2: sort errors alphabetically
Reported by:	brooks
MFC after:	3 days
2020-02-26 18:46:41 +00:00
Eric van Gyzen
3ae8839afe truncate(2): extending the file is required by POSIX 2008
Update the man page to mention that extending a file with truncate(2)
is required by POSIX as of 2008.

Reviewed by:	bcr
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23354
2020-02-20 23:47:09 +00:00
Konstantin Belousov
146fc63fce Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery.  Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.

The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).

With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.

The syscall is not exported from the stable libc version namespace on
purpose.  It is intended to be used only by our C runtime
implementation internals.

Tested by:	pho
Disscussed with:	cem, emaste, jilles
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
d3cc535474 vfs: provide F_ISUNIONSTACK as a kludge for libc
Prior to introduction of this op libc's readdir would call fstatfs(2), in
effect unnecessarily copying kilobytes of data just to check fs name and a
mount flag.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D23162
2020-01-17 14:42:25 +00:00
Conrad Meyer
86def3dcd6 getrandom(2): Add Linux GRND_INSECURE API flag
Treat it as a synonym for GRND_NONBLOCK.  The reasoning is this:

We have two choices for handling Linux's GRND_INSECURE API flag.

1. We could ignore it completely (like GRND_RANDOM).  However, this might
produce the surprising result of GRND_INSECURE requests blocking, when the
Linux API does not block.

2. Alternatively, we could treat GRND_INSECURE requests as requests for
GRND_NONBLOCk.  Here, the surprising result for Linux programs is that
invocations with unseeded random(4) will produce EAGAIN, rather than
garbage.

Honoring the flag in the way Linux does seems fraught.  If we actually use
the output of a random(4) implementation prior to seeding, we leak some
entropy (in an information theory and also practical sense) from what will
be the initial seed to attackers (or allow attackers to arbitrary DoS
initial seeding, if we don't leak).  This seems unacceptable -- it defeats
the purpose of blocking on initial seeding.

Secondary to that concern, before seeding we may have arbitrarily little
entropy collected; producing output from zero or a handful of entropy bits
does not seem particularly useful to userspace.

If userspace can accept garbage, insecure, non-random bytes, they can create
their own insecure garbage with srandom(time(NULL)) or similar.  Any program
which would be satisfied with a 3-bit key CTR stream has no need for CSPRNG
bytes.  So asking the kernel to produce such an output from the secure
getrandom(2) API seems inane.

For now, we've elected to emulate GRND_INSECURE as an alternative spelling
of GRND_NONBLOCK (2).  Consider this API not-quite stable for now.  We
guarantee it will never block.  But we will attempt to monitor actual port
uptake of this bizarre API and may revise our plans for the unseeded
behavior (prior stable/13 branching).

Approved by:	csprng(markm), manpages(bcr)
See also:	https://lwn.net/ml/linux-kernel/cover.1577088521.git.luto@kernel.org/
See also:	https://lwn.net/ml/linux-kernel/20200107204400.GH3619@mit.edu/
Differential Revision:	https://reviews.freebsd.org/D23130
2020-01-12 20:47:38 +00:00
Kyle Evans
2856d85ecb posix_fallocate: push vnop implementation into the fileop layer
This opens the door for other descriptor types to implement
posix_fallocate(2) as needed.

Reviewed by:	kib, bcr (manpages)
Differential Revision:	https://reviews.freebsd.org/D23042
2020-01-08 19:05:32 +00:00
Konstantin Belousov
0cc9fb7551 Only return EPERM from kill(-pid) when no process was signalled.
As mandated by POSIX.  Also clarify the kill(2) manpage.

While there, restructure the code in killpg1() to use helper which
keeps overall state of the process list iteration in the killpg1_ctx
structued, later used to infer the error returned.

Reported by:	amdmi3
Reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D22621
2019-12-07 18:07:49 +00:00
Alan Somers
8d3443b1fc clock_gettime(2): add a HISTORY section
MFC after:	2 weeks
2019-12-07 16:45:12 +00:00
Alan Somers
fbf7102d14 lio_listio(2): add a HISTORY section
MFC after:	2 weeks
2019-12-07 16:29:56 +00:00
Warner Losh
f86e60008b Regularize my copyright notice
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
  All Rights Reserved on same line as other copyright holders (but not
  me). Other such holders are also listed last where it's clear.
2019-12-04 16:56:11 +00:00
Mark Johnston
a6d05b9be7 Fix typos in the cpuset_{get,set}domain() man page.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-11-22 16:25:00 +00:00
Rick Macklem
51e069ac10 Update the copy_file_range man page to reflect the semantic change
done by r354574.

This is a content change.
2019-11-10 01:13:41 +00:00
Rick Macklem
fef163e117 Update the copy_file_range.2 man page to reflect the semantic change
implemented by r354564.

This is a content change.
2019-11-08 23:49:27 +00:00
Kyle Evans
142c5c8c36 memfd_create(3): Don't actually force hugetlb size with MFD_HUGETLB
The size flags are only required to select a size on systems that support
multiple sizes. MFD_HUGETLB by itself is valid.
2019-09-29 17:30:10 +00:00
Warner Losh
ab311b7f12 Revert the mode_t -> int changes and add a warning in the BUGS section instead.
While FreeBSD's implementation of these expect an int inside of libc, that's an
implementation detail that we can hide from the user as it's the natural
promotion of the current mode_t type and before it is used in the kernel, it's
converted back to the narrower type that's the current definition of mode_t. As
such, documenting int is at best confusing and at worst misleading. Instead add
a note that these args are variadic and as such calling conventions may differ
from non-variadic arguments.
2019-09-28 17:15:48 +00:00
Warner Losh
4470d73996 Document varadic args as int, since you can't have short varadic args (they are
promoted to ints).

- `mode_t` is `uint16_t` (`sys/sys/_types.h`)
- `openat` takes variadic args
- variadic args cannot be 16-bit, and indeed the code uses int
- the manpage currently kinda implies the argument is 16-bit by saying `mode_t`

Prompted by Rust things: https://github.com/tailhook/openat/issues/21
Submitted by: Greg V at unrelenting
Differential Revision: https://reviews.freebsd.org/D21816
2019-09-27 16:11:47 +00:00
Kyle Evans
e12ff89136 Further normalize copyright notices
- s/C/c/ where I've been inconsistent about it
- +SPDX tags
- Remove "All rights reserved" where possible

Requested by:	rgrimes (all rights reserved)
2019-09-26 16:19:22 +00:00
David Bright
d4f4430503 Correct mistake in MLINKS introduced in r352747
Messed up a merge conflict resolution and didn't catch that before
commit.

Sponsored by:	Dell EMC Isilon
2019-09-26 16:13:17 +00:00
David Bright
9afb12bab4 Add an shm_rename syscall
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.

truss support is included; audit support will come later.

This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	jilles (earlier revision)
Reviewed by:	brueffer (manpages, earlier revision)
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21423
2019-09-26 15:32:28 +00:00
Kyle Evans
a631497fca Add SPDX tags to recently added files
Reported by:	Pawel Biernacki
2019-09-25 22:53:30 +00:00
Kyle Evans
c34a5f16fa posix_spawn(3): handle potential signal issues with vfork
Described in [1], signal handlers running in a vfork child have
opportunities to corrupt the parent's state. Address this by adding a new
rfork(2) flag, RFSPAWN, that has vfork(2) semantics but also resets signal
handlers in the child during creation.

x86 uses rfork_thread(3) instead of a direct rfork(2) because rfork with
RFMEM/RFSPAWN cannot work when the return address is stored on the stack --
further information about this problem is described under RFMEM in the
rfork(2) man page.

Addressing this has been identified as a prerequisite to using posix_spawn
in subprocess on FreeBSD [2].

[1] https://ewontfix.com/7/
[2] https://bugs.python.org/issue35823

Reviewed by:	jilles, kib
Differential Revision:	https://reviews.freebsd.org/D19058
2019-09-25 19:22:03 +00:00
Kyle Evans
079c5b9ed8 rfork(2): add RFSPAWN flag
When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets
signal handlers in the child during creation to avoid a point of corruption
of parent state from the child.

This flag will be used by posix_spawn(3) to handle potential signal issues.

Reviewed by:	jilles, kib
Differential Revision:	https://reviews.freebsd.org/D19058
2019-09-25 19:20:41 +00:00
Kyle Evans
a9ac5e1424 sysent: regenerate after r352705
This also implements it, fixes kdump, and removes no longer needed bits from
lib/libc/sys/shm_open.c for the interim.
2019-09-25 18:09:19 +00:00
Kyle Evans
3e25d1fb61 Add linux-compatible memfd_create
memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional
CLOEXEC and file sealing support. This is used by some mesa parts, some
linux libs, and qemu can also take advantage of it and uses the sealing to
prevent resizing the region.

This reimplements shm_open in terms of shm_open2(2) at the same time.

shm_open(2) will be moved to COMPAT12 shortly.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D21393
2019-09-25 18:03:18 +00:00
Kyle Evans
f17221ee7a Update fcntl(2) after r352695 2019-09-25 17:33:12 +00:00
Sean Eric Fagan
ba7a55d934 Add two options to allow mount to avoid covering up existing mount points.
The two options are

* nocover/cover:  Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir:  Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.

Neither of these options is intended to be a default, for historical and
compatibility reasons.

Reviewed by:	allanjude, kib
Differential Revision:	https://reviews.freebsd.org/D21458
2019-09-23 04:28:07 +00:00
Konstantin Belousov
55894117b1 Return EISDIR when directory is opened with O_CREAT without O_DIRECTORY.
Reviewed by:	bcr (man page), emaste (previous version)
PR:	240452
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
DIfferential revision:	https://reviews.freebsd.org/D21634
2019-09-17 18:32:18 +00:00
Alan Somers
8d910a4282 getsockopt.2: clarify that SO_TIMESTAMP is not 100% reliable
When SO_TIMESTAMP is set, the kernel will attempt to attach a timestamp as
ancillary data to each IP datagram that is received on the socket. However,
it may fail, for example due to insufficient memory. In that case the
packet will still be received but not timestamp will be attached.

Reviewed by:	kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D21607
2019-09-11 19:48:32 +00:00
Mitchell Horne
d1bc2d79f2 Fix cpuwhich_t column width
Not bumping .Dd since this is purely a format change.

Approved by:	markj (mentor)
2019-09-08 21:37:52 +00:00
Konstantin Belousov
fe69291ff4 Add procctl(PROC_STACKGAP_CTL)
It allows a process to request that stack gap was not applied to its
stacks, retroactively.  Also it is possible to control the gaps in the
process after exec.

PR:	239894
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21352
2019-09-03 18:56:25 +00:00
Mateusz Guzik
d05b53e0ba Add sysctlbyname system call
Previously userspace would issue one syscall to resolve the sysctl and then
another one to actually use it. Do it all in one trip.

Fallback is provided in case newer libc happens to be running on an older
kernel.

Submitted by:	Pawel Biernacki
Reported by:	kib, brooks
Differential Revision:	https://reviews.freebsd.org/D17282
2019-09-03 04:16:30 +00:00
Ed Maste
3afdc7303c Add @generated tag to libc syscall asm wrappers
Although libc syscall wrappers do not get checked in this can aid in
finding the source of generated files when spelunking in the objdir.

Multiple tools use @generated to identify generated files (for example,
in a review Phabricator will by default hide diffs in generated files).
For consistency use the @generated tag in makesyscalls.sh as we've done
for other generated files, even though these wrappers aren't checked in
to the tree.
2019-08-16 14:14:57 +00:00
Konstantin Belousov
a60c863ced wait(2): clarify reparenting of children of the exiting process.
Point to the existence of reapers and mention that init is the default
reaper.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-08-11 15:47:48 +00:00
Konstantin Belousov
cd6a6b772d wait(2): split long line by using .Fo/.Fa instead of .Ft.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-08-11 15:44:36 +00:00
Benjamin Kaduk
1f0a85545e Fix grammar nit in copy_file_range docs
Bytes are countable, so we have fewer of them, not less of them.
2019-07-25 15:43:15 +00:00
Rick Macklem
78756b9e6f Add libc support for the copy_file_range(2) syscall added by r350315.
copy_file_range.2 is a new man page (content change).

Reviewed by:	kib, asomers
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20584
2019-07-25 06:05:49 +00:00
John Baldwin
32451fb9fc Add ptrace op PT_GET_SC_RET.
This ptrace operation returns a structure containing the error and
return values from the current system call.  It is only valid when a
thread is stopped during a system call exit (PL_FLAG_SCX is set).

The sr_error member holds the error value from the system call.  Note
that this error value is the native FreeBSD error value that has _not_
been translated to an ABI-specific error value similar to the values
logged to ktrace.

If sr_error is zero, then the return values of the system call will be
set in sr_retval[0] and sr_retval[1].

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D20901
2019-07-15 21:48:02 +00:00
Konstantin Belousov
8c95181495 Document atomicity for read(2) and write(2).
Take part of the text from POSIX 2018 edition and describe the
atomicity requirements for read and write syscalls.  See p1003.1-2018,
Vol.2, 2.9.7 Threads interaction with Regular File Operations.

Reviewed by:	asomers
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D20867
2019-07-06 20:31:37 +00:00
Konstantin Belousov
5dc7e31a09 Control implicit PROT_MAX() using procctl(2) and the FreeBSD note
feature bit.

In particular, allocate the bit to opt-out the image from implicit
PROTMAX enablement.  Provide procctl(2) verbs to set and query
implicit PROTMAX handling.  The knobs mimic the same per-image flag
and per-process controls for ASLR.

Reviewed by:	emaste, markj (previous version)
Discussed with:	brooks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D20795
2019-07-02 19:07:17 +00:00
Konstantin Belousov
e0a126f6d2 Typo.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-06-28 16:42:44 +00:00
Brooks Davis
ee37749af6 Add PROT_MAX to the HISTORY section.
In the case of mmap(), add a HISTORY section.  Mention that mmap() and
mprotect()'s documentation predates an implementation.  The
implementation first saw wide use in 4.3-Reno, but there seems to be no
easy way to express that in mdoc so stick with 4.4BSD.

Reviewed by:	emaste
Requested by:	cem
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20713
2019-06-20 21:52:30 +00:00
Brooks Davis
74a1b66cf4 Extend mmap/mprotect API to specify the max page protections.
A new macro PROT_MAX() alters a protection value so it can be OR'd with
a regular protection value to specify the maximum permissions.  If
present, these flags specify the maximum permissions.

While these flags are non-portable, they can be used in portable code
with simple ifdefs to expand PROT_MAX() to 0.

This change allows (e.g.) a region that must be writable during run-time
linking or JIT code generation to be made permanently read+execute after
writes are complete.  This complements W^X protections allowing more
precise control by the programmer.

This change alters mprotect argument checking and returns an error when
unhandled protection flags are set.  This differs from POSIX (in that
POSIX only specifies an error), but is the documented behavior on Linux
and more closely matches historical mmap behavior.

In addition to explicit setting of the maximum permissions, an
experimental sysctl vm.imply_prot_max causes mmap to assume that the
initial permissions requested should be the maximum when the sysctl is
set to 1.  PROT_NONE mappings are excluded from this for compatibility
with rtld and other consumers that use such mappings to reserve
address space before mapping contents into part of the reservation.  A
final version this is expected to provide per-binary and per-process
opt-in/out options and this sysctl will go away in its current form.
As such it is undocumented.

Reviewed by:	emaste, kib (prior version), markj
Additional suggestions from:	alc
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18880
2019-06-20 18:24:16 +00:00
Alan Somers
5993fa5582 open(2): fix the description of O_FSYNC
The man page claims that with O_FSYNC (aka O_SYNC) the kernel will not cache
written data. However, that's not true. Nor does POSIX require it.
Perhaps it was true when that section of the man page was written in r69336
(I haven't checked). But it's not true now.  Now the effect is simply that
writes are sent to disk immediately and synchronously, but they're still
cached.

See also: https://pubs.opengroup.org/onlinepubs/9699919799/
See also: ffs_write in sys/ufs/ffs/ffs_vnops.c

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20641
2019-06-14 20:35:37 +00:00
Mariusz Zaborski
5c816e43b4 unlink: add missing function to unlink.2 man page 2019-06-05 22:36:19 +00:00
Alan Somers
8bbd9a3839 Link fhlinkat(2) man page
Reviewed by:	kib
MFC after:	3 days
MFC-With:	r341689
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20339
2019-05-22 01:11:21 +00:00
Mark Johnston
54a3a11421 Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes.  User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.

The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2).  Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks.  In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
shorting of killing the wiring process.  The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.

The choice to count virtual user-wired pages rather than physical
pages was done for simplicity.  There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.

The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded.  For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.

Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM.  Users that wish to exceed the limit must tune
vm.max_user_wired.

Reviewed by:	kib, ngie (mlock() test changes)
Tested by:	pho (earlier version)
MFC after:	45 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19908
2019-05-13 16:38:48 +00:00
Edward Tomasz Napierala
9b7448fcad .Xr protect(1) and proccontrol(1) from procctl(2).
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-04-09 10:09:59 +00:00
Mariusz Zaborski
a1304030b8 Introduce funlinkat syscall that always us to check if we are removing
the file associated with the given file descriptor.

Reviewed by:	kib, asomers
Reviewed by:	cem, jilles, brooks (they reviewed previous version)
Discussed with:	pjd, and many others
Differential Revision:	https://reviews.freebsd.org/D14567
2019-04-06 09:34:26 +00:00
Konstantin Belousov
5d00c5a657 Fix initial exec TLS mode for dynamically loaded shared objects.
If dso uses initial exec TLS mode, rtld tries to allocate TLS in
static space. If there is no space left, the dlopen(3) fails. If space
if allocated, initial content from PT_TLS segment is distributed to
all threads' pcbs, which was missed and caused un-initialized TLS
segment for such dso after dlopen(3).

The mode is auto-detected either due to the relocation used, or if the
DF_STATIC_TLS dynamic flag is set.  In the later case, the TLS segment
is tried to allocate earlier, which increases chance of the dlopen(3)
to succeed.  LLD was recently fixed to properly emit the flag, ld.bdf
did it always.

Initial test by:	dumbbell
Tested by:	emaste (amd64), ian (arm)
Tested by:	Gerald Aryeetey <aryeeteygerald_rogers.com> (arm64)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19072
2019-03-29 17:52:57 +00:00
Ed Maste
f6a10ccc53 Use consistent struct stat arg name in stat man page
stat, lstat, and fstat use `*sb` as the stat struct pointer arg name,
while fstatat previously used `*buf`.

MFC after:	1 week
2019-03-13 15:18:14 +00:00
Ed Maste
d95826c43d poll.2: POLLNVAL is returned also for insufficient rights
Reported by:	"Bora Özarslan" <borako.ozarslan@gmail.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-02-27 17:52:22 +00:00
Konstantin Belousov
9fb91a0a7d procctl(2): document ASLR knobs.
Reviewed by:	0mp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D19308
2019-02-26 17:41:41 +00:00
Konstantin Belousov
80a3fa4893 procctl(2): fix -width parameter to .Bl.
According to 0mp, macros are not expanded in the argument provided to
-width.  Use plain identifiers for width specification.

Noted and reviewed by:	0mp
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D19308
2019-02-26 17:35:06 +00:00
Gleb Smirnoff
5bfb2e008d Imaginary cat jumped my keyboard! 2019-02-15 23:46:34 +00:00
Gleb Smirnoff
66fb0b1ad7 For 32-bit machines rollback the default number of vnode pager pbufs
back to the lever before r343030.  For 64-bit machines reduce it slightly,
too.  Together with r343030 I bumped the limit up to the value we use at
Netflix to serve 100 Gbit/s of sendfile traffic, and it probably isn't a
good default.

Provide a loader tunable to change vnode pager pbufs count. Document it.
2019-02-15 23:36:22 +00:00
Sergey Kandaurov
78c8b9477c Document the ENOBUFS errno in setsockopt(2).
In particular, it is the case if SO_SNDBUF/SO_RCVBUF would exceed sb_max_adj.

PR:		200649
MFC after:	1 week
2019-02-09 21:33:32 +00:00
Enji Cooper
2b9ecf4896 Document that sendfile will return an invalid value for sbytes if provided an invalid address
This is meant to clarify the fact that the system call will not fail
with -1/EFAULT, as one might expect, when reading the sendfile(2)
manpage today.

While here, pet the mandoc linter, when dealing with the section that
describes valid values for `flags`.

PR:	232210
MFC after:	2 weeks
Approved by:	emaste (mentor)
Reviewed by:	glebius, 0mp
Differential Revision: https://reviews.freebsd.org/D18949
2019-01-25 19:56:02 +00:00
Kirk McKusick
88640c0e8b Create new EINTEGRITY error with message "Integrity check failed".
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.

These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.

Reviewed by:    gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
2019-01-17 06:35:45 +00:00
Konstantin Belousov
ea7e7006db Implement shmat(2) flag SHM_REMAP.
Based on the description in Linux man page.

Reviewed by:	markj, ngie (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18837
2019-01-16 05:15:57 +00:00
Konstantin Belousov
3fbc2e00d1 Add a tunable which changes mincore(2) algorithm to only report data
from the local mapping.

Enable the setting by default.
The article behind the change: https://arxiv.org/abs/1901.01161

Reviewed by:	markj
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18764
2019-01-07 22:10:48 +00:00
Jilles Tjoelker
8cc4b29d5a thr_wake(2): Minor mdoc fixes
MFC after:	1 week
2019-01-06 21:34:05 +00:00
Mark Johnston
2f2ddd68a5 Support MSG_DONTWAIT in send*(2).
As it does for recv*(2), MSG_DONTWAIT indicates that the call should
not block, returning EAGAIN instead.  Linux and OpenBSD both implement
this, so the change makes porting easier, especially since we do not
return EINVAL or so when unrecognized flags are specified.

Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	tuexen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18728
2019-01-04 17:31:50 +00:00
Konstantin Belousov
eba8ab0e3e Remove special case handling for getfhat(fd, NULL, handle).
There is no reason for it to behave differently from openat(fd, NULL).
Also the handling did not worked because the substituted path was from
the system address space, causing EFAULT.

Submitted by:	Jack Halford <jack@gandi.net>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18501
2018-12-11 02:48:49 +00:00
Konstantin Belousov
d1fd400a80 Add new file handle system calls.
Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2).  The
syscalls are provided for a NFS userspace server (nfs-ganesha).

Submitted by:	Jack Halford <jack@gandi.net>
Sponsored by:	Gandi.net
Tested by:	pho
Feedback from:	brooks, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18359
2018-12-07 15:17:29 +00:00
Alan Somers
006678fd05 stat(2): clarify which syscalls modify file timestamps
The list of syscalls that modify st_atim, st_mtim, and st_ctim was quite out
of date and probably not accurate to begin with.  Update it, and make it
clear that the list is open-ended.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18410
2018-12-05 17:28:40 +00:00
Alan Somers
a14a34ef62 fcntl.2: document an additional error condition
MFC after:	2 weeks
2018-11-15 16:13:25 +00:00
Konstantin Belousov
1c4ca77890 Add d_off support for multiple filesystems.
The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature.  Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.

Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs.  At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.

Submitted by:	Jack Halford <jack@gandi.net>
Reviewed by:	mckusick, pfg, rmacklem (previous versions)
Sponsored by:	Gandi.net
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17917
2018-11-14 14:18:35 +00:00
Konstantin Belousov
5b1fb8ec66 First draft of documentation for AT/O_BENEATH handling of the absolute
paths.

It was decided that committing the code and drafting of the man page
update is better than allowing the code to rot until wordsmithing
happens.

Reviewed by:	jilles (previous version)
Discussed with:	brooks, jilles, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17714
2018-11-11 01:46:48 +00:00
Conrad Meyer
78c2a9806e kern_poll: Restore explanatory comment removed in r177374
The comment isn't stale.  The check is bogus in the sense that poll(2)
does not require pollfd entries to be unique in fd space, so there is no
reason there cannot be more pollfd entries than open or even allowed
fds.  The check is mostly a seatbelt against accidental misuse or
abuse.  FD_SETSIZE, while usually unrelated to poll, is used as an
arbitrary floor for systems with very low kern.maxfilesperproc.

Additionally, document this possible EINVAL condition in the poll.2
manual.

No functional change.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17671
2018-11-01 23:46:23 +00:00
Warner Losh
f64bccc6d9 Bump .Dd forgotten in last commit. 2018-10-28 03:02:09 +00:00
Warner Losh
5669c6748d Note that the kenrel doesn't keep track daylight savings time, nor
timezone offset. These values are generally zero.

While one still theoreticall could set these values, that's almost
never done. Users wishing to have an offset between the time of day
clock hardware and UTC use adjkerntz(8) instead.

localtime(3) should be used to find these values for the current
timezone.
2018-10-28 02:58:22 +00:00
Konstantin Belousov
4f77f48884 Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547
2018-10-25 22:16:34 +00:00
Mark Johnston
b8e4cdda35 Clarify slightly the interaction between wait*() and pdfork().
There are multiple ways to wait for any child process to return a
status (e.g., waitpid(-1, ...), waitid(P_ALL, ...)), so don't be so
specific.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-10-24 18:42:13 +00:00
Poul-Henning Kamp
01652e9c8e Update example to something people less than 40 years old have heard about. 2018-10-21 07:30:26 +00:00
Edward Tomasz Napierala
edbedaf4dc Add .Xrs to kqueue(2) from pdfork(2) and procdesc(4), to make EVFILT_PROCDESC
easier to find.

Approved by:	re (rgrimes)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2018-10-14 18:42:54 +00:00
Allan Jude
c452913091 Document that sendfile(2) can return ENOTCAPABLE
PR:		232207
Submitted by:	Enji Cooper <yaneurabeya@gmail.com>
Approved by:	re (rgrimes)
2018-10-13 02:20:16 +00:00
Michael Tuexen
6b01d4d433 Add SOL_SOCKET level socket option with name SO_DOMAIN to get
the domain of a socket.

This is helpful when testing and Solaris and Linux have the same
socket option using the same name.

Reviewed by:		bcr@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16791
2018-08-21 14:04:30 +00:00
Mateusz Piotrowski
c8b8b38e5f Document socket control message routines for ancillary data access (CMSG_DATA).
PR:		227777
Reviewed by:	bcr, eadler
Approved by:	mat (mentor), manpages (bcr)
Obtained from:	OpenBSD
Differential Revision:	https://reviews.freebsd.org/D15215
2018-08-19 17:42:49 +00:00
Jamie Gritton
c542c43ef1 Revert r337922, except for some documention-only bits. This needs to wait
until user is changed to stop using jail(2).

Differential Revision:	D14791
2018-08-16 19:09:43 +00:00
Jamie Gritton
284001a222 Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating
jails since FreeBSD 7.

Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES).  These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do.  The first use
is obsolete along with jail(2), but keep them for the second-read-only use.

Differential Revision:	D14791
2018-08-16 18:40:16 +00:00
Conrad Meyer
ba9ace7436 settimeofday(2): Remove stale note about timezone
Contrary to the removed comment, the kernel does appear to use the timezone
argument of settimeofday.  The comment dates to the BSD4.4 import; I assume it
is just stale.
2018-08-04 22:08:24 +00:00
Ruslan Bukin
42570cd1d4 MAXLOGNAME changed to 33 in r243023.
Update man pages.

Sponsored by:	DARPA, AFRL
2018-08-03 16:05:03 +00:00
David Bright
95c05062ec Allow a EVFILT_TIMER kevent to be updated.
If a timer is updated (re-added) with a different time period
(specified in the .data field of the kevent), the new time period has
no effect; the timer will not expire until the original time has
elapsed. This violates the documented behavior as the kqueue(2) man
page says (in part) "Re-adding an existing event will modify the
parameters of the original event, and not result in a duplicate
entry."

This modification, adapted from a patch submitted by cem@ to PR214987,
fixes the kqueue system to allow updating a timer entry. The
kevent timer behavior is changed to:

  * When a timer is re-added, update the timer parameters to and
    re-start the timer using the new parameters.
  * Allow updating both active and already expired timers.
  * When the timer has already expired, dequeue any undelivered events
    and clear the count of expirations.

All of these changes address the original PR and also bring the
FreeBSD and macOS kevent timer behaviors into agreement.

A few other changes were made along the way:

  * Update the kqueue(2) man page to reflect the new timer behavior.
  * Fix man page style issues in kqueue(2) diagnosed by igor.
  * Update the timer libkqueue system test to test for the updated
    timer behavior.
  * Fix the (test) libkqueue common.h file so that it includes
    config.h which defines various HAVE_* feature defines, before the
    #if tests for such variables in common.h. This enables the use of
    the actual err(3) family of functions.
  * Fix the usages of the err(3) functions in the tests for incorrect
    type of variables. Those were formerly undiagnosed due to the
    disablement of the err(3) functions (see previous bullet point).

PR:		214987
Reported by:	Brian Wellington <bwelling@xbill.org>
Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D15778
2018-07-27 13:49:17 +00:00
Konstantin Belousov
b3042426d0 Remove bits of the old NUMA.
Remove numactl(1), edit numa(4) to bring it some closer to reality,
provide libc ABI shims for old NUMA syscalls.

Noted and reviewed by:	brooks (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D16142
2018-07-10 22:00:20 +00:00
Brooks Davis
7cc923f8a8 Get rid of netbsd_lchown and netbsd_msync syscall entries.
No valid FreeBSD binary very called them (they would call lchown and
msync directly) and we haven't supported NetBSD binaries in ages.

This is a respin of r335983 with a workaround for the ancient BFD linker
in the libc stubs.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16193
2018-07-10 13:32:04 +00:00
Warner Losh
bdea3adca6 Tweak documentation to RB_ constants to reflect current use
RB_ASKNAME is no longer instructions to the boot loader to request a
prompt for which kernel to boot. Instead, it asks for what the root
file system to use. RB_INITNAME is unused, and never has been in
FreeBSD as far as I can tell. Remove it from the documentation and fix
comment. RB_SELFTEST and RB_MINIROOT likewise (though they were
completely undocumented). These last three constants can likely just
be deleted as nothing references them (even to set useless bits).

RB_ASKNAME doesn't actually survive reboot, however, so needs to be
communicated to the bootloader via other means. If the bootloader sets
it, though, it will be honored.
2018-07-10 00:01:14 +00:00
Brooks Davis
714c03c81e Revert r335983.
The bfd linker in tree doesn't support multiple names for the same
symbol (at least with current flags).
2018-07-05 16:03:03 +00:00
Brooks Davis
5b04a71dae Get rid of netbsd_lchown and netbsd_msync syscall entries.
No valid FreeBSD binary ever called them (they would call lchown and
msync directly) and we haven't supported NetBSD binaries in ages.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15814
2018-07-05 14:12:56 +00:00
Conrad Meyer
e02d32f72e sigaction.2: Minor cleanups
Add vertical space between struct definition and function prototype.

Use "NULL" to describe zero pointers, instead of "zero."

Remove perhaps unclear "can not" and replace.  Tag struct member names used
with appropriate tags.
2018-06-28 18:17:20 +00:00
Ian Lepore
25b10ed4b7 Add some words clarifying that rename(2) does nothing when the 'from' and
'to' args are the same file.  Wording borrowed from POSIX.1-2017, but
the freebsd code to implement this behavior was added in 2002 (r103180).
2018-06-21 15:21:17 +00:00
Sean Bruno
1a43cff92a Load balance sockets with new SO_REUSEPORT_LB option.
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967.  Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Obtained from:	DragonflyBSD
Relnotes:	Yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-06-06 15:45:57 +00:00
Mark Johnston
9f9c9b22ec Reimplement brk() and sbrk() to avoid the use of _end.
Previously, libc.so would initialize its notion of the break address
using _end, a special symbol emitted by the static linker following
the bss section.  Compatibility issues between lld and ld.bfd could
cause the wrong definition of _end (libc.so's definition rather than
that of the executable) to be used, breaking the brk()/sbrk()
interface.

Avoid this problem and future interoperability issues by simply not
relying on _end.  Instead, modify the break() system call to return
the kernel's view of the current break address, and have libc
initialize its state using an extra syscall upon the first use of the
interface.  As a side effect, this appears to fix brk()/sbrk() usage
in executables run with rtld direct exec, since the kernel and libc.so
no longer maintain separate views of the process' break address.

PR:		228574
Reviewed by:	kib (previous version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D15663
2018-06-04 19:35:15 +00:00
Justin Hibbits
2e65567500 Added ptrace support for reading/writing powerpc VSX registers
Summary:
Added ptrace support for getting/setting the remaining part of the VSX registers
(the part that's not already covered by FPR or VR registers).

This is necessary to add support for VSX registers in debuggers.

Submitted by:	Luis Pires
Differential Revision: https://reviews.freebsd.org/D15458
2018-06-02 19:17:11 +00:00
Mark Johnston
e2c1730299 Remove an inaccuracy from mincore.2.
Super pages are supported on non-x86 architectures, so just remove the
incorrect note.  While here, change terminology to be consistent with
mmap.2.

MFC after:	1 week
2018-06-01 23:40:43 +00:00
Brooks Davis
7351a8bdb5 Make vadvise compat freebsd11.
The vadvise syscall (aka ovadvise) is undocumented and has always been
implmented as returning EINVAL.  Put the syscall under COMPAT11 and
provide a userspace implementation.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15557
2018-05-25 20:40:23 +00:00
Brooks Davis
2357535254 Indicate the brk/sbrk are deprecated and not portable.
More firmly suggest mmap(2) instead.

Include the history of arm64 and riscv shipping without brk/sbrk.

Mention that sbrk(0) produces unreliable results.

Reviewed by:	emaste, Marcin Cieślak
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15535
2018-05-24 18:32:54 +00:00
Konstantin Belousov
84ffdd6a81 Note that PT_SETSTEP is auto-cleared.
Wording and reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D15054
2018-05-23 17:55:30 +00:00
Sevan Janiyan
d3fff23be8 Use St macro for specifying C standards.
Reported by:	rgrimes@
2018-05-20 21:56:08 +00:00
Sevan Janiyan
d55b77df03 Fix a typo and remove an unneeded Tn macro as highlighted by mandoc -Tlint.
Submitted by:		Mateusz Piotrowski
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D15204
2018-05-20 20:28:17 +00:00
Lawrence Stewart
9891578a40 Plug a memory leak and potential NULL-pointer dereference introduced in r331214.
Each TCP connection that uses the system default cc_newreno(4) congestion
control algorithm module leaks a "struct newreno" (8 bytes of memory) at
connection initialisation time. The NULL-pointer dereference is only germane
when using the ABE feature, which is disabled by default.

While at it:

- Defer the allocation of memory until it is actually needed given that ABE is
  optional and disabled by default.

- Document the ENOMEM errno in getsockopt(2)/setsockopt(2).

- Document ENOMEM and ENOBUFS in tcp(4) as being synonymous given that they are
  used interchangeably throughout the code.

- Fix a few other nits also accidentally omitted from the original patch.

Reported by:	Harsh Jain on freebsd-net@
Tested by:	tjh@
Differential Revision:	https://reviews.freebsd.org/D15358
2018-05-17 02:46:27 +00:00
Konstantin Belousov
450cd8475a PROC_PDEATHSIG_CTL will appear first in 11.2.
Submitted by:	Thomas Munro
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D15399
2018-05-12 10:11:33 +00:00
Xin LI
b6f7731dba Remove "All rights reserved" from my files.
See r333391 for the rationale.

MFC after:	1 week
2018-05-10 06:41:08 +00:00
Eric van Gyzen
488ab515d6 Remove 'All rights reserved' from my files
See r333391 for the rationale.

Approved by:	emaste (for the Foundation copyright)
Sponsored by:	Dell EMC
2018-05-09 20:12:59 +00:00
Kyle Evans
1921252001 fcntl(2): Vaguely document that ENOTTY is possible, with light examples
Reported by:	vs (2006, FreeBSD 6.1-BETA3)
Reported by:	me (2018, angry debugging session)
MFC after:	1 month
2018-05-03 02:42:13 +00:00
Ed Maste
e2811155f1 Clarify bindat/connectat use with AT_FDCWD
Discovered during investigation into the PR - the description of
AT_FDCWD was somewhat confusing.

PR:		222632
Submitted by:	Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after:	1 week
2018-04-30 17:16:17 +00:00
Konstantin Belousov
1302eea7bb Rename PROC_PDEATHSIG_SET -> PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_GET
-> PROC_PDEATHSIG_STATUS for consistency with other procctl(2)
operations names.

Requested by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2018-04-20 15:19:27 +00:00
Konstantin Belousov
b940886338 Add PROC_PDEATHSIG_SET to procctl interface.
Allow processes to request the delivery of a signal upon death of
their parent process.  Supposed consumer of the feature is PostgreSQL.

Submitted by:	Thomas Munro
Reviewed by:	jilles, mjg
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D15106
2018-04-18 21:31:13 +00:00
Edward Tomasz Napierala
604f1c416c Don't put multiple names on a single .Nm line. This fixes apropos(1)
output, from this:

strnlen, strlen, strlen,(3) - find length of string                                                                                                                                                     │·······

... to this:

strlen, strnlen(3) - find length of string

PR:		223525
MFC after:	2 weeks
2018-04-17 09:05:46 +00:00
Jeff Roberson
ac8f2d6e4b Add missing file from 4331508
Document cpuset_{get,set}domain()
2018-03-25 07:42:44 +00:00
Jeff Roberson
93f31533df Document new NUMA related syscalls and utility options.
Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-24 23:58:44 +00:00
Conrad Meyer
08a7e74c7c getentropy(3): Fallback to kern.arandom sysctl on older kernels
On older kernels, when userspace program disables SIGSYS, catch ENOSYS and
emulate getrandom(2) syscall with the kern.arandom sysctl (via existing
arc4_sysctl wrapper).

Special care is taken to faithfully emulate EFAULT on NULL pointers, because
sysctl(3) as used by kern.arandom ignores NULL oldp.  (This was caught by
getentropy(3) ATF tests.)

Reported by:	kib
Reviewed by:	kib
Discussed with:	delphij
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14785
2018-03-21 23:52:37 +00:00
Conrad Meyer
e9ac27430c Implement getrandom(2) and getentropy(3)
The general idea here is to provide userspace programs with well-defined
sources of entropy, in a fashion that doesn't require opening a new file
descriptor (ulimits) or accessing paths (/dev/urandom may be restricted
by chroot or capsicum).

getrandom(2) is the more general API, and comes from the Linux world.
Since our urandom and random devices are identical, the GRND_RANDOM flag
is ignored.

getentropy(3) is added as a compatibility shim for the OpenBSD API.

truss(1) support is included.

Tests for both system calls are provided.  Coverage is believed to be at
least as comprehensive as LTP getrandom(2) test coverage.  Additionally,
instructions for running the LTP tests directly against FreeBSD are provided
in the "Test Plan" section of the Differential revision linked below.  (They
pass, of course.)

PR:		194204
Reported by:	David CARLIER <david.carlier AT hardenedbsd.org>
Discussed with:	cperciva, delphij, jhb, markj
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D14500
2018-03-21 01:15:45 +00:00
Mark Johnston
f0eaf8ec5e Remove a lingering inaccuracy from mlock.2.
User wirings of the same address range don't stack.

Noted by:	Dan Nelson
MFC after:	3 days
2018-03-20 20:45:47 +00:00
Mark Johnston
d09fcbd30e Add a space between a section number and a following comma.
Fix some nits from igor while here.

MFC after:	3 days
2018-03-15 19:03:54 +00:00
Brooks Davis
b85a98949f Refer to SysV IPC permissions as numeric constants.
POSIX defines no macros for these permissions.

Also remove unneeded headers from synopsis.

PR:		225905
Reviewed by:	wblock
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14461
2018-03-04 20:06:02 +00:00
Brooks Davis
6d0fe480a8 Don't declare union semun in userspace unless _WANT_SEMUN is defined.
POSIX explicitly states that the application must declare union semun.
This makes no sense, but it is what it is.  This brings us into line
with Linux, MacOS/Darwin, and NetBSD.

In a ports exp-run a moderate number of ports fail due to a lack of
approprate autotools-like discovery mechanisms or local patches.  A
commit to address them will follow shortly.

PR:		224300, 224443 (exp-run)
Reviewed by:	emaste, jhb, kib
Exp-run by:	antoine
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14492
2018-03-02 22:32:53 +00:00
Brooks Davis
93e48a303a Rename kernel-only members of semid_ds and msgid_ds.
This deliberately breaks the API in preperation for future syscall
revisions which will remove these nonstandard members.

In an exp-run a single port (devel/qemu-user-static) was found to
use them which it did becuase it emulates system calls.  This has
been fixed in the ports tree.

PR:		224443 (exp-run)
Reviewed by:	kib, jhb (previous version)
Exp-run by:	antoine
Sponsored by:	DARPA, AFRP
Differential Revision:	https://reviews.freebsd.org/D14490
2018-03-02 22:10:48 +00:00
Conrad Meyer
e9180d6956 socketpair.2: Reference relevant POSIX standards
Sponsored by:	Dell EMC Isilon
2018-02-10 19:41:32 +00:00
Conrad Meyer
6e876d695e fsync.2: Cross-reference fsync(1)
Reported by:	rpokala
Sponsored by:	Dell EMC Isilon
2018-02-06 23:12:47 +00:00
Maxim Konovalov
c042d0ca4a o EMFILE errno documented.
PR:		219209
Submitted by:	yuri (with minor adjustment)
Reviewed by:	brooks
2018-01-26 08:38:26 +00:00
Kirk McKusick
4cfb30ed21 Update .Dd missed in -r328304.
Reported by: Bjoern Zeeb (bz)
MFC with:    328304
2018-01-24 22:36:21 +00:00
Kirk McKusick
8557409f20 In the C library, the setting up of the group array by various
utilities is done by calling gr_addgid() for each group to be
added (usually found by traversing /etc/group) then calling the
setgroups() system call after the group set has been created.
The gr_addgid() function (helpfully?) deduplicates the addition
of group members. So, if you call it to add a group member that
already exists, it is just dropped. Because group[0] is the
effective group-ID and is over-written when a setgid program
is run, The value in group[0] is usually duplicated so that
group value is not lost when a setgid program is run.

Historically this happened because the group value indicated
in the password file also appears in /etc/group (e.g., if you
are group staff in the password file, you will also appear in
the staff line in /etc/group). But, with the addition of the
deduplication, the attempt to add group staff was lost because
it already appeared in group[0]. So, the fix is to deduplicate
starting from group[1] which allows a duplicate of the entry in
group[0], but not in later entries.

There is some confusion about the setgroups system call because in
BSD it has (always) set the entire group including the egid group
(in group[0]). However, in Linux, it skips over group[0] and starts
setting from group[1]. See this comment from linux_setgroups:

      /*
       * cr_groups[0] holds egid. Setting the whole set from
       * the supplied set will cause egid to be changed too.
       * Keep cr_groups[0] unchanged to prevent that.
       */

To make it clear what the BSD setgroups system call does, I
added the following paragraph to the setgroups(2) manual page:

   The first entry of the group array (gidset[0]) is used as the effective
   group-ID for the process.  This entry is over-written when a setgid
   program is run.  To avoid losing access to the privileges of the
   gidset[0] entry, it should be duplicated later in the group array.
   By convention, this happens because the group value indicated in the
   password file also appears in /etc/group.  The group value in the
   password file is placed in gidset[0] and that value then gets added a
   second time when the /etc/group file is scanned to create the group set.

Reported by: Paul McMath  paulm at tetrardus.net
Reviewed by: kib
MFC after:   2 weeks
2018-01-23 22:18:45 +00:00
Alan Somers
76f9d2759b mlock(2): correct documentation for error conditions.
The man page is years out of date regarding errors. Our implementation _does_
allow unaligned addresses, and it _does_not_ check for negative lengths,
because the length is unsigned. It checks for overflow instead.

Update the tests accordingly.

Reviewed by:	bcr
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D13826
2018-01-22 21:45:54 +00:00
Jeff Roberson
3f289c3fcf Implement 'domainset', a cpuset based NUMA policy mechanism. This allows
userspace to control NUMA policy administratively and programmatically.

Implement domainset based iterators in the page layer.

Remove the now legacy numa_* syscalls.

Cleanup some header polution created by having seq.h in proc.h.

Reviewed by:	markj, kib
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13403
2018-01-12 22:48:23 +00:00
Eitan Adler
837fe32558 Fix a few more speelling errors
Reviewed by:		bjk
Reviewed by:		jilles (incl formal "accept")
Differential Revision:	https://reviews.freebsd.org/D13650
2017-12-28 01:31:28 +00:00
Benjamin Kaduk
9e6e05e43f Note that old sys/event.h required manual sys/types.h inclusion
ed fixed this in r313704 but older versions are still affected.
2017-12-07 01:50:17 +00:00
Ed Maste
19164ee6cd use @@@ instead of @@ in __sym_default
Using
    .symver foo,foo@@VER
causes foo and foo@@VER to be output to the .o file. This requires foo
to be weak since the linker handles foo@@VER as foo.

Using
    .symver foo,foo@@@VER
causes just foo@@ver to be output and avoid the need for making foo
weak. It also reduces the constraint on how exactly a linker has to
handle foo and foo@@VER being present.

Submitted by:	Rafael Espíndola
Reviewed by:	dim, kib
Differential Revision:	https://reviews.freebsd.org/D11653
2017-12-05 20:19:13 +00:00
Warner Losh
94ebc05f37 Fix missing .Dd bump 2017-12-01 22:52:45 +00:00
Warner Losh
8e0cd68ff4 Correct history for Unix 2nd Edition through 6th Edition for the
system calls. Man pages are missing for v2 and v5, so any entries for
those versions were inferred by new implementations of these functions
in libc.

Obtained from: http://www.tuhs.org/cgi-bin/utree.pl
2017-12-01 22:48:20 +00:00
Warner Losh
aeb71118e6 Mark all the system calls that were in 1st Edition Unix as such in the
HISTORY section. Note: Any system calls that were added prior to v7,
but after v1 weren't changed.

Obtained from: http://www.tuhs.org/cgi-bin/utree.pl?file=V1/man/man2
2017-12-01 22:26:36 +00:00
Pedro F. Giffuni
d915a14ef0 libc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-25 17:12:48 +00:00