Commit Graph

1743 Commits

Author SHA1 Message Date
Konstantin Belousov
19bd0d9c85 Implement address space guards.
Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of
the allocated address space, but does not allow instantiation of the
pages in the range.  It is useful for more explicit support for usual
two-stage reserve then commit allocators, since it prevents accidental
instantiation of the mapping, e.g. by mprotect(2).

Use guards to reimplement stack grow code.  Explicitely track stack
grow area with the guard, including the stack guard page.  On stack
grow, trivial shift of the guard map entry and stack map entry limits
makes the stack expansion.  Move the code to detect stack grow and
call vm_map_growstack(), from vm_fault() into vm_map_lookup().

As result, it is impossible to get random mapping to occur in the
stack grow area, or to overlap the stack guard page.

Enable stack guard page by default.

Reviewed by:	alc, markj
Man page update reviewed by:	alc, bjk, emaste, markj, pho
Tested by:	pho, Qualys
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D11306 (man pages)
2017-06-24 17:01:11 +00:00
Konstantin Belousov
2351218ca8 Remove the description of MAP_HASSEMAPHORE.
The flag is not implemented, all FreeBSD architectures correctly
handle locks on normal cacheable mappings.  On the other hand, the
flag was specified by some software, so it is kept in the header as
nop.  Removal from the man page should discourage its use.

Reviewed by:	alc, bjk, emaste, markj, pho
MFC after:	3 days
X-Differential revision:	https://reviews.freebsd.org/D11306
2017-06-24 16:36:30 +00:00
Konstantin Belousov
287c1c8c13 Fix typo.
Noted by:	alc
MFC after:	3 days
2017-06-24 16:21:34 +00:00
Warner Losh
a639d52309 Be sure to free allocated statfs11 buffer.
Submitted by: Alistair Crooks
2017-06-24 00:28:35 +00:00
Warren Block
6d0f80c921 Remove redundant wording, minor edits for clarity.
MFC after:	1 week
Sponsored by:	iXsystems
2017-06-23 18:38:27 +00:00
Warner Losh
5ab191c42b Forward compatibility for ino64.
Add forward compatibility so that new binaries can run on old
kernels. If the new system call from ino64 isn't available on your
system, then the old one will be used and the results translated.  The
stat and statfs families of functions are fully emulated. While not
required by policy, in this case it is helpful to our users to provide
this compatibility. In this case, it allows rollback of the kernel
after installing a new userland should a problem be discovered. It
also prevents foot-shooting if a user does an install before rebooting
with the new kernel. Finally, it allows the use case where one needs
to run new binaries on an old kernel as part of an upgrade process.

The getdirentries family uses tricks that may not work on remote
filesystems. Specifically, it uses a buffer 1/4 the size requested to
get the data from he old syscall.

The code carefully uses direct syscalls for old system calls to avoid
referencing freebsd11_* symbols, which contaminate ld-elf.so.1's
export table due to its use of stat functions, which causes errno to
be incorrect in client programs due to the wrong *stat* function being
resolved in some cases.

This code should removed sometime after 12 is branched.

Tested on: 12-current binaries on a 10.3-beta kernel run and return
       consistent results. 12-current kernel and userland with
       packages from before ino64 was committed also work.

Differential Revision: https://reviews.freebsd.org/D11185
Reviewed by: kib@, emaste@
2017-06-23 18:06:20 +00:00
Alan Somers
09986d3bd2 Clarify usage of aio(4) with kqueue(2)
Reviewed by:	jhb
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D11299
2017-06-23 00:40:09 +00:00
Conrad Meyer
a13136cdb7 pdwait4(2): Remove documentation of vaporware
This syscall has never existed and is not at risk of existing any time soon.
Remove documentation referencing it, which has been wrong since FreeBSD 9.

Reported by:	allanjude@
2017-06-17 17:32:40 +00:00
Konstantin Belousov
2b34e84335 Add abstime kqueue(2) timers and expand struct kevent members.
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.

To make this useful, data member of the struct kevent must be extended
to 64bit.  Using the opportunity, I also added ext members.  This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.

The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).

Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2).  Compat shims
are provided for both host native and compat32.

Requested by:	bapt
Reviewed by:	bapt, brooks, ngie (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D11025
2017-06-17 00:57:26 +00:00
Konstantin Belousov
d60fa657b2 Move the description of kern.kq_calloutmax sysctl into a new paragraph
for better presentation.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-06-16 23:25:11 +00:00
Konstantin Belousov
17c847c1ff Start a new sentence on the new line.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-06-16 23:17:31 +00:00
Maxim Sobolev
3d751650c1 Document st_flags in the stat(2).
Approved by:	mckusick,vangyzen,jilles
Differential Revision:	https://reviews.freebsd.org/D10852
2017-06-16 15:09:43 +00:00
Konstantin Belousov
b43ce76c77 Add ptrace(PT_GET_SC_ARGS) command to return debuggee' current syscall
arguments.

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 21:15:43 +00:00
Mark Johnston
df633e60c9 Remove an inaccuracy from socket.2.
SOCK_SEQPACKET is implemented for several protocols.

MFC after:	1 week
2017-06-10 21:07:55 +00:00
Jilles Tjoelker
e0e0323354 libc: Remove futimens() and utimensat() compat stubs.
The futimens() and utimensat() compat stubs allowed using these functions on
kernels that did not have the system calls yet (10.2, old 11-current).

Also remove the documentation of the [ENOTSUP] error that could occur with
an old kernel.

A -DNO_CLEAN build may fail because the depend files refer to the deleted
files.
2017-06-07 21:21:14 +00:00
John Baldwin
60b67035f2 Remove stale cap_rights_get(2) manpage.
The documentation moved to section 3 several years ago, but
'man cap_rights_get' pulls up cap_rights_limit(2) (which is
MLINKed to cap_rights_get.2) instead of cap_rights_get(3).

MFC after:	1 week
2017-06-02 03:53:34 +00:00
Konstantin Belousov
a327b06f81 Mention that the basep argument to getdirentries(2) can be NULL.
Noted by:	dim
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D10972
2017-05-28 21:42:47 +00:00
Konstantin Belousov
3449821376 Update getdirentries(2) page for new struct dirent layout.
Sponsored by:	The FreeBSD Foundation
2017-05-28 09:29:53 +00:00
Edward Tomasz Napierala
a9a393b390 Don't end up manpage titles with a full stop.
MFC after:	2 weeks
2017-05-24 21:02:53 +00:00
Glen Barber
4adb408018 Update the "first appeared in" version in several manual pages.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2017-05-24 17:50:34 +00:00
Allan Jude
f299c47b52 Allow cpuset_{get,set}affinity in capabilities mode
bhyve was recently sandboxed with capsicum, and needs to be able to
control the CPU sets of its vcpu threads

Reviewed by:	emaste, oshogbo, rwatson
MFC after:	2 weeks
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D10170
2017-05-24 00:58:30 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Enji Cooper
9227de8c73 kill(2): add missing section for sysctl(9)
Reported by:	make manlint
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-23 07:46:10 +00:00
Enji Cooper
945cb7775f ptrace(2): clean up trailing whitespace
Reviewed by:	make manlint
MFC after:	2 weeks
2017-05-23 07:45:29 +00:00
Enji Cooper
7661c8b028 open(2): fix manlint warnings
- Sort SEE ALSO .Xr entries.
- Sort sections (HISTORY comes after STANDARDS).

Reported by:	make manlint
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-23 07:44:43 +00:00
Enji Cooper
c0f64185c6 rctl_add_rule(2): fix manlint warnings
- Fix commas (either missing or misused) after .Nm entries in SYNOPSIS

Reported by:	make manlint
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-23 07:32:57 +00:00
Enji Cooper
60f14b3186 cap_enter(2): fix manlint issues
- Sort SEE ALSO section appropriately.
- Correct section for sysctl(9).

Reported by:	make manlint
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-23 07:31:03 +00:00
Enji Cooper
bb09af5f58 _umtx_op(2): fix minor manlint issues
- Sort .Xr entries in SEE ALSO section.
- Sort SEE ALSO and STANDARDS sections properly, in terms of the
  entire document.

Reported by:	make manlint
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-23 07:26:45 +00:00
Stephen J. Kiernan
f55d7dd1c9 Add information to open(2) man page about the O_VERIFY flag.
Reviewed by:	bjk wblock
Approved by:	sjg (mentor)
Obtained from:	Juniper Networks, Inc.
2017-05-15 19:32:26 +00:00
Brooks Davis
f19351aad8 Provide a freebsd32 implementation of sigqueue()
The previous misuse of sys_sigqueue() was sending random register or
stack garbage to 64-bit targets.  The freebsd32 implementation preserves
the sival_int member of value when signaling a 64-bit process.

Document the mixed ABI implementation of union sigval and the
incompability of sival_ptr with pointer integrity schemes.

Reviewed by:	kib, wblock
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D10605
2017-05-05 18:49:39 +00:00
Conrad Meyer
9ac9bc28bb cpuset.2: Document new API options
A follow-up to r317756.  Adrian will chase up the userspace cpuset(1)
additions.

Reported by:	kib@
Sponsored by:	Dell EMC Isilon
2017-05-03 18:46:33 +00:00
Sergey Kandaurov
b11ef6e971 Document kevent EVFILT_EMPTY.
Reviewed by:	hiren
X-MFC with:	r312277
2017-04-18 15:36:13 +00:00
Eric van Gyzen
16fe28bb9f clock_gettime.2: add some clock IDs
Add the CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clock_id
values to the clock_gettime(2) man page.  Reformat the excessively
long paragraph (sentence!) into a tag list.

Reported by:	jilles in https://reviews.freebsd.org/D10020
MFC after:	3 days
Sponsored by:	Dell EMC
2017-03-22 00:50:36 +00:00
Xin LI
73065ae826 Make space style consistent with earlier entries.
X-MFC with:	r315526
2017-03-20 03:47:15 +00:00
Eric van Gyzen
3f8455b090 Add clock_nanosleep()
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.

Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)

Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.

Reviewed by:	kib, ngie, jilles
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10020
2017-03-19 00:51:12 +00:00
Maxim Konovalov
a6c1047fce More trap_enotcap spelling fixes.
PR:		217839
Submitted by:	tobik
2017-03-16 13:19:38 +00:00
Maxim Konovalov
f24fc4834a Spell kern.trap_enotcap.
PR:		217836
Submitted by:	tobik
2017-03-16 12:16:23 +00:00
Xin LI
78d7964b46 Implement INHERIT_ZERO for minherit(2).
INHERIT_ZERO is an OpenBSD feature.

When a page is marked as such, it would be zeroed
upon fork().

This would be used in new arc4random(3) functions.

PR:	182610
Reviewed by:	kib (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D427
2017-03-14 17:10:42 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Eric van Gyzen
90c1b723a5 Make several improvements and corrections in the kenv(2) man page
MFC after:	3 days
Sponsored by:	Dell EMC
2017-02-21 19:51:41 +00:00
Ed Schouten
8eb15797b1 Remove unnecessary #includes from the kqueue(2) man page.
Now that <sys/event.h> can be included on its own, adjust the manual
page accordingly. Remove both unnecessary #include statements from the
synopsis and the example code.

While there, also add a note to the BUGS section to mention that
previous versions of this header file still depend on <sys/types.h>.

Reviewed by:	ngie, vangyzen
Differential Revision:	https://reviews.freebsd.org/D9605
2017-02-16 06:52:53 +00:00
Konstantin Belousov
987ff18184 Consistently handle negative or wrapping offsets in the mmap(2) syscalls.
For regular files and posix shared memory, POSIX requires that
[offset, offset + size) range is legitimate.  At the maping time,
check that offset is not negative.  Allowing negative offsets might
expose the data that filesystem put into vm_object for internal use,
esp. due to OFF_TO_IDX() signess treatment.  Fault handler verifies
that the mapped range is valid, assuming that mmap(2) checked that
arithmetic gives no undefined results.

For device mappings, leave the semantic of negative offsets to the
driver.  Correct object page index calculation to not erronously
propagate sign.

In either case, disallow overflow of offset + size.

Update mmap(2) man page to explain the requirement of the range
validity, and behaviour when the range becomes invalid after mapping.

Reported and tested by:	royger (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-02-12 21:05:44 +00:00
Jilles Tjoelker
e301fd984a Clean up documentation of AF_UNIX control messages.
Document AF_UNIX control messages in unix(4) only, not split between unix(4)
and recv(2).

Also, warn about LOCAL_CREDS effective uid/gid fields, since the write could
be from a setuid or setgid program (with the explicit SCM_CREDS and
LOCAL_PEERCRED, the credentials are read at such a time that it can be
assumed that the process intends for them to be used in this context).

Reviewed by:	wblock
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9298
2017-02-03 20:33:23 +00:00
Maxim Sobolev
dd1badb4a3 Improve wording around SO_TS_CLOCK documentation.
Submitted by:	wblock
Differential Revision:	https://reviews.freebsd.org/D9171
2017-01-20 18:37:14 +00:00
Warren Block
7fd5cf0544 Mention sendfile(2) by popular demand.
Submitted by:	alc, kib
MFC after:	1 week
Sponsored by:	iXsystems
Differential Revision:	https://reviews.freebsd.org/D9259
2017-01-20 17:29:59 +00:00
Enji Cooper
d0fd0203fb Replace dot-dot relative pathing with SRCTOP-relative paths where possible
This reduces build output, need for recalculating paths, and makes it clearer
which paths are relative to what areas in the source tree. The change in
performance over a locally mounted UFS filesystem was negligible in my testing,
but this may more positively impact other filesystems like NFS.

LIBC_SRCTOP was left alone so Juniper (and other users) can continue to
manipulate lib/libc/Makefile (and other Makefile.inc's under lib/libc) as
include Makefiles with custom options.

Discussed with:	marcel, sjg
MFC after:	1 week
Reviewed by:	emaste
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D9207
2017-01-20 03:23:24 +00:00
Hans Petter Selasky
f3e7afe2d7 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
Maxim Sobolev
339efd75a4 Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to depreciate
SO_BINTIME and move everything to using SO_TS_CLOCK.

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implementis totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by:    adrian, gnn
Differential Revision:  https://reviews.freebsd.org/D9171
2017-01-16 17:46:38 +00:00
Warren Block
b44047f3df Update the shm_open.2 man page to reflect objective reality.
PR:		215612
Submitted by:	rwatson
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9066
2017-01-13 19:41:02 +00:00
John Baldwin
34ed0c63c8 Rename the 'flags' argument to getfsstat() to 'mode' and validate it.
This argument is not a bitmask of flags, but only accepts a single value.
Fail with EINVAL if an invalid value is passed to 'flag'.  Rename the
'flags' argument to getmntinfo(3) to 'mode' as well to match.

This is a followup to r308088.

Reviewed by:	kib
MFC after:	1 month
2016-12-27 20:21:11 +00:00