Commit Graph

1450 Commits

Author SHA1 Message Date
jilles
299afd25fd Add accept4() system call.
The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.
2013-05-01 20:10:21 +00:00
jilles
535eef9123 intro(2): Fix some errors in ENFILE and EMFILE descriptions.
MFC after:	1 week
2013-04-27 11:55:23 +00:00
jilles
41b0ffe4ca getdtablesize(2): Describe what this function actually does.
getdtablesize() returns the limit on new file descriptors; this says nothing
about existing descriptors.

MFC after:	1 week
2013-04-24 21:24:35 +00:00
pluknet
eda2444ae0 Keep up with negative addrlen check removal in r249649. 2013-04-22 09:18:50 +00:00
jilles
670f533a57 dup(2): Remove incorrect sentence about getdtablesize().
There are no getdtablesize() bounds on the file descriptor to be duplicated;
it only has to be open. If the RLIMIT_NOFILE rlimit was decreased after
opening the file descriptor, it may be greater than or equal to
getdtablesize() but still valid.

MFC after:	1 week
2013-04-21 19:42:04 +00:00
joel
3cd1c80381 Remove cross-references to nonexistent CPU_SET(3) manpage.
Also fix cpu_getaffinity(2) document title.

PR:		176317
Submitted by:	brucec
2013-04-21 06:46:41 +00:00
gnn
8bbd0c98b7 Correct the returned message lengths for timeval and bintime control
messages (SO_BINTIME, SO_TIMEVAL).

Obtained from:	phk
2013-04-05 18:09:43 +00:00
mdf
da578c6492 Fix return type of extattr_set_* and fix rmextattr(8) utility.
extattr_set_{fd,file,link} is logically a write(2)-like operation and
should return ssize_t, just like extattr_get_*.  Also, the user-space
utility was using an int for the return value of extattr_get_* and
extattr_list_*, both of which return an ssize_t.

MFC after:	1 week
2013-04-02 05:30:41 +00:00
jilles
e69d25bd00 accept(2): Mention inheritance of O_ASYNC and signal destination.
While almost nobody uses O_ASYNC, and rightly so, the inheritance of the
related properties across accept() is a portability issue like the
inheritance of O_NONBLOCK.
2013-03-26 22:46:56 +00:00
pjd
01401cc9bc Document chflagsat(2).
Obtained from:	jilles
2013-03-21 23:05:44 +00:00
pjd
635dbe90f2 Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by:	kib, jilles
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:59:01 +00:00
pjd
2a3cf7f364 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
jilles
bd09044d61 Allow O_CLOEXEC in posix_openpt() flags.
PR:		kern/162374
Reviewed by:	ed
2013-03-21 21:39:15 +00:00
jilles
c9066bd014 Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.
This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.

The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.

For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags
argument.

Reviewed by:	kib
2013-03-19 20:58:17 +00:00
glebius
03b86a1452 There are actually two different cases when mlock(2) returns
ENOMEM. Clarify this, taking text from SUS.

Reviewed by:	kib
2013-03-19 05:44:25 +00:00
pjd
a31543af15 Add a note to the HISTORY section about lchflags(2) being introduced in
FreeBSD 5.0.
2013-03-16 22:44:14 +00:00
pjd
702516e70b - Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow to bind and connect respectively to a UNIX domain socket with a
  path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
  the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
joel
4c5c303d09 mdoc: remove superfluous paragraph macro. 2013-03-02 06:55:55 +00:00
pjd
f07ebb8888 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
pjd
2485337005 Provide cap_sandboxed(3) function, which is a wrapper around cap_getmode(2)
system call, which has a nice property - it never fails, so it is a bit
easier to use. If there is no support for capability mode in the kernel
the function will return false (not in a sandbox). If the kernel is compiled
with the support for capability mode, the function will return true or false
depending if the calling process is in the capability mode sandbox or not
respectively.

Sponsored by:	The FreeBSD Foundation
2013-03-02 00:11:27 +00:00
pjd
cc57b32cb6 Put one file per line so it is easier to read diffs against those files. 2013-02-16 22:21:46 +00:00
ian
cfdf5de939 Make the F_READAHEAD option to fcntl(2) work as documented: a value of zero
now disables read-ahead.  It used to effectively restore the system default
readahead hueristic if it had been changed; a negative value now restores
the default.

Reviewed by:	kib
2013-02-13 15:09:16 +00:00
jilles
4633ab5d81 sigqueue(2): Fix typo (EEPERM -> EPERM).
MFC after:	3 days
2013-02-10 13:20:23 +00:00
eadler
ba1233bc71 Fix logic inversion.
PR:		docs/174966
Submitted by:	Christian Ullrich <chris+freebsd@chrullrich.net>
Approved by:	bcr (mentor)
2013-02-09 17:13:51 +00:00
kib
2a34503ae7 Document the detail of interaction between vfork and PT_TRACEME.
MFC after:	2 weeks
2013-02-07 15:36:24 +00:00
kib
305acfc1e8 Document the ERESTART translation to EINTR for devfs nodes.
Based on the submission by:	jilles
MFC after:	2 weeks
2013-02-07 15:11:43 +00:00
kib
3913be8cf6 Rework the __vdso_* symbols attributes to only make the symbols weak,
but use normal references instead of weak.  This makes the statically
linked binaries to use fast gettimeofday(2) by forcing the linker to
resolve references and providing the neccessary functions.

Reported by:	bde
Tested by:	marius (sparc64)
MFC after:	2 weeks
2013-01-30 12:48:16 +00:00
glebius
107e0c39a8 posix_fadvise(2) first appeared in FreeBSD 9.1 2013-01-23 10:50:52 +00:00
pjd
5ab6899eb1 Note that SIGCHLD is special and if ignored, won't be recorded by the filter. 2013-01-21 22:07:34 +00:00
zont
01fbf755f7 - Use standard RETURN VALUES section.
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-15 14:09:08 +00:00
zont
1c06919638 - Update manual pages accordingly to r244384 and r244385.
Approved by:	kib (mentor)
MFC after:	1 week
2012-12-25 13:43:01 +00:00
kevlo
6170f80efb Document that socket(2) may fail with EAFNOSUPPORT if the family cannot
be found.

Reviewed by:	glebius
Obtained from:	NetBSD
2012-12-07 02:26:08 +00:00
kevlo
a8471d78be Document that bind(2) can fail with EAFNOSUPPORT.
Reviewed by:	glebius
2012-12-04 09:53:09 +00:00
kevlo
ec5aeeffd0 Document that getpeername(2) and getsockname(2) can fail with EINVAL.
Reviewed by:	glebius
2012-11-23 10:14:54 +00:00
kevlo
5c97761dcf Document that rtprio(2) and rtprio_thread(2) can fail with EFAULT
due to the invoked copyout(9).

Reviewed by:	davidxu
2012-11-16 09:56:25 +00:00
kevlo
0ebb652007 Document that sendfile(2) can fail with ENOBUFS.
Reviewed by:	glebius
2012-11-14 01:45:10 +00:00
kib
39bfcc5eed Document wait6() and waitid().
PR:	standards/170346
Submitted by:	"Jukka A. Ukkonen" <jau@iki.fi>
MFC after:	1 month
2012-11-13 12:56:42 +00:00
kib
37c97ba01b Implement the waitid() SUSv4 function using wait6() system call.
PR:	standards/170346
Submitted by:	"Jukka A. Ukkonen" <jau@iki.fi>
MFC after:	1 month
2012-11-13 12:55:52 +00:00
jilles
4be8d9368b fcntl(2): Fix typos in name of constant "F_DUP2FD_CLOEXEC".
MFC after:	1 week
2012-11-01 09:38:28 +00:00
eadler
7f780401e5 Update the kill(2) and killpg(2) man pages to the modern permission
checks. Also indicate killpg(2) is POSIX compliant.

Reviewed by:	jilles
Reviewed by:	wblock
Approved by:	cperciva
MFC after:	3 days
2012-10-22 03:37:00 +00:00
andre
6062df3e66 Grammar fixes to r241781.
Submitted by:	alc
2012-10-20 19:38:22 +00:00
andre
1211b4598c Hide the unfortunate named sysctl kern.ipc.somaxconn from sysctl -a
output and replace it with a new visible sysctl kern.ipc.acceptqueue
of the same functionality.  It specifies the maximum length of the
accept queue on a listen socket.

The old kern.ipc.somaxconn remains available for reading and writing
for compatibility reasons so that existing programs, scripts and
configurations continue to work.  There no plans to ever remove the
orginal and now hidden kern.ipc.somaxconn.
2012-10-20 12:53:14 +00:00
jilles
e4c096aeb4 sigaction(2),sigwait(2),sigwaitinfo(2): Remove [EFAULT] error condition.
Passing an invalid pointer results in undefined behaviour.

The wrappers in libthr access some of the data pointed to by the arguments
in userland, so that an invalid pointer will cause a signal and not an
[EFAULT] error return.

Furthermore, if the [EFAULT] error occurs when the kernel is writing, it is
not a proper error in the sense that the call still commits (changing the
signal disposition or accepting the signal).

MFC after:	1 week
2012-09-27 17:48:04 +00:00
kevlo
02ad3abcdd Remove the restrict qualifier to match function prototype. 2012-09-20 02:25:18 +00:00
glebius
78d32fb722 Describe in detail required conditions for receiving the SCM_CREDS
control message and suggest to use LOCAL_CREDS setsockopt() for
reliability.
2012-09-12 09:50:17 +00:00
jhb
dfb185f559 When WIFCONTINUED was added, the number of "first" macros grew from
three to four.

MFC after:	1 week
2012-09-05 11:55:53 +00:00
zeising
9e17ee4443 Add missing .Pp macro.
PR:		docs/170380
Submitted by:	Garrett Cooper <yanegomi@gmail.com>
Approved by:	joel (mentor)
2012-08-21 16:35:14 +00:00
davidxu
3f0806aa1f Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement POSIX API clock_getcpuclock and
pthread_getcpuclockid.

PR:	168417
2012-08-17 02:26:31 +00:00
kib
4e7839007a Document F_DUP2FD_CLOEXEC.
MFC after:	1 week
2012-07-27 10:41:53 +00:00
kib
e04825c920 (Incomplete) fixes for symbols visibility issues and style in fcntl.h.
Append '__' prefix to the tag of struct oflock, and put it under BSD
namespace. Structure is needed both by libc and kernel, thus cannot be
hidden under #ifdef _KERNEL.

Move a set of non-standard F_* and O_* constants into BSD namespace.
SUSv4 explicitely allows implemenation to pollute F_* and O_* names
after fcntl.h is included, but it costs us nothing to adhere
to the specification if exact POSIX compliance level is requested by
user code.

Change some spaces after #define to tabs.

Noted by and discussed with:	     bde
MFC after:   1 week
2012-07-21 13:02:11 +00:00