1526 Commits

Author SHA1 Message Date
Jilles Tjoelker
550ac4a8e8 wait(2): Add some possible caveats to standards section. 2013-09-07 11:41:52 +00:00
Jilles Tjoelker
75b1cda430 Update some signal man pages for multithreading. 2013-09-06 09:08:40 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Robert Watson
7b223d2286 Xref capsicum(4) and procdesc(4) from pdfork(2).
Suggested by:	sbruno
MFC after:	3 days
2013-08-28 20:00:25 +00:00
Joel Dahl
4d5c7c633b Remove EOL whitespace. 2013-08-22 16:02:20 +00:00
Kenneth D. Merry
7da1a731c6 Expand the use of stat(2) flags to allow storing some Windows/DOS
and CIFS file attributes as BSD stat(2) flags.

This work is intended to be compatible with ZFS, the Solaris CIFS
server's interaction with ZFS, somewhat compatible with MacOS X,
and of course compatible with Windows.

The Windows attributes that are implemented were chosen based on
the attributes that ZFS already supports.

The summary of the flags is as follows:

UF_SYSTEM:	Command line name: "system" or "usystem"
		ZFS name: XAT_SYSTEM, ZFS_SYSTEM
		Windows: FILE_ATTRIBUTE_SYSTEM

		This flag means that the file is used by the
		operating system.  FreeBSD does not enforce any
		special handling when this flag is set.

UF_SPARSE:	Command line name: "sparse" or "usparse"
		ZFS name: XAT_SPARSE, ZFS_SPARSE
		Windows: FILE_ATTRIBUTE_SPARSE_FILE

		This flag means that the file is sparse.  Although
		ZFS may modify this in some situations, there is
		not generally any special handling for this flag.

UF_OFFLINE:	Command line name: "offline" or "uoffline"
		ZFS name: XAT_OFFLINE, ZFS_OFFLINE
		Windows: FILE_ATTRIBUTE_OFFLINE

		This flag means that the file has been moved to
		offline storage.  FreeBSD does not have any special
		handling for this flag.

UF_REPARSE:	Command line name: "reparse" or "ureparse"
		ZFS name: XAT_REPARSE, ZFS_REPARSE
		Windows: FILE_ATTRIBUTE_REPARSE_POINT

		This flag means that the file is a Windows reparse
		point.  ZFS has special handling code for reparse
		points, but we don't currently have the other
		supporting infrastructure for them.

UF_HIDDEN:	Command line name: "hidden" or "uhidden"
		ZFS name: XAT_HIDDEN, ZFS_HIDDEN
		Windows: FILE_ATTRIBUTE_HIDDEN

		This flag means that the file may be excluded from
		a directory listing if the application honors it.
		FreeBSD has no special handling for this flag.

		The name and bit definition for UF_HIDDEN are
		identical to the definition in MacOS X.

UF_READONLY:	Command line name: "urdonly", "rdonly", "readonly"
		ZFS name: XAT_READONLY, ZFS_READONLY
		Windows: FILE_ATTRIBUTE_READONLY

		This flag means that the file may not written or
		appended, but its attributes may be changed.

		ZFS currently enforces this flag, but Illumos
		developers have discussed disabling enforcement.

		The behavior of this flag is different than MacOS X.
		MacOS X uses UF_IMMUTABLE to represent the DOS
		readonly permission, but that flag has a stronger
		meaning than the semantics of DOS readonly permissions.

UF_ARCHIVE:	Command line name: "uarch", "uarchive"
		ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE
		Windows name: FILE_ATTRIBUTE_ARCHIVE

		The UF_ARCHIVED flag means that the file has changed and
		needs to be archived.  The meaning is same as
		the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and
		the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute.

		msdosfs and ZFS have special handling for this flag.
		i.e. they will set it when the file changes.

sys/param.h:		Bump __FreeBSD_version to 1000047 for the
			addition of new stat(2) flags.

chflags.1:		Document the new command line flag names
			(e.g. "system", "hidden") available to the
			user.

ls.1:			Reference chflags(1) for a list of file flags
			and their meanings.

strtofflags.c:		Implement the mapping between the new
			command line flag names and new stat(2)
			flags.

chflags.2:		Document all of the new stat(2) flags, and
			explain the intended behavior in a little
			more detail.  Explain how they map to
			Windows file attributes.

			Different filesystems behave differently
			with respect to flags, so warn the
			application developer to take care when
			using them.

zfs_vnops.c:		Add support for getting and setting the
			UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN,
			UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags.

			All of these flags are implemented using
			attributes that ZFS already supports, so
			the on-disk format has not changed.

			ZFS currently doesn't allow setting the
			UF_REPARSE flag, and we don't really have
			the other infrastructure to support reparse
			points.

msdosfs_denode.c,
msdosfs_vnops.c:	Add support for getting and setting
			UF_HIDDEN, UF_SYSTEM and UF_READONLY
			in MSDOSFS.

			It supported SF_ARCHIVED, but this has been
			changed to be UF_ARCHIVE, which has the same
			semantics as the DOS archive attribute instead
			of inverse semantics like SF_ARCHIVED.

			After discussion with Bruce Evans, change
			several things in the msdosfs behavior:

			Use UF_READONLY to indicate whether a file
			is writeable instead of file permissions, but
			don't actually enforce it.

			Refuse to change attributes on the root
			directory, because it is special in FAT
			filesystems, but allow most other attribute
			changes on directories.

			Don't set the archive attribute on a directory
			when its modification time is updated.
			Windows and DOS don't set the archive attribute
			in that scenario, so we are now bug-for-bug
			compatible.

smbfs_node.c,
smbfs_vnops.c:		Add support for UF_HIDDEN, UF_SYSTEM,
			UF_READONLY and UF_ARCHIVE in SMBFS.

			This is similar to changes that Apple has
			made in their version of SMBFS (as of
			smb-583.8, posted on opensource.apple.com),
			but not quite the same.

			We map SMB_FA_READONLY to UF_READONLY,
			because UF_READONLY is intended to match
			the semantics of the DOS readonly flag.
			The MacOS X code maps both UF_IMMUTABLE
			and SF_IMMUTABLE to SMB_FA_READONLY, but
			the immutable flags have stronger meaning
			than the DOS readonly bit.

stat.h:			Add definitions for UF_SYSTEM, UF_SPARSE,
			UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY
			and UF_HIDDEN.

			The definition of UF_HIDDEN is the same as
			the MacOS X definition.

			Add commented-out definitions of
			UF_COMPRESSED and UF_TRACKED.  They are
			defined in MacOS X (as of 10.8.2), but we
			do not implement them (yet).

ufs_vnops.c:		Add support for getting and setting
			UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY,
			UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS.
			Alphabetize the flags that are supported.

			These new flags are only stored, UFS does
			not take any action if the flag is set.

Sponsored by:	Spectra Logic
Reviewed by:	bde (earlier version)
2013-08-21 23:04:48 +00:00
Pawel Jakub Dawidek
fe0670cfb3 Correct function name and return value. 2013-08-17 14:55:31 +00:00
John Baldwin
5aa60b6f21 Add new mmap(2) flags to permit applications to request specific virtual
address alignment of mappings.
- MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n).
  Requests for n >= number of bits in a pointer or less than the size of
  a page fail with EINVAL.  This matches the API provided by NetBSD.
- MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED.  It can be used
  to optimize the chances of using large pages.  By default it will align
  the mapping on a large page boundary (the system is free to choose any
  large page size to align to that seems best for the mapping request).
  However, if the object being mapped is already using large pages, then
  it will align the virtual mapping to match the existing large pages in
  the object instead.
- Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and
  VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment.
  MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while
  MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
- mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than
  explicitly using VMFS_SUPER_SPACE.  All device objects are forced to
  use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively
  equivalent.

Reviewed by:	alc
MFC after:	1 month
2013-08-16 21:13:55 +00:00
Jilles Tjoelker
fdafa7840f pselect(2): Add xref to sigsuspend(2). 2013-08-16 14:06:29 +00:00
Jilles Tjoelker
5219e2caba Add man page dup3(3). 2013-08-16 13:16:27 +00:00
Jilles Tjoelker
f57087b21c sigsuspend(2): Add xrefs to pselect(2) and sigwait-alikes. 2013-08-15 22:33:27 +00:00
John Baldwin
513bfc4fe2 Enhance the description of NOTE_TRACK:
- NOTE_TRACK has never triggered a NOTE_TRACK event from the parent pid.
  If NOTE_FORK is set, the listener will get a NOTE_FORK event from
  the parent pid, but not a separate NOTE_TRACK event.
- Explicitly note that the event added to monitor the child process
  preserves the fflags from the original event.
- Move the description of NOTE_TRACKERR under NOTE_TRACK as it is not a
  bit for the user to set (which is what this list pupports to be).
  Also, explicitly note that if an error occurs, the NOTE_CHILD event
  will not be generated.

MFC after:	1 week
2013-07-25 19:34:24 +00:00
Ed Maste
4e1d691281 Document EINVAL error return from PT_LWPINFO 2013-07-22 18:18:21 +00:00
Joel Dahl
2a82581d9b Minor mdoc fixes. 2013-06-09 07:15:43 +00:00
Jilles Tjoelker
172886a93e sigaction(2): Document various non-POSIX functions as async-signal safe. 2013-06-08 13:45:43 +00:00
Gleb Smirnoff
6160e12c10 Add new system call - aio_mlock(). The name speaks for itself. It allows
to perform the mlock(2) operation, which can consume a lot of time, under
control of aio(4).

Reviewed by:	kib, jilles
Sponsored by:	Nginx, Inc.
2013-06-08 13:27:57 +00:00
Jilles Tjoelker
4b08438c22 dup(2): Clarify return value, in particular of dup2(). 2013-05-31 22:09:31 +00:00
Jilles Tjoelker
4e3f0e45cf sigaction(2): *at system calls are async-signal safe. 2013-05-31 21:31:38 +00:00
Jilles Tjoelker
f8732c7fc3 sigaction(2): Extend description of async-signal safe functions:
* Improve description when unsafe functions are unsafe.
* Add various safe functions from POSIX.1-2008 and Austin Group issue #692.
2013-05-31 21:25:51 +00:00
Jilles Tjoelker
0bbe34c35d fork(2): Add information about fork() in multi-threaded processes.
There is nothing about pthread_atfork(3) or extensions like calling
malloc(3) in the child process as this may be unreliable or broken.
2013-05-31 20:46:08 +00:00
Jilles Tjoelker
45100a722a fork(2): #include <sys/types.h> is not needed. 2013-05-31 14:48:37 +00:00
Ed Maste
e2e9c35fa4 Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.
2013-05-28 21:05:06 +00:00
Jilles Tjoelker
24f3b0bcd0 cap_rights_limit(2): CAP_ACCEPT also permits accept4(2). 2013-05-27 21:37:19 +00:00
Jilles Tjoelker
0bbacb9c66 sigreturn(2): Remove ancient compatibility warning about 4.2BSD.
The HISTORY subsection still says that sigreturn() was added in 4.3BSD.
2013-05-25 13:59:40 +00:00
Julian Elischer
956e8eee53 Update the setfib man page to reflect recent changes. 2013-05-20 20:47:40 +00:00
Sergey Kandaurov
e0906c9a0d POSIX 1003.1-2008: add ENOTRECOVERABLE, EOWNERDEAD errnos. 2013-05-04 19:07:22 +00:00
Jilles Tjoelker
ed5987bd08 accept(2), pipe(2): Fix .Dd. 2013-05-01 22:47:47 +00:00
Jilles Tjoelker
dc570d5e56 Add pipe2() system call.
The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and
O_NONBLOCK (on both sides) as part of the function.

If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).

If the pointer is not valid, behaviour differs: pipe2() writes into the
array from the kernel like socketpair() does, while pipe() writes into the
array from an architecture-specific assembler wrapper.

Reviewed by:	kan, kib
2013-05-01 22:42:42 +00:00
Jilles Tjoelker
da7d2afb6d Add accept4() system call.
The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.
2013-05-01 20:10:21 +00:00
Jilles Tjoelker
3143f63a23 intro(2): Fix some errors in ENFILE and EMFILE descriptions.
MFC after:	1 week
2013-04-27 11:55:23 +00:00
Jilles Tjoelker
e160aec9a5 getdtablesize(2): Describe what this function actually does.
getdtablesize() returns the limit on new file descriptors; this says nothing
about existing descriptors.

MFC after:	1 week
2013-04-24 21:24:35 +00:00
Sergey Kandaurov
89bbe1496d Keep up with negative addrlen check removal in r249649. 2013-04-22 09:18:50 +00:00
Jilles Tjoelker
2cd19a510a dup(2): Remove incorrect sentence about getdtablesize().
There are no getdtablesize() bounds on the file descriptor to be duplicated;
it only has to be open. If the RLIMIT_NOFILE rlimit was decreased after
opening the file descriptor, it may be greater than or equal to
getdtablesize() but still valid.

MFC after:	1 week
2013-04-21 19:42:04 +00:00
Joel Dahl
cd088fc43a Remove cross-references to nonexistent CPU_SET(3) manpage.
Also fix cpu_getaffinity(2) document title.

PR:		176317
Submitted by:	brucec
2013-04-21 06:46:41 +00:00
George V. Neville-Neil
599c412493 Correct the returned message lengths for timeval and bintime control
messages (SO_BINTIME, SO_TIMEVAL).

Obtained from:	phk
2013-04-05 18:09:43 +00:00
Matthew D Fleming
e324bf91e8 Fix return type of extattr_set_* and fix rmextattr(8) utility.
extattr_set_{fd,file,link} is logically a write(2)-like operation and
should return ssize_t, just like extattr_get_*.  Also, the user-space
utility was using an int for the return value of extattr_get_* and
extattr_list_*, both of which return an ssize_t.

MFC after:	1 week
2013-04-02 05:30:41 +00:00
Jilles Tjoelker
de9dcfba06 accept(2): Mention inheritance of O_ASYNC and signal destination.
While almost nobody uses O_ASYNC, and rightly so, the inheritance of the
related properties across accept() is a portability issue like the
inheritance of O_NONBLOCK.
2013-03-26 22:46:56 +00:00
Pawel Jakub Dawidek
2883fbd521 Document chflagsat(2).
Obtained from:	jilles
2013-03-21 23:05:44 +00:00
Pawel Jakub Dawidek
e948704e4b Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by:	kib, jilles
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:59:01 +00:00
Pawel Jakub Dawidek
b4b2596b97 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
Jilles Tjoelker
46f10cc265 Allow O_CLOEXEC in posix_openpt() flags.
PR:		kern/162374
Reviewed by:	ed
2013-03-21 21:39:15 +00:00
Jilles Tjoelker
c2e3c52e0d Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.
This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.

The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.

For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags
argument.

Reviewed by:	kib
2013-03-19 20:58:17 +00:00
Gleb Smirnoff
8863cc408c There are actually two different cases when mlock(2) returns
ENOMEM. Clarify this, taking text from SUS.

Reviewed by:	kib
2013-03-19 05:44:25 +00:00
Pawel Jakub Dawidek
136cbf84ef Add a note to the HISTORY section about lchflags(2) being introduced in
FreeBSD 5.0.
2013-03-16 22:44:14 +00:00
Pawel Jakub Dawidek
7493f24ee6 - Implement two new system calls:
int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
	int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

  which allow to bind and connect respectively to a UNIX domain socket with a
  path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
  the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
  in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson, jilles, kib, des
2013-03-02 21:11:30 +00:00
Joel Dahl
fdf25068b7 mdoc: remove superfluous paragraph macro. 2013-03-02 06:55:55 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Pawel Jakub Dawidek
d6f122f4fb Provide cap_sandboxed(3) function, which is a wrapper around cap_getmode(2)
system call, which has a nice property - it never fails, so it is a bit
easier to use. If there is no support for capability mode in the kernel
the function will return false (not in a sandbox). If the kernel is compiled
with the support for capability mode, the function will return true or false
depending if the calling process is in the capability mode sandbox or not
respectively.

Sponsored by:	The FreeBSD Foundation
2013-03-02 00:11:27 +00:00
Pawel Jakub Dawidek
1f2ce2a086 Put one file per line so it is easier to read diffs against those files. 2013-02-16 22:21:46 +00:00
Ian Lepore
74938cbb7f Make the F_READAHEAD option to fcntl(2) work as documented: a value of zero
now disables read-ahead.  It used to effectively restore the system default
readahead hueristic if it had been changed; a negative value now restores
the default.

Reviewed by:	kib
2013-02-13 15:09:16 +00:00