freebsd-nq

Author	SHA1	Message	Date
Brooks Davis	7cc923f8a8	Get rid of netbsd_lchown and netbsd_msync syscall entries. No valid FreeBSD binary very called them (they would call lchown and msync directly) and we haven't supported NetBSD binaries in ages. This is a respin of r335983 with a workaround for the ancient BFD linker in the libc stubs. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16193	2018-07-10 13:32:04 +00:00
Brooks Davis	714c03c81e	Revert r335983. The bfd linker in tree doesn't support multiple names for the same symbol (at least with current flags).	2018-07-05 16:03:03 +00:00
Brooks Davis	5b04a71dae	Get rid of netbsd_lchown and netbsd_msync syscall entries. No valid FreeBSD binary ever called them (they would call lchown and msync directly) and we haven't supported NetBSD binaries in ages. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15814	2018-07-05 14:12:56 +00:00
Brooks Davis	9da5364ed9	Name the implementation of brk and sbrk sys_break(). The break() system call was renamed (several times) starting in v3 AT&T UNIX when C was invented and break was a language keyword. The last vestage of a need for it to be called something else (eg obreak) was removed in r225617 which consistantly prefixed all syscall implementations. Reviewed by: emaste, kib (older version) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15638	2018-06-14 21:27:25 +00:00
Mark Johnston	9f9c9b22ec	Reimplement brk() and sbrk() to avoid the use of _end. Previously, libc.so would initialize its notion of the break address using _end, a special symbol emitted by the static linker following the bss section. Compatibility issues between lld and ld.bfd could cause the wrong definition of _end (libc.so's definition rather than that of the executable) to be used, breaking the brk()/sbrk() interface. Avoid this problem and future interoperability issues by simply not relying on _end. Instead, modify the break() system call to return the kernel's view of the current break address, and have libc initialize its state using an extra syscall upon the first use of the interface. As a side effect, this appears to fix brk()/sbrk() usage in executables run with rtld direct exec, since the kernel and libc.so no longer maintain separate views of the process' break address. PR: 228574 Reviewed by: kib (previous version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D15663	2018-06-04 19:35:15 +00:00
Brooks Davis	64b378f1e1	Remove alternative names that are identical to the default. Verified by make sysent producing no changes.	2018-05-30 22:22:58 +00:00
Brooks Davis	7351a8bdb5	Make vadvise compat freebsd11. The vadvise syscall (aka ovadvise) is undocumented and has always been implmented as returning EINVAL. Put the syscall under COMPAT11 and provide a userspace implementation. Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15557	2018-05-25 20:40:23 +00:00
Brooks Davis	89ea4a30d6	Added SAL annotatations to system calls. Modify makesyscalls.sh to strip out SAL annotations. No functional change. This is based on work I started in CheriBSD and use to validate fat pointers at the syscall boundary. Tal Garfinkel reviewed the changes, added annotations to COMPAT* syscalls and is using them in a record and playback framework. One can envision other uses such as a WITNESS-like validator for copyin/out as speculated on in the review. As this time we are only annotating sys/kern/syscalls.master as that is sufficient for userspace work. If kernel use cases materialize, we can annotate other syscalls.master as needed. Submitted by: Tal Garfinkel <talg@cs.stanford.edu> Sponsored by: DARPA, AFRL (in part) Differential Revision: https://reviews.freebsd.org/D14285	2018-04-05 20:31:45 +00:00
Conrad Meyer	e9ac27430c	Implement getrandom(2) and getentropy(3) The general idea here is to provide userspace programs with well-defined sources of entropy, in a fashion that doesn't require opening a new file descriptor (ulimits) or accessing paths (/dev/urandom may be restricted by chroot or capsicum). getrandom(2) is the more general API, and comes from the Linux world. Since our urandom and random devices are identical, the GRND_RANDOM flag is ignored. getentropy(3) is added as a compatibility shim for the OpenBSD API. truss(1) support is included. Tests for both system calls are provided. Coverage is believed to be at least as comprehensive as LTP getrandom(2) test coverage. Additionally, instructions for running the LTP tests directly against FreeBSD are provided in the "Test Plan" section of the Differential revision linked below. (They pass, of course.) PR: 194204 Reported by: David CARLIER <david.carlier AT hardenedbsd.org> Discussed with: cperciva, delphij, jhb, markj Relnotes: maybe Differential Revision: https://reviews.freebsd.org/D14500	2018-03-21 01:15:45 +00:00
Brooks Davis	1c1b4c66b6	Remove remenants of 1990s efforts to let us run Net/OpenBSD binaries. No functional change (comments change in some generated files.) Reviewed by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14571	2018-03-05 17:02:16 +00:00
Ed Maste	315fbaeca2	Correct pseudo misspelling in sys/ comments contrib code and #define in intel_ata.h unchanged.	2018-02-23 18:15:50 +00:00
Jeff Roberson	3f289c3fcf	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header polution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
Brooks Davis	5cd667e65f	Disable vim syntax highlighting. Vim's default pick doesn't understand that ';' is a comment character and the result looks horrible. Reviewed by: emaste	2017-11-28 18:23:17 +00:00
Konstantin Belousov	2b34e84335	Add abstime kqueue(2) timers and expand struct kevent members. This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which specifies that the data field contains absolute time to fire the event. To make this useful, data member of the struct kevent must be extended to 64bit. Using the opportunity, I also added ext members. This changes struct kevent almost to Apple struct kevent64, except I did not changed type of ident and udata, the later would cause serious API incompatibilities. The type of ident was kept uintptr_t since EVFILT_AIO returns a pointer in this field, and e.g. CHERI is sensitive to the type (discussed with brooks, jhb). Unlike Apple kevent64, symbol versioning allows us to claim ABI compatibility and still name the new syscall kevent(2). Compat shims are provided for both host native and compat32. Requested by: bapt Reviewed by: bapt, brooks, ngie (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D11025	2017-06-17 00:57:26 +00:00
Konstantin Belousov	6992112349	Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439	2017-05-23 09:29:05 +00:00
Brooks Davis	982519d10f	Change the size argument of __getcwd() to size_t. This matches the getcwd() definition. This is technically an ABI change, but that would only effect 64-bit big-endian platforms that pass arguments on the stack. We have none of those. Reviewed by: jhb Obtained from: CheriABI Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9428	2017-04-06 23:40:13 +00:00
Robert Watson	d8ca0a2b70	Hook up new audit event identifiers for various non-Orange Book/CAPP system calls supported by OpenBSM 1.2-alpha5. Obtained from: TrustedBSD Project MFC after: 3 weeks Sponsored by: DARPA, AFRL	2017-03-29 22:33:56 +00:00
Eric van Gyzen	3f8455b090	Add clock_nanosleep() Add a clock_nanosleep() syscall, as specified by POSIX. Make nanosleep() a wrapper around it. Attach the clock_nanosleep test from NetBSD. Adjust it for the FreeBSD behavior of updating rmtp only when interrupted by a signal. I believe this to be POSIX-compliant, since POSIX mentions the rmtp parameter only in the paragraph about EINTR. This is also what Linux does. (NetBSD updates rmtp unconditionally.) Copy the whole nanosleep.2 man page from NetBSD because it is complete and closely resembles the POSIX description. Edit, polish, and reword it a bit, being sure to keep any relevant text from the FreeBSD page. Reviewed by: kib, ngie, jilles MFC after: 3 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D10020	2017-03-19 00:51:12 +00:00
John Baldwin	34ed0c63c8	Rename the 'flags' argument to getfsstat() to 'mode' and validate it. This argument is not a bitmask of flags, but only accepts a single value. Fail with EINVAL if an invalid value is passed to 'flag'. Rename the 'flags' argument to getmntinfo(3) to 'mode' as well to match. This is a followup to r308088. Reviewed by: kib MFC after: 1 month	2016-12-27 20:21:11 +00:00
Robert Watson	82d8d2b8bc	Replace spaces with tabs in definition of SCTP system calls, for consistency with the remainder of the syscalls.master file. This problem does not occur in the freebsd32 version of the same system calls.	2016-12-07 16:11:55 +00:00
George V. Neville-Neil	5cba398b0c	Remove unusedd and obsolete openbsd_poll system call. (Phase 1) Reported by: brooks Reviewed by: brooks,jhb Differential Revision: https://reviews.freebsd.org/D7548	2016-08-18 10:50:40 +00:00
Konstantin Belousov	295af703a0	Add an implementation of fdatasync(2). The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing code with fsync(2). For all filesystems, this commit provides the implementation which delegates the work of VOP_FDATASYNC() to VOP_FSYNC(). This is functionally correct but not efficient. This is not yet POSIX-compliant implementation, because it does not ensure that queued AIO requests are completed before returning. Reviewed by: mckusick Discussed with: avg (ZFS), jhb (AIO part) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471	2016-08-15 19:08:51 +00:00
Bryan Drewery	78be18ae6e	Correct some comments. Sponsored by: EMC / Isilon Storage Division MFC after: 3 days	2016-08-03 18:48:56 +00:00
Ed Schouten	1b42875d4b	Re-add traling slash that was removed in r303699. I must have accidentally pressed some random key in vim.	2016-08-03 06:35:58 +00:00
Ed Schouten	a813fdc6c3	mprotect(): Change prototype to comply to POSIX. Our mprotect() function seems to take a "const void " address to the pages whose permissions need to be adjusted. POSIX uses "void ". Simply stick to the POSIX one to prevent us from writing unportable code. PR: 211423 (exp-run) Tested by: antoine@ (Thanks!)	2016-08-03 06:33:04 +00:00
Ed Schouten	d9c4cd2fbc	Change the return type of msgrcv() to ssize_t as required by POSIX. It looks like the msgrcv() system call is already written in such a way that the size is internally computed as a size_t and written into all of td_retval[0]. This means that it is effectively already returning ssize_t. It's just that the userspace prototype doesn't match up.	2016-07-28 12:22:01 +00:00
Robert Watson	e5ec733909	Do allow auditing of read(2) and write(2) system calls, by assigning those system calls audit event identifiers AUE_READ and AUE_WRITE. While auditing file-descriptor I/O is not required by the Common Criteria, in practice this proves useful for both live and forensic analysis. NB: freebsd32 already assigns AUE_READ and AUE_WRITE to read(2) and write(2). MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 13:42:33 +00:00
Brooks Davis	e16e64098c	Mark the pipe() system call as COMPAT10. As of r302092 libc uses pipe2() with a zero flags value instead of pipe(). Commit with regenerated files and implementation to follow. Approved by: re (gjb) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D6816	2016-06-22 21:15:59 +00:00
John Baldwin	bb430bc740	Fully handle size_t lengths in AIO requests. First, update the return types of aio_return() and aio_waitcomplete() to ssize_t. POSIX requires aio_return() to return a ssize_t so that it can represent all return values from read() and write(). aio_waitcomplete() should use ssize_t for the same reason. aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and system call entry were not updated. aio_waitcomplete() has always returned int. Note that this does not require new system call stubs as this is effectively only an API change in how the compiler interprets the return value. Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX. aio_read/write should now honor the same length limits as normal read/write. Third, use longs instead of ints in the aio_return() and aio_waitcomplete() system call functions so that the 64-bit size_t in the in-kernel aiocb isn't truncated to 32-bits before being copied out to userland or being returned. Finally, a simple test has been added to verify the bounds checking on the maximum read size from a file.	2016-03-21 21:37:33 +00:00
John Baldwin	399e8c1773	Simplify AIO initialization now that it is standard. - Mark AIO system calls as STD and remove the helpers to dynamically register them. - Use COMPAT6 for the old system calls with the older sigevent instead of an 'o' prefix. - Simplify the POSIX configuration to note that AIO is always available. - Handle AIO in the default VOP_PATHCONF instead of special casing it in the pathconf() system call. fpathconf() is still hackish. - Remove freebsd32_aio_cancel() as it just called the native one directly. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5589	2016-03-09 19:05:11 +00:00
Adrian Chadd	871ef8b0d8	Regenerate syscalls.	2015-07-11 15:22:11 +00:00
Konstantin Belousov	0538aafc41	The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x kernels which required padding before the off_t parameter. The fcntl(2) contains compatibility code to handle kernels before the struct flock was changed during the 8.x CURRENT development. The shims were reasonable to allow easier revert to the older kernel at that time. Now, two or three major releases later, shims do not serve any purpose. Such old kernels cannot handle current libc, so revert the compatibility code. Make padded syscalls support conditional under the COMPAT6 config option. For COMPAT32, the syscalls were under COMPAT6 already. Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to (partially) disable the removed shims. Reviewed by: jhb, imp (previous versions) Discussed with: peter Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-18 21:50:13 +00:00
Jilles Tjoelker	2205e0d1bd	Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes	2015-01-23 21:07:08 +00:00
Dmitry Chagin	9f7a06f27e	Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char ). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week	2015-01-04 10:34:02 +00:00
Dmitry Chagin	186d9c3473	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
Marcel Moolenaar	80b47aefa1	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout) Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:16:52 +00:00
Attilio Rao	ce42e79310	Remove dead code from umtx support: - Retire long time unused (basically always unused) sys__umtx_lock() and sys__umtx_unlock() syscalls - struct umtx and their supporting definitions - UMUTEX_ERROR_CHECK flag - Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall __FreeBSD_version is not bumped yet because it is expected that further breakages to the umtx interface will follow up in the next days. However there will be a final bump when necessary. Sponsored by: EMC / Isilon storage division Reviewed by: jhb	2014-03-18 21:32:03 +00:00
John Baldwin	55648840de	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month	2013-09-19 18:53:42 +00:00
John Baldwin	84c21af119	Fix the type of the idtype argument to wait6() in syscalls.master. Approved by: re (kib) MFC after: 1 week	2013-09-12 17:52:18 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Gleb Smirnoff	6160e12c10	Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
Konstantin Belousov	48947eccee	Fix the wait6(2) on 32bit architectures and for the compat32, by using the right type for the argument in syscalls.master. Also fix the posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the architectures which require padding of the 64bit argument. Noted and reviewed by: jhb Pointy hat to: kib MFC after: 1 week	2013-05-21 11:40:16 +00:00
Jilles Tjoelker	dc570d5e56	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib	2013-05-01 22:42:42 +00:00
Jilles Tjoelker	da7d2afb6d	Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.	2013-05-01 20:10:21 +00:00
Matthew D Fleming	e324bf91e8	Fix return type of extattr_set_* and fix rmextattr(8) utility. extattr_set_{fd,file,link} is logically a write(2)-like operation and should return ssize_t, just like extattr_get_. Also, the user-space utility was using an int for the return value of extattr_get_ and extattr_list_*, both of which return an ssize_t. MFC after: 1 week	2013-04-02 05:30:41 +00:00
Pawel Jakub Dawidek	e948704e4b	Implement chflagsat(2) system call, similar to fchmodat(2), but operates on file flags. Reviewed by: kib, jilles Sponsored by: The FreeBSD Foundation	2013-03-21 22:59:01 +00:00
Pawel Jakub Dawidek	b4b2596b97	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation	2013-03-21 22:44:33 +00:00
Pawel Jakub Dawidek	7493f24ee6	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr name, socklen_t namelen); which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Konstantin Belousov	f13b5a0f01	Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month	2012-11-13 12:52:31 +00:00

1 2 3 4 5 ...

325 Commits