freebsd-skq

Author	SHA1	Message	Date
Edward Tomasz Napierala	225636dccb	Fix bunch of .Xrs. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-28 16:48:28 +00:00
John Baldwin	bb430bc740	Fully handle size_t lengths in AIO requests. First, update the return types of aio_return() and aio_waitcomplete() to ssize_t. POSIX requires aio_return() to return a ssize_t so that it can represent all return values from read() and write(). aio_waitcomplete() should use ssize_t for the same reason. aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and system call entry were not updated. aio_waitcomplete() has always returned int. Note that this does not require new system call stubs as this is effectively only an API change in how the compiler interprets the return value. Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX. aio_read/write should now honor the same length limits as normal read/write. Third, use longs instead of ints in the aio_return() and aio_waitcomplete() system call functions so that the 64-bit size_t in the in-kernel aiocb isn't truncated to 32-bits before being copied out to userland or being returned. Finally, a simple test has been added to verify the bounds checking on the maximum read size from a file.	2016-03-21 21:37:33 +00:00
Julian Elischer	efdd41da26	Use the right argument name MFC after: 1 week Sponsored by: Panzura inc	2016-03-18 08:47:17 +00:00
John Baldwin	6d3eca246c	Remove Symbol.map entries for old AIO system calls for FreeBSD 6 compat. These entries should have never been present since they only exist for compat with FreeBSD 6.x (and older) binaries. This was missed in r296572. Technically this breaks the ABI by removing versioned symbols. However, no binaries should be linked against these symbols. No release has shipped with a header that contained a prototype for these functions. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5615	2016-03-12 07:13:20 +00:00
Edward Tomasz Napierala	62411b41c4	Fix spelling of MAXNAMLEN. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-09 13:45:03 +00:00
Edward Tomasz Napierala	0ca11f9ded	kenv(8) -> kenv(1) MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-02-29 17:22:34 +00:00
Edward Tomasz Napierala	406e4bde38	sysconf(2) -> sysconf(3) MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-02-29 17:20:04 +00:00
Benjamin Kaduk	24183025a5	Bump .Dd for r295764 Also fix a spelling and grammar nit while here.	2016-02-18 18:50:03 +00:00
Maxim Sobolev	a050ef0997	Right now, the "virtual hole" API feature of lseek(2) is very vaguely documented and easy to miss. At the same time, it's pretty important for anyone who is trying to use SEEK_HOLE/SEEK_DATA in real app. Try to bridge that gap by making that description more pronounced and also document how it affects failure codes. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5162	2016-02-18 18:41:40 +00:00
Jamie Gritton	e94b881ba1	Remove man page references to rndassociates.com, which has been taken over by a domain squatter.	2016-02-10 14:48:49 +00:00
Konstantin Belousov	bd43f0691c	If libthr.so is dlopened without RTLD_GLOBAL flag, the libthr symbols do not participate in the global symbols namespace, but rtld locks are still replaced and functions are interposed. In particular, __pthread_map_stacks_exec is resolved to the libc version. If a library is loaded later, which requires adjustment of the stack protection mode, rtld calls into libc __pthread_map_stacks_exec due to the symbols scope. The libc version might recurse into binder and recursively acquire rtld bind lock, causing the hang. Make libc __pthread_map_stacks_exec() interposed, which synchronizes rtld locks and version of the stack exec hook when libthr loaded, regardless of the symbol scope control or symbol resolution order. The __pthread_map_stacks_exec() symbol is removed from the private version in libthr since libc symbol now operates correctly in presence of libthr. Reported and tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-08 19:24:13 +00:00
Jilles Tjoelker	9ef7a36255	semget(2): Add missing [EINVAL] conditions. PR: 206927	2016-02-07 21:25:08 +00:00
Jason Helfman	74f9cea2d3	- connect(2) Clarify namelen PR: 206838 Submitted by: t@tobik.me Approved by: bcr (mentor) MFH: after 1 week Differential Revision: https://reviews.freebsd.org/D5194	2016-02-04 18:03:06 +00:00
Konstantin Belousov	bf420ace0a	Add implementations of sendmmsg(3) and recvmmsg(3) functions which wraps sendmsg(2) and recvmsg(2) into batch send and receive operation. The goal of this implementation is only to provide API compatibility with Linux. The cancellation behaviour of the functions is not quite right, but due to relative rare use of cancellation it is considered acceptable comparing with the complexity of the correct implementation. If functions are reimplemented as syscalls, the fix would come almost trivial. The direct use of the syscall trampolines instead of libc wrappers for sendmsg(2) and recvmsg(2) is to avoid data loss on cancellation. Submitted by: Boris Astardzhiev <boris.astardzhiev@gmail.com> Discussed with: jilles (cancellation behaviour) MFC after: 1 month	2016-01-29 14:12:12 +00:00
Konstantin Belousov	88d74d64d7	Restore flushing of output for revoke(2) again. Document revoke()'s intended behaviour in its man page. Simplify tty_drain() to match. Don't call ttydevsw methods in tty_flush() if the device is gone since we now sometimes call it then. The flushing was supposed to be implemented by passing the FNONBLOCK flag to VOP_CLOSE() for revoke(). The tty driver is one of the few that can block in close and was one of the fewer that knew about this. This almost worked in FreeBSD-1 and similarly in Net/2. These versions only almost worked because there was and is considerable confusion between IO_NDELAY and FNONBLOCK (aka O_NONBLOCK). IO_NDELAY is only valid for VOP_READ() and VOP_WRITE(). For other VOPs it has the same value as O_SHLOCK. But since vfs_subr.c and tty.c consistently used the wrong flag and the O_SHLOCK flag is rarely set, this mostly worked. It also gave the feature that applications could get the non-blocking close by abusing O_SHLOCK. This was first broken then fixed in 1995. I changed only the tty driver to use FNONBLOCK, as a hack to get non-blocking via the normal flag FNONBLOCK for last closes. I didn't know about revoke()'s use of IO_NDELAY or change it to be consistent, so revoke() was broken. Then I changed revoke() to match. This was next broken in 1997 then fixed in 1998. Importing Lite2 made the flags inconsistent again by undoing the fix only in vfs_subr.c. This was next broken in 2008 by replacing everything in tty.c and not checking any flags in last close. Other bugs in draining limited the resulting unbounded waits to drain in some cases. It is now possible to fix this better using the new FREVOKE flag. Just restore flushing for revoke() for now. Don't restore or undo any hacks for ordinary last closes yet. But remove dead code in the 1-second relative timeout (r272789). This did extra work to extend the buggy draining for revoke() for as long as possible. The 1-second timeout made this not very long by usually flushing after 1 second. Submitted by: bde MFC after: 2 weeks	2016-01-26 07:57:44 +00:00
Joel Dahl	5837aafd13	mdoc: sort Xr	2016-01-18 20:21:38 +00:00
Jilles Tjoelker	9ff8318f65	utimensat(2): Correct description of [EINVAL] error. MFC after: 4 days	2016-01-17 21:14:27 +00:00
Kevin Lo	c911734adb	- Add the 'restrict' type qualifier to match function prototype. - Remove sys/types.h.	2016-01-14 01:33:16 +00:00
Jilles Tjoelker	b956ae7c20	Update futimens/utimensat for MFC to stable/10: * Fix __FreeBSD_version check. * Update history section in man page. An MFC of this commit to stable/10 will allow using the new system calls instead of the fallback. MFC after: 3 days	2016-01-12 20:53:57 +00:00
Gleb Smirnoff	2bab0c5535	New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and up to now. The new sendfile is the code that Netflix uses to send their multiple tens of gigabits of data per second. The new implementation features asynchronous I/O, when I/O operations are launched, but not awaited to be complete. An explanation of why such behavior is beneficial compared to old one is going to be too long for a commit message, so we will skip it here. Additional features of new syscall are extra flags, which provide an application more control over data sent. The SF_NOCACHE flag tells kernel that data shouldn't be cached after it was sent. The SF_READAHEAD() macro allows to specify readahead size in pages. The new syscall is a drop-in replacement. No modifications are required to applications. One can take nginx binary for stable/10 and run it successfully on head. Although SF_NODISKIO lost its original sense, as now sendfile doesn't block, and now means something completely different (tm), using the new sendfile the old way is absolutely safe. Celebrates: Netflix global launch! Sponsored by: Nginx, Inc. Sponsored by: Netflix Relnotes: yes	2016-01-08 20:34:57 +00:00
John Baldwin	80f6797f4b	Document the recently added support for ptrace(2) LWP events.	2015-12-30 00:04:57 +00:00
Dmitry Chagin	3e18d701de	Verify that tv_sec value specified in settimeofday() and clock_settime() (CLOCK_REALTIME case) system calls is non negative. This commit hides a kernel panic in atrtc_settime() as the clock_ts_to_ct() does not properly convert negative tv_sec. ps. in my opinion clock_ts_to_ct() should be rewritten to properly handle negative tv_sec values. Differential Revision: https://reviews.freebsd.org/D4714 Reviewed by: kib MFC after: 1 week	2015-12-27 15:37:07 +00:00
Jilles Tjoelker	11022e84d8	clock_gettime(2),gettimeofday(2): Remove [EFAULT] error. Depending on system configuration and parameters, clock_gettime() and gettimeofday() may not be system calls. If so, passing an invalid pointer will cause a signal and not an [EFAULT] error. From a standards perspective, this is OK since passing an invalid pointer is undefined behaviour. MFC after: 1 week	2015-12-20 15:11:11 +00:00
Kevin Lo	d0ec8fd065	Remove sys/types.h due to STANDARDS and unistd.h also includes sys/types.h.	2015-12-15 15:19:06 +00:00
Kevin Lo	13230220de	Remove sys/types.h due to STANDARDS and unistd.h also includes sys/types.h. Reviewed by: bde	2015-12-15 15:08:29 +00:00
John Baldwin	d6fb489498	Start on a new library (libsysdecode) that provides routines for decoding system call information such as system call arguments. Initially this will consist of pulling duplicated code out of truss and kdump though it may prove useful for other utilities in the future. This commit moves the shared utrace(2) record parser out of kdump into the library and updates kdump and truss to use it. One difference from the previous version is that the library version treats unknown events that start with the "RTLD" signature as unknown events. This simplifies the interface and allows the consumer to decide how to handle all non-recognized events. Instead, this function only generates a string description for known malloc() and RTLD records. Reviewed by: bdrewery Differential Revision: https://reviews.freebsd.org/D4537	2015-12-15 00:05:07 +00:00
Conrad Meyer	8b584e9d74	cpuset.9: Link to/from the new page A follow-up to r289667. Sponsored by: EMC / Isilon Storage Division	2015-10-20 23:52:37 +00:00
John Baldwin	c814b86843	Switch pl_child_pid from int to pid_t. Reviewed by: emaste, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3857	2015-10-20 17:58:21 +00:00
Edward Tomasz Napierala	92001b9497	Change the default setting of kern.ipc.shm_allow_removed from 0 to 1. This removes the need for manually changing this flag for Google Chrome users. It also improves compatibility with Linux applications running under Linuxulator compatibility layer, and possibly also helps in porting software from Linux. Generally speaking, the flag allows applications to create the shared memory segment, attach it, remove it, and then continue to use it and to reattach it later. This means that the kernel will automatically "clean up" after the application exits. It could be argued that it's against POSIX. However, SUSv3 says this about IPC_RMID: "Remove the shared memory identifier specified by shmid from the system and destroy the shared memory segment and shmid_ds data structure associated with it." From my reading, we break it in any case by deferring removal of the segment until it's detached; we won't break it any more by also deferring removal of the identifier. This is the behaviour exhibited by Linux since... probably always, and also by OpenBSD since the following commit: revision 1.54 date: 2011/10/27 07:56:28; author: robert; state: Exp; lines: +3 -8; Allow segments to be used even after they were marked for deletion with the IPC_RMID flag. This is permitted as an extension beyond the standards and this is similar to what other operating systems like linux do. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3603	2015-10-10 09:29:47 +00:00
John Baldwin	d07c923bda	Document the recently added pl_syscall_* fields in struct ptrace_lwpinfo. Reviewed by: emaste, kib Differential Revision: https://reviews.freebsd.org/D3833	2015-10-07 17:52:18 +00:00
Bryan Drewery	195aef9962	truss: Add support for utrace(2). This uses the kdump(1) utrace support code directly until a common library is created. This allows malloc(3) tracing with MALLOC_CONF=utrace:true and rtld tracing with LD_UTRACE=1. Unknown utrace(2) data is just printed as hex. PR: 43819 [inspired by] Reviewed by: jhb MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D3819	2015-10-06 21:58:38 +00:00
Mark Johnston	403ec61cbb	Revert r288628 and instead fix a discrepancy between the posix_fadvise(2) man page and POSIX: posix_fadvise(2) returns an error number on failure. Reported by: jilles MFC after: 1 week	2015-10-03 22:27:14 +00:00
Konstantin Belousov	96cdb0ab9d	Annotate arm userspace assembler sources stating their tolerance to the non-executable stack. Reviewed by: andrew Sponsored by: The FreeBSD Foundation	2015-09-29 16:09:58 +00:00
Bryan Drewery	cca3306a7f	Avoid adding duplicates into OBJS. bsd.lib.mk already handles adding entries to OBJS based on SRCS. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-09-22 04:55:28 +00:00
Craig Rodrigues	7ca26e3831	Add missing include to eliminate -Wmissing-prototypes warnings	2015-09-20 03:49:08 +00:00
Craig Rodrigues	cfb65fa249	Add missing includes to eliminate -Wmissing-prototypes warnings	2015-09-20 03:45:57 +00:00
Xin LI	ac1a32b454	There is no HP 300 support in FreeBSD anymore, so remove the obsolete BUGS section. While I'm there also bump Dd date. MFC after: 2 weeks	2015-09-18 20:28:37 +00:00
Edward Tomasz Napierala	0d3d0cc358	Kernel part of reroot support - a way to change rootfs without reboot. Note that the mountlist manipulations are somewhat fragile, and not very pretty. The reason for this is to avoid changing vfs_mountroot(), which is (obviously) rather mission-critical, but not very well documented, and thus hard to test properly. It might be possible to rework it to use its own simple root mount mechanism instead of vfs_mountroot(). Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D2698	2015-09-18 17:32:22 +00:00
Jilles Tjoelker	6b46581ed9	setuid(2): Suggest O_CLOEXEC instead of fcntl(F_SETFD).	2015-09-13 14:00:49 +00:00
Konstantin Belousov	bd6060a1c6	Switch libc from using _sig{procmask,action,suspend} symbols, which are aliases for the syscall stubs and are plt-interposed, to the libc-private aliases of internally interposed sigprocmask() etc. Since e.g. _sigaction is not interposed by libthr, calling signal() removes thr_sighandler() from the handler slot etc. The result was breaking signal semantic and rtld locking. The added __libc_sigprocmask and other symbols are hidden, they are not exported and cannot be called through PLT. The setjmp/longjmp functions for x86 were changed to use direct calls, and since PIC_PROLOGUE only needed for functional PLT indirection on i386, it is removed as well. The PowerPC bug of calling the syscall directly in the setjmp/longjmp implementation is kept as is. Reported by: Pete French <petefrench@ingresso.co.uk> Tested by: Michiel Boland <boland37@xs4all.nl> Reviewed by: jilles (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-29 14:25:01 +00:00
Benjamin Kaduk	328b9e0bca	Editing pass on procctl.2 Spell "descendant" correctly. Grammar fixes. Use correct width argument to Bl. Use Po and Pc to avoid leaving a dangling '(' on the end of a line.	2015-08-21 02:42:14 +00:00
Konstantin Belousov	41d50cd6b7	If process becomes reaper (procctl(PROC_REAP_ACQUIRE)) while already having some children, the children's reaper is not reset to the parent. This allows for the situation where reaper has children but not descendants and the too strict asserts in the reap_status() fire. Remove the wrong asserts, add some clarification for the situation to the procctl(2) REAP_STATUS. Reported and tested by: feld Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-08-20 22:44:26 +00:00
Conrad Meyer	971c424c7e	getrlimit.2: Document RSS, AS/VMEM limit behavior more clearly Alphabetize the RLIMIT_ list while here. Reviewed by: jilles (previous version), wblock (previous version) Approved by: markj (mentor) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3433	2015-08-20 00:00:15 +00:00
Pedro F. Giffuni	842898ceec	Remove a stale comment and clarify the original where it was taken from The comment in the libc/sys symbol map referenced the generated symbols for the syscall trampolines. Such comment was out of place in the secure symbol map so remove the stale comment and attempt to clarify the old one to avoid risks of confusion. Pointed out by: kib	2015-08-14 14:58:04 +00:00
Pedro F. Giffuni	fe0d386cf3	Move the stack protector to a new "secure" directory As part of the code refactoring to support FORTIFY_SOURCE we want a new subdirectory "secure" to keep the files related to security. Move the stack protector functions to this new directory. No functional change. Differential Review: https://reviews.freebsd.org/D3333	2015-08-14 03:03:13 +00:00
Ed Schouten	2433a4eb04	Make it possible to implement poll(2) on top of kqueue(2). It looks like EVFILT_READ and EVFILT_WRITE trigger under the same conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX. The only difference is that POLLRDNORM has to be triggered on regular files unconditionally, whereas EVFILT_READ only triggers when not EOF. Introduce a new flag, NOTE_FILE_POLL, that can be used to make EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag will be used by cloudlibc's poll() function. Reviewed by: jmg Differential Revision: https://reviews.freebsd.org/D3303	2015-08-05 07:34:29 +00:00
Konstantin Belousov	35dfc644f5	Copy the fencing of the algorithm to do lock-less update and reading of the timehands, from the kern_tc.c implementation to vdso. Add comments giving hints where to look for the algorithm explanation. To compensate the removal of rmb() in userspace binuptime(), add explicit lfence instruction before rdtsc. On i386, add usual complications to detect SSE2 presence; assume that old CPUs which do not implement SSE2 also execute rdtsc almost in order. Reviewed by: alc, bde (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-04 12:33:51 +00:00
Bryan Drewery	b7551bceeb	unlink(2): Note the possibility for ENOSPC to be returned on ZFS. PR: 154930	2015-07-28 22:48:58 +00:00
Ed Schouten	b114aa7959	Make shutdown() return ENOTCONN as required by POSIX, part deux. Summary: Back in 2005, maxim@ attempted to fix shutdown() to return ENOTCONN in case the socket was not connected (r150152). This had to be rolled back (r150155), as it broke some of the existing programs that depend on this behavior. I reapplied this change on my system and indeed, syslogd failed to start up. I fixed this back in February (279016) and MFC'ed it to the supported stable branches. Apart from that, things seem to work out all right. Since at least Linux and Mac OS X do the right thing, I'd like to go ahead and give this another try. To keep old copies of syslogd working, only start returning ENOTCONN for recent binaries. I took a look at the XNU sources and they seem to test against both SS_ISCONNECTED, SS_ISCONNECTING and SS_ISDISCONNECTING, instead of just SS_ISCONNECTED. That seems reasonable, so let's do the same. Test Plan: This issue was uncovered while writing tests for shutdown() in CloudABI: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/sys/socket/shutdown_test.c#L26 Reviewers: glebius, rwatson, #manpages, gnn, #network Reviewed By: gnn, #network Subscribers: bms, mjg, imp Differential Revision: https://reviews.freebsd.org/D3039	2015-07-27 13:17:57 +00:00
Edward Tomasz Napierala	5e95c31051	Add missing capitalization.	2015-07-24 18:13:13 +00:00
Konstantin Belousov	b4490c6e93	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation	2015-07-18 09:02:50 +00:00
Alan Cox	131041fa2a	Correct the description of MADV_DONTNEED. Specifically, after using MADV_DONTNEED, while pages faults on the affected address range are more likely to occur, they are not guaranteed to occur. MFC after: 3 days	2015-07-12 19:18:19 +00:00
Adrian Chadd	6520495abc	Add an initial NUMA affinity/policy configuration for threads and processes. This is based on work done by jeff@ and jhb@, as well as the numa.diff patch that has been circulating when someone asks for first-touch NUMA on -10 or -11. * Introduce a simple set of VM policy and iterator types. * tie the policy types into the vm_phys path for now, mirroring how the initial first-touch allocation work was enabled. * add syscalls to control changing thread and process defaults. * add a global NUMA VM domain policy. * implement a simple cascade policy order - if a thread policy exists, use it; if a process policy exists, use it; use the default policy. * processes inherit policies from their parent processes, threads inherit policies from their parent threads. * add a simple tool (numactl) to query and modify default thread/process policies. * add documentation for the new syscalls, for numa and for numactl. * re-enable first touch NUMA again by default, as now policies can be set in a variety of methods. This is only relevant for very specific workloads. This doesn't pretend to be a final NUMA solution. The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by 'sysctl vm.default_policy=rr'. This is only relevant if MAXMEMDOM is set to something other than 1. Ie, if you're using GENERIC or a modified kernel with non-NUMA, then this is a glorified no-op for you. Thank you to Norse Corp for giving me access to rather large (for FreeBSD!) NUMA machines in order to develop and verify this. Thank you to Dell for providing me with dual socket sandybridge and westmere v3 hardware to do NUMA development with. Thank you to Scott Long at Netflix for providing me with access to the two-socket, four-domain haswell v3 hardware. Thank you to Peter Holm for running the stress testing suite against the NUMA branch during various stages of development! Tested: * MIPS (regression testing; non-NUMA) * i386 (regression testing; non-NUMA GENERIC) * amd64 (regression testing; non-NUMA GENERIC) * westmere, 2 socket (thankyou norse!) * sandy bridge, 2 socket (thankyou dell!) * ivy bridge, 2 socket (thankyou norse!) * westmere-EX, 4 socket / 1TB RAM (thankyou norse!) * haswell, 2 socket (thankyou norse!) * haswell v3, 2 socket (thankyou dell) * haswell v3, 2x18 core (thankyou scott long / netflix!) * Peter Holm ran a stress test suite on this work and found one issue, but has not been able to verify it (it doesn't look NUMA related, and he only saw it once over many testing runs.) * I've tested bhyve instances running in fixed NUMA domains and cpusets; all seems to work correctly. Verified: * intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different NUMA policies for processes under test. Review: This was reviewed through phabricator (https://reviews.freebsd.org/D2559) as well as privately and via emails to freebsd-arch@. The git history with specific attributes is available at https://github.com/erikarn/freebsd/ in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy). This has been reviewed by a number of people (stas, rpaulo, kib, ngie, wblock) but not achieved a clear consensus. My hope is that with further exposure and testing more functionality can be implemented and evaluated. Notes: * The VM doesn't handle unbalanced domains very well, and if you have an overly unbalanced memory setup whilst under high memory pressure, VM page allocation may fail leading to a kernel panic. This was a problem in the past, but it's much more easily triggered now with these tools. * This work only controls the path through vm_phys; it doesn't yet strongly/predictably affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory isn't really guaranteed in any way. That's next on my plate. Sponsored by: Norse Corp, Inc.; Dell	2015-07-11 15:21:37 +00:00
Edward Tomasz Napierala	a238a79872	Fix markup. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2015-07-07 19:23:59 +00:00
Konstantin Belousov	eb89622653	Grammar and language fixes. Submitted by: wblock Review: https://reviews.freebsd.org/D2969 MFC after: 12 days	2015-07-03 17:30:31 +00:00
Konstantin Belousov	23e1c1251c	Document x86 machine-specific ptrace(2) requests. Provide list of the ppc requests. Reviewed by: brueffer, emaste, gjb (previous version) Sponsored by: The FreeBSD Foundation Review: https://reviews.freebsd.org/D2962 MFC after: 2 weeks	2015-06-30 18:53:42 +00:00
Jeremie Le Hen	b7c4ed65cc	NetBSD commit log: Use a constant array for the MIB. Newer LLVM decided that mib[] warranted stack protections, with the obvious crash after the setup was done. As a positive side effect, code size shrinks a bit. I'm not sure why this hasn't bitten us yet, but it is certainly possible and there are no real drawbacks to this change anyway. Submitted by: pfg Obtained from: NetBSD MFC after: 1 week	2015-06-14 07:47:18 +00:00
John Baldwin	196cd80898	Various updates to the ftruncate(2) documentation: - Note that ftruncate(2) can operate on shared memory objects and cross reference shm_open(2). - Note that ftruncate(2) does not change the file position pointer (aka seek pointer) of the file descriptor. - ftruncate(2) will fail with EINVAL for all sorts of other fd types than just sockets, so instead note that it fails for all but regular files and shared memory objects. - Note that ftruncate(2) also appeared in 4.2BSD along with truncate(2). (Or at least the manpage for both appeared in 4.2, I did not check the kernel code itself to see if either predated 4.2.) PR: 199472 (2) Submitted by: andrew@ugh.net.au (2) MFC after: 1 week	2015-05-04 14:47:00 +00:00
John Baldwin	afa94a3f97	Partially revert r255486, the first argument to socketpair() is a socket domain, not a file descriptor. Use 'domain' instead of the original 'd' for this argument to match socket(2). PR: 199491 Reported by: sp55aa@qq.com MFC after: 1 week	2015-05-04 14:23:31 +00:00
Mark Johnston	93c9677b94	fork(2): Add a note to the effect that kqueue descriptors, unlike other descriptor types, are not inherited from the parent process. Reported by: kmacy MFC after: 1 week	2015-05-02 00:29:27 +00:00
Baptiste Daroussin	18c5321d06	Escape "Ed"	2015-04-26 10:52:37 +00:00
John Baldwin	179fa75e6e	Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC. Approved by: Hudson River Trading LLC (who owns ACT LLC) MFC after: 1 week	2015-04-23 14:22:20 +00:00
Konstantin Belousov	0538aafc41	The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x kernels which required padding before the off_t parameter. The fcntl(2) contains compatibility code to handle kernels before the struct flock was changed during the 8.x CURRENT development. The shims were reasonable to allow easier revert to the older kernel at that time. Now, two or three major releases later, shims do not serve any purpose. Such old kernels cannot handle current libc, so revert the compatibility code. Make padded syscalls support conditional under the COMPAT6 config option. For COMPAT32, the syscalls were under COMPAT6 already. Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to (partially) disable the removed shims. Reviewed by: jhb, imp (previous versions) Discussed with: peter Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-18 21:50:13 +00:00
Konstantin Belousov	3d0045bb2b	Make wait6(2), waitid(3) and ppoll(2) cancellation points. The waitid() function is required to be cancellable by the standard. The wait6() and ppoll() follow the other syscalls in their groups. Reviewed by: jhb, jilles (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-18 21:35:41 +00:00
Sergey Kandaurov	d359191f7e	Remove obsolete bits about maximum number of file systems. NMOUNT has gone together with static mount table in 4.3BSD-Reno. MFC after: 1 week	2015-04-12 21:14:58 +00:00
John Baldwin	b871bfa1a2	vfork() first appeared in 3BSD which pre-dates 2.9BSD. Verified via the copy of 3BSD on disc 1 of "The CSRG Archives". PR: 198612 MFC after: 1 week	2015-04-06 20:40:01 +00:00
Ed Maste	541236cf60	libc: Eliminate duplicate copies of __vdso_gettc.c Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D2152	2015-04-02 21:18:11 +00:00
Edward Tomasz Napierala	522196b5ed	Update open(2) to make it more obvious that O_NOCTTY and O_TTY_INIT are ignored. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-02 11:41:04 +00:00
Konstantin Belousov	1849df3006	Correctly handle __fcntl_compat symbol for the !SYSCALL_COMPAT case. Both .weak and .alias assembler directives only work when assembling the file which defines the symbol. Reported and tested by: andrew Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-01 16:55:30 +00:00
Konstantin Belousov	b072e86d09	Make kevent(2) a cancellation point. Note that to cancel blocked kevent(2) call, changelist must be empty, since we cannot cancel a call which already made changes to the process state. And in reverse, call which only makes changes to the kqueue state, without waiting for an event, is not cancellable. This makes a natural usage model to migrate kqueue loop to support cancellation, where existing single kevent(2) call must be split into two: first uncancellable update of kqueue, then cancellable wait for events. Note that this is ABI-incompatible change, but it is believed that there is no cancel-safe code that relies on kevent(2) not being a cancellation point. Option to preserve the ABI would be to keep kevent(2) as is, but add new call with flags to specify cancellation behaviour, which only value seems to add complications. Suggested and reviewed by: jilles Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-03-29 19:14:41 +00:00
John-Mark Gurney	32d52c275d	forgot to bump date, and replace contraction (igor)...	2015-03-07 03:48:32 +00:00
John-Mark Gurney	4a46673183	make things a bit more clear.. we worked together on language.. Submitted by: Justin Cormack	2015-03-06 23:17:18 +00:00
John-Mark Gurney	b759b8aa44	fix spelling, add comma and remove BUGS section.. it provided no useful information, and is not really bugs, but limitations for other reasons...	2015-02-19 01:51:17 +00:00
Marius Strobl	aed116911d	Unbreak sparc64 after r276630 by calling __sparc_sigtramp_setup signal trampoline as part of the MD __sys_sigaction again. Submitted by: kib (initial versions) MFC after: 3 days	2015-02-16 22:13:03 +00:00
Konstantin Belousov	45468c5356	Properly interpose libc spinlocks, was missed in r276630. In particular, stdio locking was affected. Reported and tested by: "Matthew D. Fuller" <fullermd@over-yonder.net> Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-02-14 11:47:40 +00:00
Edward Tomasz Napierala	6c316535e2	Remove useless comment. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-02-07 13:11:45 +00:00
Jilles Tjoelker	2205e0d1bd	Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes	2015-01-23 21:07:08 +00:00
Konstantin Belousov	677258f7e7	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:13:11 +00:00
Konstantin Belousov	397d851d66	Reduce the size of the interposing table and amount of cancellation-handling code in the libthr. Translate some syscalls into their more generic counterpart, and remove translated syscalls from the table. List of the affected syscalls: creat, open -> openat; raise -> thr_kill; sleep, usleep -> nanosleep; pause -> sigsuspend; wait, wait3, waitpid -> wait4. Suggested and reviewed by: jilles (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-11 22:16:31 +00:00
John Baldwin	e275993995	Document CPU_WHICH_DOMAIN and bump Dd for cpuset.1. Missed in: r276829	2015-01-08 18:53:11 +00:00
Konstantin Belousov	1a744fefc2	Avoid calling internal libc function through PLT or accessing data through GOT, by staticizing and hiding. Add setter for __error_selector to hide it as well. Suggested and reviewed by: jilles Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-05 01:06:54 +00:00
Konstantin Belousov	8495e8b1e9	Fix known issues which blow up the process after dlopen("libthr.so") (or loading a dso linked to libthr.so into process which was not linked against threading library). - Remove libthr interposers of the libc functions, including __error(). Instead, functions calls are indirected through the interposing table, similar to how pthread stubs in libc are already done. Libc by default points either to syscall trampolines or to existing libc implementations. On libthr load, libthr rewrites the pointers to the cancellable implementations already in libthr. The interposition table is separate from pthreads stubs indirection table to not pull pthreads stubs into static binaries. - Postpone the malloc(3) internal mutexes initialization until libthr is loaded. This avoids recursion between calloc(3) and static pthread_mutex_t initialization. - Reinstall signal handlers with wrapper on libthr load. The _rtld_is_dlopened(3) is used to avoid useless calls to sigaction(2) when libthr is statically referenced from the main binary. In the process, fix openat(2), swapcontext(2) and setcontext(2) interposing. The libc symbols were exported at different versions than libthr interposers. Export both libc and libthr versions from libc now, with default set to the higher version from libthr. Remove unused and disconnected swapcontext(3) userspace implementation from libc/gen. No objections from: deischen Tested by: pho, antoine (exp-run) (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-03 18:38:46 +00:00
Christian Brueffer	0aee91e1fb	Various mdoc fixes and a few EOL whitespace removals. Found with: mandoc -Tlint	2014-12-21 12:36:36 +00:00
Bryan Drewery	4bb90cbe18	Bump Dd for r275846 MFC after: 3 weeks	2014-12-17 01:36:00 +00:00
Kirk McKusick	27ae6f4af7	Add some additional clarification and fix a few grammar nits. Reviewed by: kib MFC after: 3 weeks	2014-12-17 01:32:27 +00:00
Konstantin Belousov	19eaed5353	Markup fixes for kqueue(2), no content changes. Reviewed by: brueffer (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 days	2014-12-15 14:58:10 +00:00
Konstantin Belousov	237623b028	Add a facility for non-init process to declare itself the reaper of the orphaned descendants. Base of the API is modelled after the same feature from the DragonFlyBSD. Requested by: bapt Reviewed by: jilles (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-15 12:01:42 +00:00
Ed Maste	294246bb7d	Revert r274772: it is not valid on MIPS Reported by: sbruno	2014-11-25 03:50:31 +00:00
Baptiste Daroussin	2c900f1907	Ta is only allowed with Bl -column not in Bl -item	2014-11-23 23:35:16 +00:00
Joel Dahl	d4d112e34a	Misc mdoc fixes: - Remove superfluous paragraph macros. - Remove/fix empty or incorrect macros. - Sort sections into conventional order. - Terminate quoted strings properly. - Remove EOL whitespace.	2014-11-23 21:00:00 +00:00
Ed Maste	688fd61ae8	Use canonical __PIC__ flag It is automatically set when -fPIC is passed to the compiler. Reviewed by: dim, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D1179	2014-11-21 02:05:48 +00:00
Dmitry Chagin	186d9c3473	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
Dag-Erling Smørgrav	4b52e0d84f	<sys/param.h> is a superset of <sys/types.h> and must always come first. Coincidentally, today is the 11th anniversary of this man page's (and this bug's) first appearance in FreeBSD. MFC after: 3 days	2014-11-01 09:10:21 +00:00
Gavin Atkinson	ff8d5270bc	Slightly improve grammar in EAGAIN description. PR: 176806 Submitted by: Jeremy Chadwick MFC after: 3 days	2014-10-15 23:39:47 +00:00
Xin LI	b888b86e6f	accept(2) may and can return EAGAIN, document it. MFC after: 1 week	2014-10-10 03:05:55 +00:00
Bryan Drewery	c1efb88730	Document [EPERM] for UNIX sockets. MFC after: 2 weeks	2014-09-30 00:06:53 +00:00
John Baldwin	7a9f047ba7	- Remove mention of MAP_INHERIT. It hasn't been implemented for thirteen years. - Remove mention of unimplemented MAP_SWAP. There are no future plans to implement it. Submitted by: alc (2)	2014-09-17 19:45:34 +00:00
Enji Cooper	2bcdab32c3	Bump .Dd for the content change done to access(2) in r271655 PR: 181155 Sponsored by: EMC / Isilon Storage Division	2014-09-16 00:59:08 +00:00
Enji Cooper	257597a434	Validate the mode argument in access, eaccess, and faccessat for optional POSIX compliance and to improve compatibility with Linux and NetBSD The issue was identified with lib/libc/sys/t_access:access_inval from NetBSD Update the manpage accordingly PR: 181155 Reviewed by: jilles (code), jmmv (code), wblock (manpage), wollman (code) MFC after: 4 weeks Phabric: D678 (code), D786 (manpage) Sponsored by: EMC / Isilon Storage Division	2014-09-16 00:56:47 +00:00
John-Mark Gurney	e2cc4003e2	document mqueuefs is required for mq_open...	2014-09-15 22:32:35 +00:00
John Baldwin	5fd3f8b3b6	Add stricter checking of some mmap() arguments: - Fail with EINVAL if an invalid protection mask is passed to mmap(). - Fail with EINVAL if an unknown flag is passed to mmap(). - Fail with EINVAL if both MAP_PRIVATE and MAP_SHARED are passed to mmap(). - Require one of either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings. Reviewed by: alc, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D698	2014-09-15 17:20:13 +00:00
Joel Dahl	cfaa96f327	Minor mdoc nit.	2014-09-09 14:34:54 +00:00
Baptiste Daroussin	42e62eca52	Extend kqueue's EVFILT_TIMER by adding precision unit flags support Define the precision macros as bits sets to conform with XNU equivalent. Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag is passed. Phabric: https://phabric.freebsd.org/D421 Reviewed by: kib	2014-07-18 14:27:04 +00:00
Kevin Lo	92511d108b	Document that listen(2) can fail with EDESTADDRREQ.	2014-07-15 02:21:51 +00:00
Mark Johnston	d3fe75eb62	Fix a typo. MFC after: 3 days	2014-07-09 01:33:35 +00:00
Konstantin Belousov	c22de76166	Note that most errors are possible for all syscalls from utimes(2) family. Minor wording corrections. Based on the suggestions by bde. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-03 11:19:16 +00:00
Sergey Kandaurov	521aa90cb8	Document EINVAL as per POSIX. This also follows r124335-r124336, r225827. PR: 191382 MFC after: 1 week Sponsored by: Nginx, Inc.	2014-06-26 10:21:00 +00:00
Garrett Wollman	775a76844f	Catch up with many years of changes: o Document PF_LOCAL as being an alias for PF_UNIX o Document POSIX standardization of this interface using AF_* constants rather than PF_* constants, and note the three particular families which POSIX standardizes. o Note anticipated POSIX standardization of SOCK_CLOEXEC. o Delete from listing protocol families that FreeBSD doesn't support (in some cases, like PF_PUP, has never supported). o Add to listing some current protocol families that have been introduced in the last decade or so. o Document the correspondence of PF_* and AF_* constants. We should probably change the documentation to make the AF_* constants primary, but this commit does not do so. Reviewed by: kevlo@ MFC after: 1 month	2014-06-24 20:23:18 +00:00
Joel Dahl	df2d82e003	mdoc: remove superfluous paragraph macros.	2014-06-23 18:40:21 +00:00
Baptiste Daroussin	8fbf3d50e3	use .Mt to mark up email addresses consistently (part4) PR: 191174 Submitted by: Franco Fichtner <franco at lastsummer.de>	2014-06-23 08:25:03 +00:00
Konstantin Belousov	11c42bcc54	Add MAP_EXCL flag for mmap(2). It should be combined with MAP_FIXED, and prevents the request from deleting existing mappings in the region, failing instead. Reviewed by: alc Discussed with: jhb Tested by: markj, pho (previous version, as part of the bigger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-19 05:00:39 +00:00
Konstantin Belousov	8efb419829	The time has come to remove the wrapper, most likely, but tidy up its code instead for now. Remove spurious blank line, use C89 definition, wrap long line. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-19 04:55:00 +00:00
Benjamin Kaduk	245d93279b	Minor mdoc fix Submitted by: hrs Approved by: hrs (mentor, implicit)	2014-05-30 02:16:28 +00:00
Benjamin Kaduk	6953d7db5c	Correct documentation of the limit on how much memory can be mlock()ed vm.max_wired is a system-wide limit, not per-process. Reword the section to make this more clear. PR: docs/189214 Submitted by: Lawrence Chen (original text) Approved by: hrs (mentor)	2014-05-17 03:05:52 +00:00
Peter Holm	e103f5b1c0	msync(2) must return ENOMEM and not EINVAL when the address is outside the allowed range or when one or more pages are not mapped. This according to The Open Group Base Specifications Issue 7. Discussed with: attilio, Bruce Evans Reviewed by: alc, Garrett Cooper Reported by: ATF MFC after: 2 weeks Sponsored by: EMC / Isilon storage division	2014-05-07 08:38:02 +00:00
Ed Schouten	f56cfe8d61	Fix table alignment. EVFILT_PROCDESC is longer than the existing filters.	2014-04-07 18:17:31 +00:00
Ed Schouten	38219d6acd	Implement kqueue(2) for procdesc(4). kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that behaves the same, but operates on a procdesc(4) instead. Only implement NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also returns the exit status of the process, meaning that we can now obtain this value, even if pdwait4(2) is still unimplemented. Notes: - Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will be used on totally different descriptor types, this should not clash. - Let procdesc_kqops_event() reuse the same structure as filt_proc(). The only difference is that procdesc_kqops_event() should also be able to deal with the case where the process was already terminated after registration. Simply test this when hint == 0. - Fix some style(9) issues in filt_proc() to keep it consistent with the newly added procdesc_kqops_event(). - Save the exit status of the process in pd->pd_xstat, as we cannot pick up the proctree_lock from within procdesc_kqops_event(). Discussed on: arch@ Reviewed by: kib@	2014-04-07 18:10:49 +00:00
Warner Losh	a5fc5b6223	Convert from WITHOUT_SYSCALL_COMPAT to MK_SYSCALL_COMPAT.	2014-04-05 17:54:43 +00:00
Ed Schouten	2ad6bba714	Correct return type of pdfork(2). The pdfork(2) man page states: "pdfork() returns a PID, 0 or -1, as fork(2) does." As it returns a PID, the return type should obviously be pid_t. As int and pid_t have the same size on all architectures, this change does not affect the ABI in any way.	2014-04-04 19:53:45 +00:00
Eitan Adler	45ebf5d172	Use the correct variable name in the example code.	2014-03-30 04:40:41 +00:00
Robert Watson	cf321a51b1	Update system man pages for s/capability.h/capsicum.h/. MFC after: 3 weeks	2014-03-27 21:43:00 +00:00
Marcel Moolenaar	8876613dc5	Replace use of ${.CURDIR} by ${LIBC_SRCTOP} and define ${LIBC_SRCTOP} if not already defined. This allows building libc from outside of lib/libc using a reach-over makefile. A typical use-case is to build a standard ILP32 version and a COMPAT32 version in a single iteration by building the COMPAT32 version using a reach-over makefile. Obtained from: Juniper Networks, Inc.	2014-03-04 02:19:39 +00:00
Benjamin Kaduk	af1e239814	syncer(4) is a kernel process, not a user process Noticed by: Geoffrey Thomas <gthomas@mokafive.com> Approved by: hrs (mentor)	2014-02-27 04:06:34 +00:00
Christian Brueffer	9cba0f9670	Match the correct variable to the variable description. PR: 121173 Submitted by: Thomas Mueller <tmueller at sysgo.com> MFC after: 1 week	2014-02-21 13:53:41 +00:00
John-Mark Gurney	ad6a53db5f	document _JAIL as a possible option to set a cpuset for a jail.. MFC after: 3 days	2014-02-15 07:01:45 +00:00
Christian Brueffer	a578215eed	Fix a typo. MFC after: 1 week	2014-02-03 22:16:46 +00:00
Konstantin Belousov	49d39308ba	The posix_madvise(3) and posix_fadvise(2) should return error on failure, same as posix_fallocate(2). Noted by: Bob Bishop <rb@gid.co.uk> Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-30 18:04:39 +00:00
Ulrich Spörlein	d7d8b00bec	mdoc: fix several uses of the Fx macro to point to actual releases. Found by: make manlint	2014-01-28 21:40:10 +00:00
Konstantin Belousov	2852de0489	The posix_fallocate(2) syscall should return error number on error, without modifying errno. Reported and tested by: Gennady Proskurin <gpr@mail.ru> Reviewed by: mdf PR: standards/186028 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-23 17:24:26 +00:00
Sergey Kandaurov	dccae053f7	Update EINVAL description. This matches current POSIX standards and actual FreeBSD behavior. MFC after: 1 week	2014-01-23 09:37:03 +00:00
Jilles Tjoelker	b83686c8fe	Add some missing .Nm for newer syscalls in existing man pages. MFC after: 1 week	2014-01-11 22:00:16 +00:00
Sergey Kandaurov	d3178d7d27	- Fix EBADF description, in following the future POSIX tc and what FreeBSD actually implements. - Improve grammar: use more preferred "can", not "could". Submitted by: jilles	2013-12-27 16:57:38 +00:00
Sergey Kandaurov	4ca1cd1d63	Fix an apparent typo. MFC after: 3 days	2013-12-26 19:18:43 +00:00
Sergey Kandaurov	e44aa9fde0	Provide the manual page for aio_fsync(2). Reviewed by: davidxu MFC after: 1 week	2013-12-26 19:16:30 +00:00
Sergey Kandaurov	5acf8f8325	The compile time constant limit on number of swap devices was removed in 5.2. As such, remove the EINVAL error saying so. Currently the vm.nswapdev sysctl just represents the number of added swap devices. MFC after: 1 week	2013-12-25 16:01:29 +00:00
Ruslan Ermilov	0f987f1f08	shm_open(2): Fixed the history information. While here, sort xrefs. Reviewed by: jhb	2013-12-18 12:18:17 +00:00
Joel Dahl	2727e97436	mdoc: remove EOL whitespace.	2013-12-06 21:22:33 +00:00
John Baldwin	d4e3c0a2d7	Various updates and tweaks to the wait(2) manpage. PR: docs/183904 Submitted by: Michael Galassi <michaelgalassi@gmail.com> Reviewed by: kib, wblock (earlier version)	2013-12-03 21:00:13 +00:00
Jilles Tjoelker	b865f8ef40	chmod(2): Document S_ISVTX following SUSv3/SUSv4. S_ISTXT is non-standard. While here, also update fchmodat() standards entry to POSIX.1-2008.	2013-12-01 12:24:57 +00:00
Jilles Tjoelker	09466daf8c	waitid(2): Do not tell userland programmers to include <sys/signal.h>. Userland should get these definitions by including <signal.h>.	2013-12-01 11:59:37 +00:00
Pawel Jakub Dawidek	f2b525e6b9	Make process descriptors standard part of the kernel. rwhod(8) already requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days	2013-11-30 15:08:35 +00:00
Sergey Kandaurov	dc211b3d40	Fix extattr(2) MLINKS. MFC after: 1 week	2013-11-09 00:36:09 +00:00
Pawel Jakub Dawidek	6f62d278e8	- Add manual pages for capability rights (rights(4)), cap_rights_init(3) family of functions and cap_rights_get(3) function. - Update remaining Capsicum-related manual pages. Reviewed by: bdrewery MFC after: 3 days	2013-11-04 14:10:22 +00:00
Jilles Tjoelker	1947c8a6d1	kqueue: Change error for kqueues rlimit from EMFILE to ENOMEM and document this error condition in the kqueue(2) manual page. Discussed with: kib	2013-11-03 23:06:24 +00:00
Konstantin Belousov	85a0ddfd0b	Add a resource limit for the total number of kqueues available to the user. Kqueue now saves the ucred of the allocating thread, to correctly decrement the counter on close. Under some specific and not real-world use scenario for kqueue, it is possible for the kqueues to consume memory proportional to the square of the number of the filedescriptors available to the process. Limit allows administrator to prevent the abuse. This is kernel-mode side of the change, with the user-mode enabling commit following. Reported and tested by: pho Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-10-21 16:46:12 +00:00
Jilles Tjoelker	0f49c96cfc	accept(2): Update portability note for accept4(). The accept(2) man page warns that O_NONBLOCK and other properties on the new socket may vary across implementations. However, this issue only applies to accept() and not to accept4(). On the other hand, accept4() is not commonly available yet. Reported by: pluknet Reviewed by: bjk Approved by: re (kib)	2013-10-01 21:17:18 +00:00
Joel Dahl	828378a6d3	Minor mdoc improvements. Approved by: re (blanket)	2013-09-19 19:43:38 +00:00
John Baldwin	55648840de	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month	2013-09-19 18:53:42 +00:00
Bryan Drewery	c36029e6dc	Consistently reference file descriptors as "fd". 55 other manpages used "fd", while these used "d" and "filedes". MFC after: 1 week Approved by: gjb Approved by: re (delphij)	2013-09-12 00:53:38 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
Jilles Tjoelker	550ac4a8e8	wait(2): Add some possible caveats to standards section.	2013-09-07 11:41:52 +00:00
Jilles Tjoelker	75b1cda430	Update some signal man pages for multithreading.	2013-09-06 09:08:40 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. 
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. 
This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Robert Watson	7b223d2286	Xref capsicum(4) and procdesc(4) from pdfork(2). Suggested by: sbruno MFC after: 3 days	2013-08-28 20:00:25 +00:00
Joel Dahl	4d5c7c633b	Remove EOL whitespace.	2013-08-22 16:02:20 +00:00
Kenneth D. Merry	7da1a731c6	Expand the use of stat(2) flags to allow storing some Windows/DOS and CIFS file attributes as BSD stat(2) flags. This work is intended to be compatible with ZFS, the Solaris CIFS server's interaction with ZFS, somewhat compatible with MacOS X, and of course compatible with Windows. The Windows attributes that are implemented were chosen based on the attributes that ZFS already supports. The summary of the flags is as follows: UF_SYSTEM: Command line name: "system" or "usystem" ZFS name: XAT_SYSTEM, ZFS_SYSTEM Windows: FILE_ATTRIBUTE_SYSTEM This flag means that the file is used by the operating system. FreeBSD does not enforce any special handling when this flag is set. UF_SPARSE: Command line name: "sparse" or "usparse" ZFS name: XAT_SPARSE, ZFS_SPARSE Windows: FILE_ATTRIBUTE_SPARSE_FILE This flag means that the file is sparse. Although ZFS may modify this in some situations, there is not generally any special handling for this flag. UF_OFFLINE: Command line name: "offline" or "uoffline" ZFS name: XAT_OFFLINE, ZFS_OFFLINE Windows: FILE_ATTRIBUTE_OFFLINE This flag means that the file has been moved to offline storage. FreeBSD does not have any special handling for this flag. UF_REPARSE: Command line name: "reparse" or "ureparse" ZFS name: XAT_REPARSE, ZFS_REPARSE Windows: FILE_ATTRIBUTE_REPARSE_POINT This flag means that the file is a Windows reparse point. ZFS has special handling code for reparse points, but we don't currently have the other supporting infrastructure for them. UF_HIDDEN: Command line name: "hidden" or "uhidden" ZFS name: XAT_HIDDEN, ZFS_HIDDEN Windows: FILE_ATTRIBUTE_HIDDEN This flag means that the file may be excluded from a directory listing if the application honors it. FreeBSD has no special handling for this flag. The name and bit definition for UF_HIDDEN are identical to the definition in MacOS X. 
UF_READONLY: Command line name: "urdonly", "rdonly", "readonly" ZFS name: XAT_READONLY, ZFS_READONLY Windows: FILE_ATTRIBUTE_READONLY This flag means that the file may not be written or appended, but its attributes may be changed. ZFS currently enforces this flag, but Illumos developers have discussed disabling enforcement. The behavior of this flag is different than MacOS X. MacOS X uses UF_IMMUTABLE to represent the DOS readonly permission, but that flag has a stronger meaning than the semantics of DOS readonly permissions. UF_ARCHIVE: Command line name: "uarch", "uarchive" ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE Windows name: FILE_ATTRIBUTE_ARCHIVE The UF_ARCHIVED flag means that the file has changed and needs to be archived. The meaning is same as the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute. msdosfs and ZFS have special handling for this flag. i.e. they will set it when the file changes. sys/param.h: Bump __FreeBSD_version to 1000047 for the addition of new stat(2) flags. chflags.1: Document the new command line flag names (e.g. "system", "hidden") available to the user. ls.1: Reference chflags(1) for a list of file flags and their meanings. strtofflags.c: Implement the mapping between the new command line flag names and new stat(2) flags. chflags.2: Document all of the new stat(2) flags, and explain the intended behavior in a little more detail. Explain how they map to Windows file attributes. Different filesystems behave differently with respect to flags, so warn the application developer to take care when using them. zfs_vnops.c: Add support for getting and setting the UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN, UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags. All of these flags are implemented using attributes that ZFS already supports, so the on-disk format has not changed. ZFS currently doesn't allow setting the UF_REPARSE flag, and we don't really have the other infrastructure to support reparse points. 
msdosfs_denode.c, msdosfs_vnops.c: Add support for getting and setting UF_HIDDEN, UF_SYSTEM and UF_READONLY in MSDOSFS. It supported SF_ARCHIVED, but this has been changed to be UF_ARCHIVE, which has the same semantics as the DOS archive attribute instead of inverse semantics like SF_ARCHIVED. After discussion with Bruce Evans, change several things in the msdosfs behavior: Use UF_READONLY to indicate whether a file is writeable instead of file permissions, but don't actually enforce it. Refuse to change attributes on the root directory, because it is special in FAT filesystems, but allow most other attribute changes on directories. Don't set the archive attribute on a directory when its modification time is updated. Windows and DOS don't set the archive attribute in that scenario, so we are now bug-for-bug compatible. smbfs_node.c, smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM, UF_READONLY and UF_ARCHIVE in SMBFS. This is similar to changes that Apple has made in their version of SMBFS (as of smb-583.8, posted on opensource.apple.com), but not quite the same. We map SMB_FA_READONLY to UF_READONLY, because UF_READONLY is intended to match the semantics of the DOS readonly flag. The MacOS X code maps both UF_IMMUTABLE and SF_IMMUTABLE to SMB_FA_READONLY, but the immutable flags have stronger meaning than the DOS readonly bit. stat.h: Add definitions for UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY and UF_HIDDEN. The definition of UF_HIDDEN is the same as the MacOS X definition. Add commented-out definitions of UF_COMPRESSED and UF_TRACKED. They are defined in MacOS X (as of 10.8.2), but we do not implement them (yet). ufs_vnops.c: Add support for getting and setting UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY, UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS. Alphabetize the flags that are supported. These new flags are only stored, UFS does not take any action if the flag is set. 
Sponsored by: Spectra Logic Reviewed by: bde (earlier version)	2013-08-21 23:04:48 +00:00
Pawel Jakub Dawidek	fe0670cfb3	Correct function name and return value.	2013-08-17 14:55:31 +00:00
John Baldwin	5aa60b6f21	Add new mmap(2) flags to permit applications to request specific virtual address alignment of mappings. - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD. - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size to align to that seems best for the mapping request). However, if the object being mapped is already using large pages, then it will align the virtual mapping to match the existing large pages in the object instead. - Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE. - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent. Reviewed by: alc MFC after: 1 month	2013-08-16 21:13:55 +00:00
Jilles Tjoelker	fdafa7840f	pselect(2): Add xref to sigsuspend(2).	2013-08-16 14:06:29 +00:00
Jilles Tjoelker	5219e2caba	Add man page dup3(3).	2013-08-16 13:16:27 +00:00
Jilles Tjoelker	f57087b21c	sigsuspend(2): Add xrefs to pselect(2) and sigwait-alikes.	2013-08-15 22:33:27 +00:00
John Baldwin	513bfc4fe2	Enhance the description of NOTE_TRACK: - NOTE_TRACK has never triggered a NOTE_TRACK event from the parent pid. If NOTE_FORK is set, the listener will get a NOTE_FORK event from the parent pid, but not a separate NOTE_TRACK event. - Explicitly note that the event added to monitor the child process preserves the fflags from the original event. - Move the description of NOTE_TRACKERR under NOTE_TRACK as it is not a bit for the user to set (which is what this list purports to be). Also, explicitly note that if an error occurs, the NOTE_CHILD event will not be generated. MFC after: 1 week	2013-07-25 19:34:24 +00:00
Ed Maste	4e1d691281	Document EINVAL error return from PT_LWPINFO	2013-07-22 18:18:21 +00:00
Joel Dahl	2a82581d9b	Minor mdoc fixes.	2013-06-09 07:15:43 +00:00
Jilles Tjoelker	172886a93e	sigaction(2): Document various non-POSIX functions as async-signal safe.	2013-06-08 13:45:43 +00:00
Gleb Smirnoff	6160e12c10	Add new system call - aio_mlock(). The name speaks for itself. It allows the mlock(2) operation, which can consume a lot of time, to be performed under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
Jilles Tjoelker	4b08438c22	dup(2): Clarify return value, in particular of dup2().	2013-05-31 22:09:31 +00:00
Jilles Tjoelker	4e3f0e45cf	sigaction(2): *at system calls are async-signal safe.	2013-05-31 21:31:38 +00:00
Jilles Tjoelker	f8732c7fc3	sigaction(2): Extend description of async-signal safe functions: * Improve description when unsafe functions are unsafe. * Add various safe functions from POSIX.1-2008 and Austin Group issue #692.	2013-05-31 21:25:51 +00:00
Jilles Tjoelker	0bbe34c35d	fork(2): Add information about fork() in multi-threaded processes. There is nothing about pthread_atfork(3) or extensions like calling malloc(3) in the child process as this may be unreliable or broken.	2013-05-31 20:46:08 +00:00
Jilles Tjoelker	45100a722a	fork(2): #include <sys/types.h> is not needed.	2013-05-31 14:48:37 +00:00
Ed Maste	e2e9c35fa4	Remove the advertising clause from the Regents of the University of California's license, per the letter dated July 22, 1999.	2013-05-28 21:05:06 +00:00
Jilles Tjoelker	24f3b0bcd0	cap_rights_limit(2): CAP_ACCEPT also permits accept4(2).	2013-05-27 21:37:19 +00:00
Jilles Tjoelker	0bbacb9c66	sigreturn(2): Remove ancient compatibility warning about 4.2BSD. The HISTORY subsection still says that sigreturn() was added in 4.3BSD.	2013-05-25 13:59:40 +00:00
Julian Elischer	956e8eee53	Update the setfib man page to reflect recent changes.	2013-05-20 20:47:40 +00:00
Sergey Kandaurov	e0906c9a0d	POSIX 1003.1-2008: add ENOTRECOVERABLE, EOWNERDEAD errnos.	2013-05-04 19:07:22 +00:00
Jilles Tjoelker	ed5987bd08	accept(2), pipe(2): Fix .Dd.	2013-05-01 22:47:47 +00:00
Jilles Tjoelker	dc570d5e56	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib	2013-05-01 22:42:42 +00:00
Jilles Tjoelker	da7d2afb6d	Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.	2013-05-01 20:10:21 +00:00
Jilles Tjoelker	3143f63a23	intro(2): Fix some errors in ENFILE and EMFILE descriptions. MFC after: 1 week	2013-04-27 11:55:23 +00:00
Jilles Tjoelker	e160aec9a5	getdtablesize(2): Describe what this function actually does. getdtablesize() returns the limit on new file descriptors; this says nothing about existing descriptors. MFC after: 1 week	2013-04-24 21:24:35 +00:00
Sergey Kandaurov	89bbe1496d	Keep up with negative addrlen check removal in r249649.	2013-04-22 09:18:50 +00:00
Jilles Tjoelker	2cd19a510a	dup(2): Remove incorrect sentence about getdtablesize(). There are no getdtablesize() bounds on the file descriptor to be duplicated; it only has to be open. If the RLIMIT_NOFILE rlimit was decreased after opening the file descriptor, it may be greater than or equal to getdtablesize() but still valid. MFC after: 1 week	2013-04-21 19:42:04 +00:00
Joel Dahl	cd088fc43a	Remove cross-references to nonexistent CPU_SET(3) manpage. Also fix cpu_getaffinity(2) document title. PR: 176317 Submitted by: brucec	2013-04-21 06:46:41 +00:00
George V. Neville-Neil	599c412493	Correct the returned message lengths for timeval and bintime control messages (SO_BINTIME, SO_TIMEVAL). Obtained from: phk	2013-04-05 18:09:43 +00:00
Matthew D Fleming	e324bf91e8	Fix return type of extattr_set_* and fix rmextattr(8) utility. extattr_set_{fd,file,link} is logically a write(2)-like operation and should return ssize_t, just like extattr_get_*. Also, the user-space utility was using an int for the return value of extattr_get_* and extattr_list_*, both of which return an ssize_t. MFC after: 1 week	2013-04-02 05:30:41 +00:00
Jilles Tjoelker	de9dcfba06	accept(2): Mention inheritance of O_ASYNC and signal destination. While almost nobody uses O_ASYNC, and rightly so, the inheritance of the related properties across accept() is a portability issue like the inheritance of O_NONBLOCK.	2013-03-26 22:46:56 +00:00
Pawel Jakub Dawidek	2883fbd521	Document chflagsat(2). Obtained from: jilles	2013-03-21 23:05:44 +00:00
Pawel Jakub Dawidek	e948704e4b	Implement chflagsat(2) system call, similar to fchmodat(2), but operates on file flags. Reviewed by: kib, jilles Sponsored by: The FreeBSD Foundation	2013-03-21 22:59:01 +00:00
Pawel Jakub Dawidek	b4b2596b97	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation	2013-03-21 22:44:33 +00:00
Jilles Tjoelker	46f10cc265	Allow O_CLOEXEC in posix_openpt() flags. PR: kern/162374 Reviewed by: ed	2013-03-21 21:39:15 +00:00
Jilles Tjoelker	c2e3c52e0d	Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC. This change allows creating file descriptors with close-on-exec set in some situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file descriptors (SCM_RIGHTS) atomically close-on-exec. The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD. MSG_CMSG_CLOEXEC is the first free bit for MSG_. The SOCK_ flags are not passed to MAC because this may cause incorrect failures and can be done later via fcntl() anyway. On the other hand, audit is expected to cope with the new flags. For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags argument. Reviewed by: kib	2013-03-19 20:58:17 +00:00
Gleb Smirnoff	8863cc408c	There are actually two different cases when mlock(2) returns ENOMEM. Clarify this, taking text from SUS. Reviewed by: kib	2013-03-19 05:44:25 +00:00
Pawel Jakub Dawidek	136cbf84ef	Add a note to the HISTORY section about lchflags(2) being introduced in FreeBSD 5.0.	2013-03-16 22:44:14 +00:00
Pawel Jakub Dawidek	7493f24ee6	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen); which allow binding and connecting, respectively, to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that have to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
Joel Dahl	fdf25068b7	mdoc: remove superfluous paragraph macro.	2013-03-02 06:55:55 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer a separate descriptor type. Now every descriptor has a set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. 
- Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNOD to CAP_MKNODAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). 
Added convenient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Pawel Jakub Dawidek	d6f122f4fb	Provide cap_sandboxed(3) function, which is a wrapper around cap_getmode(2) system call, which has a nice property - it never fails, so it is a bit easier to use. If there is no support for capability mode in the kernel the function will return false (not in a sandbox). If the kernel is compiled with the support for capability mode, the function will return true or false depending if the calling process is in the capability mode sandbox or not respectively. Sponsored by: The FreeBSD Foundation	2013-03-02 00:11:27 +00:00
Pawel Jakub Dawidek	1f2ce2a086	Put one file per line so it is easier to read diffs against those files.	2013-02-16 22:21:46 +00:00
Ian Lepore	74938cbb7f	Make the F_READAHEAD option to fcntl(2) work as documented: a value of zero now disables read-ahead. It used to effectively restore the system default readahead heuristic if it had been changed; a negative value now restores the default. Reviewed by: kib	2013-02-13 15:09:16 +00:00

1776 Commits