freebsd-skq

Author	SHA1	Message	Date
kib	e75ba1d5c4	Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439	2017-05-23 09:29:05 +00:00
vangyzen	d6de25428d	Add clock_nanosleep() Add a clock_nanosleep() syscall, as specified by POSIX. Make nanosleep() a wrapper around it. Attach the clock_nanosleep test from NetBSD. Adjust it for the FreeBSD behavior of updating rmtp only when interrupted by a signal. I believe this to be POSIX-compliant, since POSIX mentions the rmtp parameter only in the paragraph about EINTR. This is also what Linux does. (NetBSD updates rmtp unconditionally.) Copy the whole nanosleep.2 man page from NetBSD because it is complete and closely resembles the POSIX description. Edit, polish, and reword it a bit, being sure to keep any relevant text from the FreeBSD page. Reviewed by: kib, ngie, jilles MFC after: 3 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D10020	2017-03-19 00:51:12 +00:00
ngie	32a5b43489	Replace dot-dot relative pathing with SRCTOP-relative paths where possible This reduces build output, need for recalculating paths, and makes it clearer which paths are relative to what areas in the source tree. The change in performance over a locally mounted UFS filesystem was negligible in my testing, but this may more positively impact other filesystems like NFS. LIBC_SRCTOP was left alone so Juniper (and other users) can continue to manipulate lib/libc/Makefile (and other Makefile.inc's under lib/libc) as include Makefiles with custom options. Discussed with: marcel, sjg MFC after: 1 week Reviewed by: emaste Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9207	2017-01-20 03:23:24 +00:00
kib	e4920be6f2	Document thr_suspend(2) and thr_wake(2). Reviewed by: bjk, jilles Discussed with: emaste, wblock Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8016	2016-09-26 08:18:34 +00:00
badger	ec84c9826b	Add manpage for rctl_* system calls Reviewed by: trasz, wblock Approved by: kib (mentor) MFC after: 3 days Sponsored by: Dell Technologies Differential Revision: https://reviews.freebsd.org/D7877	2016-09-19 02:25:30 +00:00
brooks	5072762802	Fix spelling in comment. Submitted by: brueffer	2016-09-09 16:18:44 +00:00
brooks	dc591d96b2	Reduce duplicate NOASM and PSEUDO definitions The initial value of NOASM is nearly the same in all cases and the initial value of PSEUDO is the same in all cases so reduce duplication (and hopefully, future merge conflicts) by machine independent defaults. Also document the PSEUDO variable. Reviewed by: jhb, kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7820	2016-09-08 22:38:20 +00:00
kib	fffac29068	Rewrite ptrace(2) wrappers in C. Besides removing hand-translation to assembler, this also adds missing wrappers for arm64 and risc-v. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7694	2016-08-29 18:47:51 +00:00
kib	21a721373f	Add fdatasync(2) man page, combined with fsync(2). Reviewed by: emaste, rpokala, wblock Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7522	2016-08-17 10:16:42 +00:00
kib	e04600d300	The fdatasync(2) call must be cancellation point. Sponsored by: The FreeBSD Foundation MFC after: 13 days	2016-08-16 08:27:03 +00:00
brooks	ecead64b41	Replace use of the pipe(2) system call with pipe2(2) with a zero flags value. This eliminates the need for machine dependant assembly wrappers for pipe(2). It also make passing an invalid address to pipe(2) return EFAULT rather than triggering a segfault. Document this behavior (which was already true for pipe2(2), but undocumented). Reviewed by: andrew Approved by: re (gjb) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D6815	2016-06-22 21:11:27 +00:00
kib	b8e76aa55c	Add thr*.2 and _umtx_op.2 manpages to the build. Sponsored by: The FreeBSD Foundation	2016-05-14 09:43:28 +00:00
kib	2b6ac44d5d	Annotate arm userspace assembler sources stating their tolerance to the non-executable stack. Reviewed by: andrew Sponsored by: The FreeBSD Foundation	2015-09-29 16:09:58 +00:00
bdrewery	3faab6f866	Avoid adding duplicates into OBJS. bsd.lib.mk already handles adding entries to OBJS based on SRCS. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-09-22 04:55:28 +00:00
pfg	9b366d35ee	Move the stack protector to a new "secure" directory As part of the code refactoring to support FORTIFY_SOURCE we want a new subdirectory "secure" to keep the files related to security. Move the stack protector functions to this new directory. No functional change. Differential Review: https://reviews.freebsd.org/D3333	2015-08-14 03:03:13 +00:00
adrian	41db4b88e0	Add an initial NUMA affinity/policy configuration for threads and processes. This is based on work done by jeff@ and jhb@, as well as the numa.diff patch that has been circulating when someone asks for first-touch NUMA on -10 or -11. * Introduce a simple set of VM policy and iterator types. * tie the policy types into the vm_phys path for now, mirroring how the initial first-touch allocation work was enabled. * add syscalls to control changing thread and process defaults. * add a global NUMA VM domain policy. * implement a simple cascade policy order - if a thread policy exists, use it; if a process policy exists, use it; use the default policy. * processes inherit policies from their parent processes, threads inherit policies from their parent threads. * add a simple tool (numactl) to query and modify default thread/process policities. * add documentation for the new syscalls, for numa and for numactl. * re-enable first touch NUMA again by default, as now policies can be set in a variety of methods. This is only relevant for very specific workloads. This doesn't pretend to be a final NUMA solution. The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by 'sysctl vm.default_policy=rr'. This is only relevant if MAXMEMDOM is set to something other than 1. Ie, if you're using GENERIC or a modified kernel with non-NUMA, then this is a glorified no-op for you. Thank you to Norse Corp for giving me access to rather large (for FreeBSD!) NUMA machines in order to develop and verify this. Thank you to Dell for providing me with dual socket sandybridge and westmere v3 hardware to do NUMA development with. Thank you to Scott Long at Netflix for providing me with access to the two-socket, four-domain haswell v3 hardware. Thank you to Peter Holm for running the stress testing suite against the NUMA branch during various stages of development! Tested: * MIPS (regression testing; non-NUMA) * i386 (regression testing; non-NUMA GENERIC) * amd64 (regression testing; non-NUMA GENERIC) * westmere, 2 socket (thankyou norse!) * sandy bridge, 2 socket (thankyou dell!) * ivy bridge, 2 socket (thankyou norse!) * westmere-EX, 4 socket / 1TB RAM (thankyou norse!) * haswell, 2 socket (thankyou norse!) * haswell v3, 2 socket (thankyou dell) * haswell v3, 2x18 core (thankyou scott long / netflix!) * Peter Holm ran a stress test suite on this work and found one issue, but has not been able to verify it (it doesn't look NUMA related, and he only saw it once over many testing runs.) * I've tested bhyve instances running in fixed NUMA domains and cpusets; all seems to work correctly. Verified: * intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different NUMA policies for processes under test. Review: This was reviewed through phabricator (https://reviews.freebsd.org/D2559) as well as privately and via emails to freebsd-arch@. The git history with specific attributes is available at https://github.com/erikarn/freebsd/ in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy). This has been reviewed by a number of people (stas, rpaulo, kib, ngie, wblock) but not achieved a clear consensus. My hope is that with further exposure and testing more functionality can be implemented and evaluated. Notes: * The VM doesn't handle unbalanced domains very well, and if you have an overly unbalanced memory setup whilst under high memory pressure, VM page allocation may fail leading to a kernel panic. This was a problem in the past, but it's much more easily triggered now with these tools. * This work only controls the path through vm_phys; it doesn't yet strongly/predictably affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory isn't really guaranteed in any way. That's next on my plate. Sponsored by: Norse Corp, Inc.; Dell	2015-07-11 15:21:37 +00:00
kib	2254748ed0	The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x kernels which required padding before the off_t parameter. The fcntl(2) contains compatibility code to handle kernels before the struct flock was changed during the 8.x CURRENT development. The shims were reasonable to allow easier revert to the older kernel at that time. Now, two or three major releases later, shims do not serve any purpose. Such old kernels cannot handle current libc, so revert the compatibility code. Make padded syscalls support conditional under the COMPAT6 config option. For COMPAT32, the syscalls were under COMPAT6 already. Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to (partially) disable the removed shims. Reviewed by: jhb, imp (previous versions) Discussed with: peter Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-18 21:50:13 +00:00
kib	9a774084c8	Make wait6(2), waitid(3) and ppoll(2) cancellation points. The waitid() function is required to be cancellable by the standard. The wait6() and ppoll() follow the other syscalls in their groups. Reviewed by: jhb, jilles (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-18 21:35:41 +00:00
kib	6531ee3ae5	Make kevent(2) a cancellation point. Note that to cancel blocked kevent(2) call, changelist must be empty, since we cannot cancel a call which already made changes to the process state. And in reverse, call which only makes changes to the kqueue state, without waiting for an event, is not cancellable. This makes a natural usage model to migrate kqueue loop to support cancellation, where existing single kevent(2) call must be split into two: first uncancellable update of kqueue, then cancellable wait for events. Note that this is ABI-incompatible change, but it is believed that there is no cancel-safe code that relies on kevent(2) not being a cancellation point. Option to preserve the ABI would be to keep kevent(2) as is, but add new call with flags to specify cancellation behaviour, which only value seems to add complications. Suggested and reviewed by: jilles Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-03-29 19:14:41 +00:00
marius	c837ced420	Unbreak sparc64 after r276630 by calling __sparc_sigtramp_setup signal trampoline as part of the MD __sys_sigaction again. Submitted by: kib (initial versions) MFC after: 3 days	2015-02-16 22:13:03 +00:00
jilles	67db24d0f2	Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes	2015-01-23 21:07:08 +00:00
kib	8d1dfb4106	Fix known issues which blow up the process after dlopen("libthr.so") (or loading a dso linked to libthr.so into process which was not linked against threading library). - Remove libthr interposers of the libc functions, including __error(). Instead, functions calls are indirected through the interposing table, similar to how pthread stubs in libc are already done. Libc by default points either to syscall trampolines or to existing libc implementations. On libthr load, libthr rewrites the pointers to the cancellable implementations already in libthr. The interposition table is separate from pthreads stubs indirection table to not pull pthreads stubs into static binaries. - Postpone the malloc(3) internal mutexes initialization until libthr is loaded. This avoids recursion between calloc(3) and static pthread_mutex_t initialization. - Reinstall signal handlers with wrapper on libthr load. The _rtld_is_dlopened(3) is used to avoid useless calls to sigaction(2) when libthr is statically referenced from the main binary. In the process, fix openat(2), swapcontext(2) and setcontext(2) interposing. The libc symbols were exported at different versions than libthr interposers. Export both libc and libthr versions from libc now, with default set to the higher version from libthr. Remove unused and disconnected swapcontext(3) userspace implementation from libc/gen. No objections from: deischen Tested by: pho, antoine (exp-run) (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-03 18:38:46 +00:00
dchagin	162012051b	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
imp	9878392e1a	Convert from WITHOUT_SYSCALL_COMPAT to MK_SYSCALL_COMPAT.	2014-04-05 17:54:43 +00:00
marcel	99c9726a00	Replace use of ${.CURDIR} by ${LIBC_SRCTOP} and define ${LIBC_SRCTOP} if not already defined. This allows building libc from outside of lib/libc using a reach-over makefile. A typical use-case is to build a standard ILP32 version and a COMPAT32 version in a single iteration by building the COMPAT32 version using a reach-over makefile. Obtained from: Juniper Networks, Inc.	2014-03-04 02:19:39 +00:00
pluknet	903839e958	Provide the manual page for aio_fsync(2). Reviewed by: davidxu MFC after: 1 week	2013-12-26 19:16:30 +00:00
pluknet	71c1c69583	Fix extattr(2) MLINKS. MFC after: 1 week	2013-11-09 00:36:09 +00:00
jhb	d3ef75b6c7	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month	2013-09-19 18:53:42 +00:00
glebius	9a02f3097d	Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
jilles	16772c421d	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib	2013-05-01 22:42:42 +00:00
jilles	299afd25fd	Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.	2013-05-01 20:10:21 +00:00
pjd	01401cc9bc	Document chflagsat(2). Obtained from: jilles	2013-03-21 23:05:44 +00:00
pjd	702516e70b	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr name, socklen_t namelen); which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
pjd	f07ebb8888	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
pjd	cc57b32cb6	Put one file per line so it is easier to read diffs against those files.	2013-02-16 22:21:46 +00:00
kib	39bfcc5eed	Document wait6() and waitid(). PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month	2012-11-13 12:56:42 +00:00
kib	13a9f42818	Use struct vdso_timehands data to implement fast gettimeofday(2) and clock_gettime(2) functions if supported. The speedup seen in microbenchmarks is in range 4x-7x depending on the hardware. Only amd64 and i386 architectures are supported. Libc uses rdtsc and kernel data to calculate current time, if enabled by kernel. Hopefully, this code is going to migrate into vdso in some future. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:13:30 +00:00
pjd	e3b0ea6c34	Link EV_SET(3) to kqueue(2). MFC after: 3 days	2012-03-05 20:59:34 +00:00
delphij	d6760e1d0d	Update rtprio(2) manual page to reflect the latest changes in -CURRENT as well as provide documentation for rtprio_thread(2) system call. MFC after: 1 month X-MFC-after: r228470	2011-12-27 10:34:00 +00:00
lstewart	cca3084242	- Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate() system calls to provide feed-forward clock management capabilities to userspace processes. ffclock_getcounter() returns the current value of the kernel's feed-forward clock counter. ffclock_getestimate() returns the current feed-forward clock parameter estimates and ffclock_setestimate() updates the feed-forward clock parameter estimates. - Document the syscalls in the ffclock.2 man page. - Regenerate the script-derived syscall related files. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 01:26:10 +00:00
jhb	78c075174e	Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month	2011-11-04 04:02:50 +00:00
jonathan	5ecd1c9d40	Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-08-18 22:51:30 +00:00
jonathan	f77ff62fcd	Add cap_new(2) and cap_getrights(2) symbols to libc. These system calls have already been implemented in the kernel; now we hook up libc symbols so userspace can drive them. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-07-20 13:29:39 +00:00
mdf	9c9a32d97b	Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)	2011-04-18 16:32:22 +00:00
marcel	70f17b3a42	When building libc with the syscall compatibility, don't also generate the syscall assembly files. This results in conflicting dependencies and can cause unexpected results for parallel builds. This is because the .c file and the .S file both generate the same .o file. Submitted by: Simon Gerraty <sjg@juniper.net> Sponsored by: Juniper Networks	2011-03-17 04:40:37 +00:00
trasz	d9bca0ab24	Add manual page for getloginclass(2) and setloginclass(2).	2011-03-06 08:35:50 +00:00
rwatson	817c148c9f	Make cap_new(2) and cap_getmode(2) symbols from libc public so applications can link against them. Add man pages for the new system calls, with one errant forward reference to changes not yet present in FreeBSD, but soon will be. Reviewed by: anderson Obtained from: Capsicum Project Sponsored by: Google, Inc. Discussed with: benl, kris, pjd MFC after: 3 months	2011-03-03 11:31:08 +00:00
kib	736c72ab1a	Emit .note.GNU-stack for the syscall stubs generated by libc only on architectures that support this .note. In particular, do not unneccessary emit the notes on ia64 and sparc64, which ABI require non-executable stacks. Tested by: marcel	2011-01-25 21:06:49 +00:00
kib	a7db50394e	Emit .note.GNU-stack for the syscall stubs generated by libc.	2011-01-07 14:28:54 +00:00
davidxu	e129c18a83	Because POSIX does not allow EINTR to be returned from sigwait(), add a wrapper for it in libc and rework the code in libthr, the system call still can return EINTR, we keep this feature. Discussed on: thread Reviewed by: jilles	2010-09-10 01:47:37 +00:00

1 2 3 4

196 Commits