freebsd-dev

Author	SHA1	Message	Date
Ed Schouten	808d980506	Properly format pointer size independent CloudABI system calls. CloudABI has approximately 50 system calls that do not depend on the pointer size of the system. As the ABI is pretty compact, it takes little effort to teach truss(8) the formatting rules for these system calls. Start off by formatting pointer size independent system calls. Changes: - Make it possible to include the CloudABI system call definitions in FreeBSD userspace builds. Add ${root}/sys to the truss(8) Makefile so we can pull in <compat/cloudabi/cloudabi_syscalldefs.h>. - Refactoring: patch up amd64-cloudabi64.c to use the CLOUDABI_* constants instead of rolling our own table. - Add table entries for all of the system calls. - Add new generic formatting types (UInt, IntArray) that we'll be using to format unsigned integers and arrays of integers. - Add CloudABI specific formatting types. Approved by: jhb Differential Revision: https://reviews.freebsd.org/D3836	2015-10-08 05:27:45 +00:00
Bryan Drewery	a730673058	Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers. r161611 added some of the code from sys_vfork() directly into the Linux module wrappers since they use RFSTOPPED. In r232240, the RFFPWAIT handling was moved to syscallret(), thus this code in the Linux module is no longer needed as it will be called later. This also allows the Linux wrappers to benefit from the fix in r275616 for threads not getting suspended if their vforked child is stopped while they wait on them. Reviewed by: jhb, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3828	2015-10-07 19:10:38 +00:00
Andriy Gapon	2f2f522b5d	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days	2015-09-28 12:14:16 +00:00
Edward Tomasz Napierala	089d32934a	Fixes a panic triggered by threaded Linux applications when running with RACCT/RCTL enabled. Reviewed by: ngie@, ed@ Tested by: Larry Rosenman <ler@lerctr.org> MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3470	2015-09-02 14:04:13 +00:00
Ed Schouten	bc1ace0b96	Decompose linkat()/renameat() rights to source and target. To make it easier to understand how Capsicum interacts with linkat() and renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}. This also addresses a shortcoming in Capsicum, where it isn't possible to disable linking to files stored in a directory. Creating hardlinks essentially makes it possible to access files with additional rights. Reviewed by: rwatson, wblock Differential Revision: https://reviews.freebsd.org/D3411	2015-08-27 15:16:41 +00:00
Ed Schouten	edcf7fbf59	Don't forget to invoke pre_execve() and post_execve(). CloudABI's proc_exec() was implemented before r282708 introduced pre_execve() and post_execve(). Sync up by adding these missing calls.	2015-08-17 13:07:12 +00:00
Ed Schouten	fbb624e76f	Add the last remaining system calls: send() and recv(). There is still one TODO item for these calls: add file descriptor passing. The data structures are already prepared for this. It's just the translation that's missing. Obtained from: http://github.com/NuxiNL/freebsd	2015-08-12 17:42:20 +00:00
Ed Schouten	2c20fbe43a	Use CAP_EVENT instead of CAP_PDWAIT. The cloudlibc pdwait() function ends up using FreeBSD's kqueue() in combination with EVFILT_PROCDESC. This depends on CAP_EVENT -- not CAP_PDWAIT. Obtained from: https://github.com/NuxiNL/freebsd	2015-08-12 11:07:03 +00:00
Ed Schouten	18528470cb	Make blocking CloudABI futex operations work. Blocking on locks and condition variables can be accomplished by polling and using the special filters CONDVAR, LOCK_RDLOCK and LOCK_WRLOCK. For now it wouldn't make sense to implement this functionality into kqueue() itself, for the reason that they are CloudABI specific and would require us to resize 'struct kevent' to hold all of the parameters of interest. Add a bandaid to the CloudABI poll system call to call into the futex code directly if it detects specific combinations of events that are used by the C library. Obtained from: https://github.com/NuxiNL/freebsd	2015-08-12 08:41:48 +00:00
Ed Schouten	322e16e87e	Make poll() and kqueue() on CloudABI work. This change implements two functions, cloudabi64_kevent_copyin() and cloudabi64_kevent_copyout(), that convert CloudABI structures to FreeBSD's struct kevent. CloudABI uses two structures: subscription_t and event_t. The former is used for input, whereas the latter is used for output. Unlike struct kevent, fields aren't overloaded for multiple purposes or for separate event types. For poll() we call into the newly introduced kern_kevent_anonymous() function that allows us to poll without a file descriptor. This function is not only used by poll(), but also by functions such as sleep() and clock_nanosleep(). Reviewed by: jmg Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3308	2015-08-12 07:59:00 +00:00
Ed Schouten	55a224afa2	Fall back to O_RDONLY -- not O_WRONLY. If CloudABI processes open files with a set of requested rights that do not match any of the privileges granted by O_RDONLY, O_WRONLY or O_RDWR, we'd better fall back to O_RDONLY -- not O_WRONLY.	2015-08-11 14:08:46 +00:00
Ed Schouten	9d9123a80d	Properly convert the error number to CloudABI's indexing. We currently return FreeBSD's errno value directly, which is of course not correct.	2015-08-11 14:07:04 +00:00
Ed Schouten	65c17fe451	Make cap_rights_limit() work for CloudABI processes. Call into the recently introduced kern_cap_rights_limit() function to restrict rights.	2015-08-11 08:44:19 +00:00
Ed Schouten	0f85ff377b	Add file_open(): the underlying system call of openat(). CloudABI purely operates on file descriptor rights (CAP_). File descriptor access modes (O_ACCMODE) are emulated on top of rights. Instead of accepting the traditional flags argument, file_open() copies in an fdstat_t object that contains the initial rights the descriptor should have, but also file descriptor flags that should persist after opening (APPEND, NONBLOCK, SYNC). Only flags that don't persist (EXCL, TRUNC, CREAT, DIRECTORY) are passed in as an argument. file_open() first converts the rights, the persistent flags and the non-persistent flags to fflags. It then calls into vn_open(). If successful, it installs the file descriptor with the requested rights, trimming off rights that don't apply to the type of the file that has been opened. Unlike kern_openat(), this function does not support /dev/fd/*. I can't think of a reason why we need to support this for CloudABI. Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3235	2015-08-06 06:47:28 +00:00
Ed Schouten	aaf53ab2aa	Correct the previous commit: remove the DECLARE_MODULE(). It looks like a MODULE_VERSION() can also appear on its own -- there is no need to explicitly use DECLARE_MODULE(). Looking at other modules, this seems common practice.	2015-08-05 16:53:49 +00:00
Ed Schouten	b6efa27589	Add DECLARE_MODULE() to the "cloudabi" kernel module. This kernel module does not require any explicit initialization, but a module declaration is needed to let the "cloudabi64" kernel module automatically pull this in. Obtained from: https://github.com/NuxiNL/freebsd	2015-08-05 16:45:47 +00:00
Ed Schouten	36310bcd1d	Make fcntl(F_SETFL) work. The stat_put() system call can be used to modify file descriptor attributes, such as flags, but also Capsicum permission bits. Support for changing Capsicum bits will be added as soon as its dependent changes have been pushed through code review. Obtained from: https://github.com/NuxiNL/freebsd	2015-08-05 16:15:43 +00:00
Ed Schouten	2412ae2b8e	Regenerate the system call table.	2015-08-05 13:10:13 +00:00
Ed Schouten	2837d9ed43	Import the latest CloudABI system call definitions and table. We're going to need these for next code I'm going to send out for review: support for poll() and kqueue() on CloudABI.	2015-08-05 13:09:46 +00:00
Ed Schouten	db1c8ee585	Add the remaining pointer size independent CloudABI socket system calls. CloudABI uses a structure called cloudabi_sockstat_t. Think of it as 'struct stat' for sockets. It is used by functions such as getsockname(), getpeername(), some of the getsockopt() values, etc. This change implements the sock_stat_get() system call that returns a copy of this structure. The accept() system call should also return a full copy of this structure eventually, but for now we're only interested in the peer address. Add a TODO() to make sure this is patched up later on. Differential Revision: https://reviews.freebsd.org/D3218	2015-08-05 08:18:05 +00:00
Ed Schouten	4958fab8cd	Allow the creation of polling descriptors (kqueues) on CloudABI.	2015-08-05 07:37:06 +00:00
Ed Schouten	a2034cc98a	Allow the creation of kqueues with a restricted set of Capsicum rights. On CloudABI we want to create file descriptors with just the minimal set of Capsicum rights in place. The reason for this is that it makes it easier to obtain uniform behaviour across different operating systems. By explicitly whitelisting the operations, we can return consistent error codes, but also prevent applications from depending on OS-specific behaviour. Extend kern_kqueue() to take an additional struct filecaps that is passed on to falloc_caps(). Update the existing consumers to pass in NULL. Differential Revision: https://reviews.freebsd.org/D3259	2015-08-05 07:36:50 +00:00
Ed Schouten	0c0964844e	Let the CloudABI futex code use umtx_keys. The CloudABI kernel still passes all of the cloudlibc unit tests. Reviewed by: vangyzen Differential Revision: https://reviews.freebsd.org/D3286	2015-08-04 06:02:03 +00:00
Ed Schouten	f52c3dd415	Allow CloudABI processes to create shared memory objects. Summary: Use the newly created `kern_shm_open()` function to create objects with just the rights that are actually needed. Reviewers: jhb, kib Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3260	2015-08-01 07:51:48 +00:00
Ed Schouten	367a13f905	Limit rights on process descriptors. On CloudABI, the rights bits returned by cap_rights_get() match up with the operations that you can actually perform on the file descriptor. Limiting the rights is good, because it makes it easier to get uniform behaviour across different operating systems. If process descriptors on FreeBSD would suddenly gain support for any new file operation, this wouldn't become exposed to CloudABI processes without first extending the rights. Extend fork1() to gain a 'struct filecaps' argument that allows you to construct process descriptors with custom rights. Use this in cloudabi_sys_proc_fork() to limit the rights to just fstat() and pdwait(). Obtained from: https://github.com/NuxiNL/freebsd	2015-07-31 10:21:58 +00:00
Ed Schouten	8328babdd0	Make pipes in CloudABI work. Summary: Pipes in CloudABI are unidirectional. The reason for this is that CloudABI attempts to provide a uniform runtime environment across different flavours of UNIX. Instead of implementing a custom pipe that is unidirectional, we can simply reuse Capsicum permission bits to support this. This is nice, because CloudABI already attempts to restrict permission bits to correspond with the operations that apply to a certain file descriptor. Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes a pair of filecaps. These filecaps are passed to the newly introduced falloc_caps() function that creates the descriptors with rights in place. Test Plan: CloudABI pipes seem to be created with proper rights in place: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44 Reviewers: jilles, mjg Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3236	2015-07-29 17:18:27 +00:00
Ed Schouten	9d2332c9ee	Split up Capsicum to CloudABI rights conversion into two separate routines. CloudABI's openat() ensures that files are opened with the smallest set of relevant rights. For example, when opening a FIFO, unrelated rights like CAP_RECV are automatically removed. To remove unrelated rights, we can just reuse the code for this that was already present in the rights conversion function.	2015-07-29 12:42:45 +00:00
Ed Schouten	3720b82fa8	Implement CloudABI's readdir(). Summary: CloudABI's readdir() system call could be thought of as a mixture between FreeBSD's getdents(2) and pread(). Instead of using the file descriptor offset, userspace provides a 64-bit cloudabi_dircookie_t to continue reading at a given point. CLOUDABI_DIRCOOKIE_START, having value 0, can be used to return entries at the start of the directory. The file descriptor offset is not used to store the cookie for the reason that in a file descriptor centric environment, it would make sense to allow concurrent use of a single file descriptor. The remaining space returned by the system call should be filled with a partially truncated copy of the next entry. The advantage of doing this is that it gracefully deals with long filenames. If the C library provides a buffer that is too small to hold a single entry, it can still extract the directory entry header, meaning that it can retry the read with a larger buffer or skip it using the cookie. Test Plan: This implementation passes the cloudlibc unit tests at: https://github.com/NuxiNL/cloudlibc/tree/master/src/libc/dirent Reviewers: marcel, kib Reviewed By: kib Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3226	2015-07-29 06:31:44 +00:00
Ed Schouten	1d96fd8d9f	Implement file attribute modification system calls for CloudABI. CloudABI uses a system call interface to modify file attributes that is more similar to KPI's/FUSE, namely where a stat structure is passed back to the kernel, together with a bitmask of attributes that should be changed. This would allow us to update any set of attributes atomically. That said, I'd rather not go as far as to actually implement it that way, as it would require us to duplicate more code than strictly needed. Let's just stick to the combinations that are actually used by cloudlibc. Obtained from: https://github.com/NuxiNL/freebsd	2015-07-28 12:57:19 +00:00
Ed Schouten	29515a68a5	Implement directory and FIFO creation. The file_create() system call can be used to create files of a given type. Right now it can only be used to create directories and FIFOs. As CloudABI does not expose filesystem permissions, this system call lacks a mode argument. Simply use 0777 or 0666 depending on the file type.	2015-07-28 06:50:47 +00:00
Ed Schouten	cec575201a	Make fstat() and friends work. Summary: CloudABI provides access to two different stat structures: - fdstat, containing file descriptor level status: oflags, file descriptor type and Capsicum rights, used by cap_rights_get(), fcntl(F_GETFL), getsockopt(SO_TYPE). - filestat, containing your regular file status: timestamps, inode number, used by fstat(). Unlike FreeBSD's stat::st_mode, CloudABI file descriptor types don't have overloaded meanings (e.g., returning S_ISCHR() for kqueues). Add a utility function to extract the type of a file descriptor accurately. CloudABI does not work with O_ACCMODEs. File descriptors have two sets of Capsicum-style rights: rights that apply to the file descriptor itself ('base') and rights that apply to any new file descriptors yielded through openat() ('inheriting'). Though not perfect, we can pretty safely decompose Capsicum rights to such a pair. This is done in convert_capabilities(). Test Plan: Tests for these system calls are fairly extensive in cloudlibc. Reviewers: jonathan, mjg, #manpages Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3171	2015-07-28 06:36:49 +00:00
Ed Schouten	af7e75f59d	Add a futex implementation for CloudABI. Summary: CloudABI provides two different types of futex objects: read-write locks and condition variables. There is no need to provide separate support for once objects and thread joining, as these are efficiently simulated by blocking on a read-write lock. Mutexes simply use read-write locks. Condition variables always have a lock object associated to them. They always know to which lock a thread needs to be migrated if woken up. This allows us to implement requeueing. A broadcast on a condition variable will never cause multiple threads to be woken up at once. They will be woken up iteratively. This implementation still has lots of room for improvement. Locking is coarse and right now we use linked lists to store all of the locks and condition variables, instead of using a hash table. The primary goal of this implementation was to behave correctly. Performance will be improved as we go. Test Plan: This futex implementation has been in use for the last couple of months and seems to work pretty well. All of the cloudlibc and libc++ unit tests seem to pass. Reviewers: dchagin, kib, vangyzen Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3148	2015-07-27 10:07:29 +00:00
Ed Schouten	533c8a29da	Regenerate system call table.	2015-07-27 10:04:28 +00:00
Ed Schouten	f4c06d124f	Sync in latest upstream system call definitions. Futex object scopes have been renamed from using their own constants to simply reusing the existing CLOUDABI_MAP_{PRIVATE,SHARED} flags, as they are more accurate in this context.	2015-07-27 10:04:06 +00:00
Ed Schouten	4615998165	Implement the basic system calls that operate on pathnames. Summary: Unlike FreeBSD, CloudABI does not use null terminated strings for its pathnames. Introduce a function called copyin_path() that can be used by all of the filesystem system calls that use pathnames. This change already implements the system calls that don't depend on any additional functionality (e.g., conversion of struct stat). Also implement the socket system calls that operate on pathnames, namely the ones used by the C library functions bindat() and connectat(). These don't receive a 'struct sockaddr_un', but just the pathname, meaning they could be implemented in such a way that they don't depend on the size of sun_path. For now, just use the existing interfaces. Add a missing #include to cloudabi_syscalldefs.h to get this code to build, as one of its macros depends on UINT64_C(). Test Plan: These implementations have already been tested in the CloudABI branch on GitHub. They pass all of the tests. Reviewers: kib, pjd Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3097	2015-07-24 07:46:02 +00:00
Ed Schouten	fef97e09d9	Allow us to create UNIX sockets and socketpairs in CloudABI processes.	2015-07-23 13:52:53 +00:00
Ed Schouten	c989441af6	Regenerate system call table.	2015-07-22 10:05:46 +00:00
Ed Schouten	73dcd7db56	Import upstream changes to the system call definitions. Support has been added for providing the scope of a futex operation, whether the futex is local to the process or shared between processes.	2015-07-22 10:04:53 +00:00
Ed Schouten	072cb63ddc	Make clock_gettime() and clock_getres() work for CloudABI programs. Though the standard C library uses a 'struct timespec' using a 64-bit 'time_t', there is no need to use such a type at the system call level. CloudABI uses a simple 64-bit unsigned timestamp in nanoseconds. This is sufficient to express any time value from 1970 to 2554. The CloudABI low-level interface also supports fetching timestamp values with a lower precision. Instead of overloading the clock ID argument for this purpose, the system call provides a precision argument that may be used to specify the maximum slack. The current system call implementation does not use this information, but it's good to already have this available. Expose cloudabi_convert_timespec(), as we're going to need this for fstat() as well. Obtained from: https://github.com/NuxiNL/freebsd	2015-07-21 15:08:13 +00:00
Ed Schouten	21d30b29d5	Make thread creation work for CloudABI processes. Summary: Remove the stub system call that was put in place during the system call import and replace it by a target-dependent version stored in sys/amd64. Initialize the thread in a way similar to cpu_set_upcall_kse(). We provide the entry point with two arguments: the thread ID and the argument pointer. Test Plan: Thread creation still seems to work, both for FreeBSD and CloudABI binaries. Reviewers: dchagin, mjg, kib Reviewed By: kib Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3110	2015-07-21 12:47:15 +00:00
Ed Schouten	62c31cffae	Make forking of CloudABI processes work. Just like FreeBSD+Capsicum, CloudABI uses process descriptors. Return the file descriptor number to the parent process. To the child process we return both a special value for the file descriptor number (CLOUDABI_PROCESS_CHILD) and the thread ID of the new thread in the copied process, so the threading library can reinitialize itself. Obtained from: https://github.com/NuxiNL/freebsd	2015-07-20 13:46:22 +00:00
Marcelo Araujo	f19e47d691	Add support to the jail framework to be able to mount linsysfs(5) and linprocfs(5). Differential Revision: D2846 Submitted by: Nikolai Lifanov <lifanov@mail.lifanov.com> Reviewed by: jamie	2015-07-19 08:52:35 +00:00
Konstantin Belousov	b4490c6e93	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation	2015-07-18 09:02:50 +00:00
Ed Schouten	6256e57ba9	Implement CloudABI memory management system calls. Add support for the <sys/mman.h> functions by wrapping around our own implementations. There are no kern_() variants of these system calls, but we also don't need them in this case. It is sufficient to just call into the sys_() functions. Differential Revision: https://reviews.freebsd.org/D3033 Reviewed by: brooks	2015-07-17 09:00:38 +00:00
Ed Schouten	6e5fcd99df	Add a sysentvec for CloudABI on x86-64. Summary: For CloudABI we need to put two things on the stack of new processes: the argument data (a binary blob; not strings) and a startup data structure. The startup data structure contains interesting things such as a pointer to the ELF program header, the thread ID of the initial thread, a stack smashing protection canary, and a pointer to the argument data. Fetching system call arguments and setting the return value is similar to FreeBSD. The only differences are that system call 0 does not exist and that we call into cloudabi_convert_errno() to convert the error code. We also need this function in a couple of other places, so we'd better reuse it here. Reviewers: dchagin, kib Reviewed By: kib Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3098	2015-07-16 18:24:06 +00:00
Ed Schouten	457f7e23b1	Implement CloudABI's exec() call. Summary: In a runtime that is purely based on capability-based security, there is a strong emphasis on how programs start their execution. We need to make sure that we execute a new program with an exact set of file descriptors, ensuring that credentials are not leaked into the process accidentally. Providing the right file descriptors is just half the problem. There also needs to be a framework in place that gives meaning to these file descriptors. How does a CloudABI mail server know which of the file descriptors corresponds to the socket that receives incoming emails? Furthermore, how will this mail server acquire its configuration parameters, as it cannot open a configuration file from a global path on disk? CloudABI solves this problem by replacing traditional string command line arguments by a tree-like data structure consisting of scalars, sequences and mappings (similar to YAML/JSON). In this structure, file descriptors are treated as a first-class citizen. When calling exec(), file descriptors are passed on to the new executable if and only if they are referenced from this tree structure. See the cloudabi-run(1) man page for more details and examples (sysutils/cloudabi-utils). Fortunately, the kernel does not need to care about this tree structure at all. The C library is responsible for serializing and deserializing, but also for extracting the list of referenced file descriptors. The system call only receives a copy of the serialized data and a layout of what the new file descriptor table should look like: int proc_exec(int execfd, const void *data, size_t datalen, const int *fds, size_t fdslen); This change introduces a set of fd*_remapped() functions: - fdcopy_remapped() pulls a copy of a file descriptor table, remapping all of the file descriptors according to the provided mapping table. - fdinstall_remapped() replaces the file descriptor table of the process by the copy created by fdcopy_remapped(). - fdescfree_remapped() frees the table in case we aborted before fdinstall_remapped(). We then add a function exec_copyin_data_fds() that builds on top of these functions. It copies in the data and constructs a new remapped file descriptor. This is used by cloudabi_sys_proc_exec(). Test Plan: cloudabi-run(1) is capable of spawning processes successfully, providing it data and file descriptors. procstat -f seems to confirm all is good. Regular FreeBSD processes also work properly. Reviewers: kib, mjg Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3079	2015-07-16 07:05:42 +00:00
Ed Schouten	952c6e1010	Implement the trivial socket system calls: shutdown() and listen().	2015-07-15 11:27:34 +00:00
Ed Schouten	4fa92fb538	Make posix_fallocate() and posix_fadvise() work. We can map these system calls directly to the FreeBSD counterparts. The other filesystem related system calls will be sent out for review separately, as they are a bit more complex to get right.	2015-07-15 09:14:06 +00:00
Ed Schouten	707d98fe2f	Implement the CloudABI random_get() system call. The random_get() system call works similar to getentropy()/getrandom() on OpenBSD/Linux. It fills a buffer with random data. This change introduces a new function, read_random_uio(), that is used to implement read() on the random devices. We can call into this function from within the CloudABI compatibility layer. Approved by: secteam Reviewed by: jmg, markm, wblock Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3053	2015-07-14 18:45:15 +00:00
Ed Schouten	460ac6370a	Regenerate system call table for r285540.	2015-07-14 15:12:24 +00:00
Ed Schouten	1eb7c7cae3	Implement thread_tcb_set() and thread_yield(). The first system call is used to set the user TLS address. Right now this system call is invoked by the C library for both the initial thread and additional threads unconditionally, but in the future we'll only call this if the architecture does not support this. On recent x86-64 CPUs we could use the WRFSBASE instruction. This system call was erroneously placed in sys/compat/cloudabi64, even though it does not depend on any pointer size dependent datastructure. Move it to the right place. Obtained from: https://github.com/NuxiNL/freebsd	2015-07-14 15:11:50 +00:00
Ed Schouten	03744d7c8d	Implement {,p}{read,write}{,v}(). Add a routine similar to copyinuio() and freebsd32_copyinuio() that copies in CloudABI's struct iovecs. These are then translated into FreeBSD format and placed in a 'struct uio', so we can call into the kern_*() functions. Obtained from: https://github.com/NuxiNL/freebsd	2015-07-14 14:33:21 +00:00
Ed Schouten	f9675092b8	Let proc_raise() call into pksignal() directly. Summary: As discussed with kib@ in response to r285404, don't call into kern_sigaction() within proc_raise() to reset the signal to the default action before delivery. We'd better do that during image execution. Change the code to simply use pksignal(), so we don't waste cycles on functions like pfind() to look up the currently running process itself. Test Plan: This change has also been pushed into the cloudabi branch on GitHub. The raise() tests still seem to pass. Reviewers: kib Reviewed By: kib Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3076	2015-07-14 12:16:14 +00:00
Ed Schouten	4f1905177a	Implement normal and abnormal process termination. CloudABI does not provide an explicit kill() system call, for the reason that there is no access to the global process namespace. Instead, it offers a raise() system call that can at least be used to terminate the process abnormally. CloudABI does not support installing signal handlers. CloudABI's raise() system call should behave as if the default policy is set up. Call into kern_sigaction(SIG_DFL) before calling sys_kill() to force this. Obtained from: https://github.com/NuxiNL/freebsd	2015-07-11 19:41:31 +00:00
Ed Schouten	a4001f4cb9	Use FDDUP_NORMAL instead of hardcoding value 0. Proposed by: mjg	2015-07-11 18:53:30 +00:00
Ed Schouten	329d1bca7f	Add missing function parameter. A function parameter got added in r285356, meaning that the call to kern_dup() needs to be patched up.	2015-07-11 18:39:16 +00:00
Mateusz Guzik	b34be824a0	linprocfs: vref the vnode passed to vn_fullpath	2015-07-11 16:44:28 +00:00
Mateusz Guzik	8a08cec166	Create a dedicated function for ensuring that cdir and rdir are populated. Previously several places were doing it on its own, partially incorrectly (e.g. without the filedesc locked) or even actively harmful by populating jdir or assigning rootvnode without vrefing it. Reviewed by: kib	2015-07-11 16:22:48 +00:00
Mateusz Guzik	f0725a8e1e	Move chdir/chroot-related fdp manipulation to kern_descrip.c Prefix exported functions with pwd_. Deduplicate some code by adding a helper for setting fd_cdir. Reviewed by: kib	2015-07-11 16:19:11 +00:00
Adrian Chadd	871ef8b0d8	Regenerate syscalls.	2015-07-11 15:22:11 +00:00
Mateusz Guzik	5fe97c20dc	fd: split kern_dup flags argument into actual flags and a mode Tidy up the code inside to switch on the mode.	2015-07-10 11:01:30 +00:00
Ed Schouten	2491302a04	Add implementations for some of the CloudABI file descriptor system calls. All of the CloudABI system calls that operate on file descriptors of an arbitrary type are prefixed with fd_. This change adds wrappers for most of these system calls around their FreeBSD equivalents. The dup2() system call present on CloudABI deviates from POSIX, in the sense that it can only be used to replace an existing file descriptor. It cannot be used to create new ones. The reason for this is that this is inherently thread-unsafe. Furthermore, there is no need on CloudABI to use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no special meaning. This change exposes kern_dup() through <sys/syscallsubr.h> and puts the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag, FDDUP_MUSTREPLACE, to force that file descriptors are replaced -- not allocated. Differential Revision: https://reviews.freebsd.org/D3035 Reviewed by: mjg	2015-07-09 16:07:01 +00:00
Ed Schouten	f355e810cf	Generate CloudABI system call table with proper $FreeBSD$ tags.	2015-07-09 07:21:33 +00:00
Ed Schouten	6d338f9a81	Import the CloudABI datatypes and create a system call table. CloudABI is a pure capability-based runtime environment for UNIX. It works similar to Capsicum, except that processes already run in capabilities mode on startup. All functionality that conflicts with this model has been omitted, making it a compact binary interface that can be supported by other operating systems without too much effort. CloudABI is 'secure by default'; the idea is that it should be safe to run arbitrary third-party binaries without requiring any explicit hardware virtualization (Bhyve) or namespace virtualization (Jails). The rights of an application are purely determined by the set of file descriptors that you grant it on startup. The datatypes and constants used by CloudABI's C library (cloudlibc) are defined in separate files called syscalldefs_mi.h (pointer size independent) and syscalldefs_md.h (pointer size dependent). We import these files in sys/contrib/cloudabi and wrap around them in cloudabi*_syscalldefs.h. We then add stubs for all of the system calls in sys/compat/cloudabi or sys/compat/cloudabi64, depending on whether the system call depends on the pointer size. We only have nine system calls that depend on the pointer size. If we ever want to support 32-bit binaries, we can simply add sys/compat/cloudabi32 and implement these nine system calls again. The next step is to send in code reviews for the individual system call implementations, but also add a sysentvec, to allow CloudABI executables to be started through execve(). More information about CloudABI: - GitHub: https://github.com/NuxiNL/cloudlibc - Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA Differential Revision: https://reviews.freebsd.org/D2848 Reviewed by: emaste, brooks Obtained from: https://github.com/NuxiNL/freebsd	2015-07-09 07:20:15 +00:00
Mateusz Guzik	f131759f54	fd: make 'rights' a manadatory argument to fget* functions	2015-07-05 19:05:16 +00:00
Konstantin Belousov	2a4734651c	svr4 emulator has custom sendsig() implementation, it does not use sv_sigtbl. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-29 10:33:04 +00:00
Dmitry Chagin	3c91646b46	Add EPOLLRDHUP support. Tested by: abi at abinet dot ru	2015-06-20 05:40:35 +00:00
Mateusz Guzik	4da8456f0a	Replace struct filedesc argument in getvnode with struct thread. This is a step towards removal of spurious arguments.	2015-06-16 13:09:18 +00:00
Mateusz Guzik	9ef8328d52	fd: make rights a mandatory argument to fget_unlocked	2015-06-16 09:52:36 +00:00
Mateusz Guzik	6871c7c3f1	linux: make sure to grab all cow structs when creating a thread This is a fixup for r284214. Reported and tested by: Ivan Klymenko <fidaj ukr.net>	2015-06-10 15:34:43 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Jung-uk Kim	1a01bdf906	Properly initialize flags for accept4(2) not to return spurious EINVAL. Note this fixes a Linuxulator regression introduced in r283490. PR: 200662	2015-06-08 20:03:15 +00:00
Dmitry Chagin	32ba368ba9	Finish r283544. In exec case properly detach threads from user space before suicide.	2015-06-06 06:12:14 +00:00
Eric van Gyzen	63e4c6cdf9	Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)	2015-06-02 18:37:04 +00:00
Dmitry Chagin	d707582f83	When I merged the lemul branch I missed kib@'s r282708 commit. This is not the final fix as I need to properly clean up thread resources before other threads suicide. Tested by: Ruslan Makhmatkhanov	2015-05-25 20:44:46 +00:00
Dmitry Chagin	5c2748d5e7	Linux nanosleep() and clock_nanosleep() system calls always writes the remaining time into the structure pointed to by rmtp unless rmtp is NULL. The value of *rmtp can then be used to call nanosleep() again and complete the specified pause if the previous call was interrupted. Note. clock_nanosleep() with an absolute time value does not write the remaining time. While here fix whitespaces and typo in SDT_PROBE.	2015-05-24 18:14:38 +00:00
Dmitry Chagin	bbf392d5ef	Convert SCM_TIMESTAMP in recvmsg().	2015-05-24 18:13:21 +00:00
Dmitry Chagin	5989b75bdb	The latest cp tool is trying to use the btrfs clone operation that is implemented via ioctl interface. First of all return ENOTSUP for this operation as a cp fallback to usual method in that case. Secondly, do not print out the message about unimplemented operation.	2015-05-24 18:12:04 +00:00
Dmitry Chagin	4f65e9cff4	Fix an mbuf(9) leak in sendmsg() under failure condition and remove unneeded check for failed M_WAITOK allocation. Found by: Brainy Code Scanner Reported by: Maxime Villard	2015-05-24 18:10:07 +00:00
Dmitry Chagin	9802eb9ebc	Implement Linux specific syncfs() system call.	2015-05-24 18:08:01 +00:00
Dmitry Chagin	d9cbe8f0ef	Properly check tv_nsec value. The tv_nsec field can also be one of the special value UTIME_NOW or UTIME_OMIT.	2015-05-24 18:06:46 +00:00
Dmitry Chagin	4cf10e2934	Since FreeBSD supports SOCK_CLOEXEC & SOCK_NONBLOCK options remove its emulation via fcntl call from Linuxulator.	2015-05-24 18:06:12 +00:00
Dmitry Chagin	e1ff74c0f7	Implement recvmmsg() and sendmmsg() system calls.	2015-05-24 18:04:04 +00:00
Dmitry Chagin	b7aaa9fdb0	Reduce duplication between MD Linux code by moving msg related struct definitions out into the compat/linux/linux_socket.h	2015-05-24 18:03:14 +00:00
Dmitry Chagin	6e4c8004dc	Implement epoll_pwait() system call.	2015-05-24 18:00:14 +00:00
Dmitry Chagin	b7c4ebdb56	Convert signal number to native for VT_SETMODE ioctl and remove strange and invalid ISSIGVALID macro. The code has not been tested right way but it was originally broken.	2015-05-24 17:59:17 +00:00
Dmitry Chagin	19d8b461f4	Add utimensat() system call. The patch developed by Jilles Tjoelker and Andrew Wilcox and adopted for lemul branch by me.	2015-05-24 17:57:07 +00:00
Dmitry Chagin	dcc0e6c493	Simplify linprocfs_doprocenviron(). Remove extra proc visibility checks and initialize pn_vis by well known procfs_candebug().	2015-05-24 17:53:48 +00:00
Dmitry Chagin	5885e5ab29	Convert Linux signal number to the FreeBSD.	2015-05-24 17:49:09 +00:00
Dmitry Chagin	94c0ee30b4	Convert Linux sigsets before showing. Linux kernel displays sigset always as 16x4 bit mask.	2015-05-24 17:48:34 +00:00
Dmitry Chagin	4ab7403bbd	Rework signal code to allow using it by other modules, like linprocfs: 1. Linux sigset always 64 bit on all platforms. In order to move Linux sigset code to the linux_common module define it as 64 bit int. Move Linux sigset manipulation routines to the MI path. 2. Move Linux signal number definitions to the MI path. In general, they are the same on all platforms except for a few signals. 3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion tables to avoid conversion errors. 4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside of allowed on Linux signal numbers. PR: 197216	2015-05-24 17:47:20 +00:00
Dmitry Chagin	a6fd8bb2bb	Add support for /proc/<pid>/auxv.	2015-05-24 17:46:04 +00:00
Dmitry Chagin	ffefd5707d	Add vdso and stack names to the /proc/self/maps.	2015-05-24 17:44:42 +00:00
Dmitry Chagin	a7ac457613	According to Linux man sigaltstack(3) shall return EINVAL if the ss argument is not a null pointer, and the ss_flags member pointed to by ss contains flags other than SS_DISABLE. However, in fact, Linux also allows SS_ONSTACK flag which is simply ignored. For buggy apps (at least mono) ignore other than SS_DISABLE flags as a Linux do. While here move MI part of sigaltstack code to the appropriate place. Reported by: abi at abinet dot ru	2015-05-24 17:44:08 +00:00
Dmitry Chagin	76672e1113	Add EPOLLERR flag handling to epoll. Tested by: abi at abinet dot ru	2015-05-24 17:42:45 +00:00
Dmitry Chagin	e2ff4b9864	As fo_fill_kinfo() does not check fo_fill_kinfo to NULL add a fo_fill_kinfo op to eventfdops. Reported by: trinity	2015-05-24 17:40:14 +00:00
Dmitry Chagin	b6aeb7d5dd	Add preliminary fallocate system call implementation to emulate posix_fallocate() function. Differential Revision: https://reviews.freebsd.org/D1523 Reviewed by: emaste	2015-05-24 17:33:21 +00:00
Dmitry Chagin	16ac71bc4f	Delete the duplicate of linux_to_native_clockid() function. Differential Revision: https://reviews.freebsd.org/D1521 Reviewed by: trasz	2015-05-24 17:30:31 +00:00
Dmitry Chagin	680982281b	Do not use struct l_timespec without conversion. While here move args->timeout handling before acquiring the futex key at FUTEX_WAIT path. Differential Revision: https://reviews.freebsd.org/D1520 Reviewed by: trasz	2015-05-24 17:29:18 +00:00
Dmitry Chagin	7e947ccc81	Add prototypes for static futex functions. Differential Revision: https://reviews.freebsd.org/D1519 Reviewed by: trasz	2015-05-24 17:27:59 +00:00
Dmitry Chagin	2166e4e0a5	As for now our tmpfs is no longer being considered "highly experimental" remove /dev/shm magic committed in r218497 and convert tmpfs type to an expected magic number. Differential Revision: https://reviews.freebsd.org/D1497 Reviewed by: emaste, trasz	2015-05-24 17:26:58 +00:00
Dmitry Chagin	5dd1d097f8	Print out unsupported futex operation message only once for the process. Differential Revision: https://reviews.freebsd.org/D1498	2015-05-24 17:25:57 +00:00
Dmitry Chagin	2711aba97e	Add some clock mappings used in glibc 2.20. Differential Revision: https://reviews.freebsd.org/D1465 Reviewed by: trasz	2015-05-24 17:23:08 +00:00
Dmitry Chagin	7d96520b25	Improve ktr(9) records in thread management code. Differential Revision: https://reviews.freebsd.org/D1464 Reviewed by: trasz	2015-05-24 17:09:07 +00:00
Dmitry Chagin	68cf0367e9	Use local struct proc * variable instead of dereferencing td->td_proc. Differential Revision: https://reviews.freebsd.org/D1463 Reviewed by: emaste	2015-05-24 17:08:25 +00:00
Dmitry Chagin	97cfa5c899	Avoid unnecessary em zeroing in non-exec path as it already zeroed by malloc with M_ZERO flag and move zeroing to the proper place in exec path. Differential Revision: https://reviews.freebsd.org/D1462 Reviewed by: trasz	2015-05-24 17:07:10 +00:00
Dmitry Chagin	e0327ddba0	Remove the unnecessary cast. Differential Revision: https://reviews.freebsd.org/D1461 Reviewed by: emaste	2015-05-24 17:05:59 +00:00
Dmitry Chagin	a6b40812ec	Implement ppoll() system call. Differential Revision: https://reviews.freebsd.org/D1105 Reviewed by: trasz	2015-05-24 16:59:25 +00:00
Dmitry Chagin	3d7b4b3720	td_sigmask of a newly created thread copied from td. Remove excess initialization of td_sigmask. Differential Revision: https://reviews.freebsd.org/D1128 Reviewed by: emaste	2015-05-24 16:56:32 +00:00
Dmitry Chagin	2c4f134b25	Update Linux compat revision to 32. Differential Revision: https://reviews.freebsd.org/D1122 Reviewed by: emaste	2015-05-24 16:55:32 +00:00
Dmitry Chagin	520e9c187d	Fix linux_common module build with KTR option. Differential Revision: https://reviews.freebsd.org/D1096 Reviewed by: trasz	2015-05-24 16:52:45 +00:00
Dmitry Chagin	a31d76867d	Implement eventfd system call. Differential Revision: https://reviews.freebsd.org/D1094 In collaboration with: Jilles Tjoelker	2015-05-24 16:49:14 +00:00
Dmitry Chagin	3e89b64168	Put the correct value for the abi_nfdbits parameter of kern_select() for all supported Linuxulators. Differential Revision: https://reviews.freebsd.org/D1093 Reviewed by: trasz	2015-05-24 16:47:13 +00:00
Dmitry Chagin	e16fe1c730	Implement epoll family system calls. This is a tiny wrapper around kqueue() to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data, so we keep user data in the proc emuldata. Initial patch developed by rdivacky@ in 2007, then extended by Yuri Victorovich @ r255672 and finished by me in collaboration with mjg@ and jillies@. Differential Revision: https://reviews.freebsd.org/D1092	2015-05-24 16:41:39 +00:00
Dmitry Chagin	d2b6dbc06f	Implement F_DUPFD_CLOEXEC fcntl flag. Differential Revision: https://reviews.freebsd.org/D1089 Reviewed by: trasz	2015-05-24 16:34:57 +00:00
Dmitry Chagin	bfa4d74baf	Add several fcntl flags. Differential Revision: https://reviews.freebsd.org/D1088 Reviewed by: trasz	2015-05-24 16:32:52 +00:00
Dmitry Chagin	4d0f380d87	To avoid code duplication move open/fcntl definitions to the MI header file. Differential Revision: https://reviews.freebsd.org/D1087 Reviewed by: trasz	2015-05-24 16:31:44 +00:00
Dmitry Chagin	26c68e1fe5	Use the BSD_TO_LINUX_SIGNAL() wherever there is no need to check the ABI as it is known. Differential Revision: https://reviews.freebsd.org/D1086	2015-05-24 16:30:23 +00:00
Dmitry Chagin	2245df381a	Convert Linux wait options to the FreeBSD. Check wait options as Linux does. Linux always set WEXITED option not a WUNTRACED|WNOHANG which is a strange bug. Differential Revision: https://reviews.freebsd.org/D1085 Reviewed by: trasz	2015-05-24 16:28:58 +00:00
Dmitry Chagin	7a7a6efc25	Set WIFCONTINUED to the wait status if needed. Differential Revision: https://reviews.freebsd.org/D1083 Reviewed by: trasz	2015-05-24 16:27:38 +00:00
Dmitry Chagin	9599b0ec3a	Rewrite linux_recvfrom. To avoid double conversion of sockaddr use kern_recvit() directly. And check fromlen parameter before sockaddr copyin and conversion. Differential Revision: https://reviews.freebsd.org/D1082	2015-05-24 16:26:55 +00:00
Dmitry Chagin	4048f59cd0	Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by glibc. At least since glibc version 2.16 using AT_RANDOM is mandatory. Differential Revision: https://reviews.freebsd.org/D1080	2015-05-24 16:24:24 +00:00
Dmitry Chagin	baa232bbfd	Change linux faccessat syscall definition to match actual linux one. The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented within the glibc wrapper function for faccessat(). If either of these flags are specified, then the wrapper function employs fstatat() to determine access permissions. Differential Revision: https://reviews.freebsd.org/D1078 Reviewed by: trasz	2015-05-24 16:18:03 +00:00
Dmitry Chagin	e0d3ea8c65	Where possible we will use M_LINUX malloc(9) type. Move M_FUTEX defines to the linux_common.ko. Differential Revision: https://reviews.freebsd.org/D1077 Reviewed by: emaste	2015-05-24 16:14:41 +00:00
Dmitry Chagin	0edc82b564	Move FEATURE macros for v4l and v4l2 to the common module. Differential Revision: https://reviews.freebsd.org/D1075 Reviewed by: emaste	2015-05-24 16:00:01 +00:00
Dmitry Chagin	bc27367760	Refund the proc emuldata struct for future use. For now move flags from thread emuldata to proc emuldata as it was originally intended. As we can have both 64 & 32 bit Linuxulator running any eventhandler can be called twice for us. To prevent this move eventhandlers code from linux_emul.c to the linux_common.ko module. Differential Revision: https://reviews.freebsd.org/D1073	2015-05-24 15:54:58 +00:00
Dmitry Chagin	67d3974849	Introduce a new module linux_common.ko which is intended for the following primary purposes: 1. Remove the dependency of linsysfs and linprocfs modules from linux.ko, which will be architecture specific on amd64. 2. Incorporate into linux_common.ko general code for platforms on which we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit). 3. Move malloc(9) declaration to linux_common.ko, to enable getting memory usage statistics properly. Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko. Temporarily remove dtrace garbage from linux_mib.c and linux_util.c Differential Revision: https://reviews.freebsd.org/D1072 In collaboration with: Vassilis Laganakos. Reviewed by: trasz	2015-05-24 15:51:18 +00:00
Dmitry Chagin	606bcc1741	Add newfstatat system call for 64-bit Linuxulator. Differential Revision: https://reviews.freebsd.org/D1071 Reviewed by: trasz	2015-05-24 15:48:34 +00:00
Dmitry Chagin	4ca75bed31	Fix compilation with -DDEBUG option. Differential Revision: https://reviews.freebsd.org/D1070 Reviewed by: trasz	2015-05-24 15:47:15 +00:00
Dmitry Chagin	36204c3016	Add 64 bit support to the vdso. Differential Revision: https://reviews.freebsd.org/D1069 Reviewed by: trasz	2015-05-24 15:45:36 +00:00
Dmitry Chagin	31eb438886	x86_64 Linux do not use multiplexing on ipc system calls. Move struct ipc_perm definition to the MD path as it differs for 64 and 32 bit platform. Differential Revision: https://reviews.freebsd.org/D1068 Reviewed by: trasz	2015-05-24 15:44:41 +00:00
Dmitry Chagin	7f8f1d7f7a	Disable i386 call for x86-64 Linux. Differential Revision: https://reviews.freebsd.org/D1067 Reviewed by: trasz	2015-05-24 15:43:53 +00:00
Dmitry Chagin	0ed687fa2e	Print out proper procmap entry for 64 bit binaries. Differential Revision: https://reviews.freebsd.org/D1066 Reviewed by: trasz	2015-05-24 15:42:36 +00:00
Dmitry Chagin	a12b9b3d96	64-bit platforms, like x86_64, do not use multiplexing on socketcall system calls. Differential Revision: https://reviews.freebsd.org/D1065 Reviewed by: trasz	2015-05-24 15:41:27 +00:00
Dmitry Chagin	297f61cc01	Get ready to commit x86_64 Linux emulation. All fields of type l_int in struct statfs are defined as l_long on i386 and amd64. Differential Revision: https://reviews.freebsd.org/D1064 Reviewed by: trasz	2015-05-24 15:39:08 +00:00
Dmitry Chagin	0020bdf13a	Put linux_platform into the vdso to avoid copying it onto the stack at every exec. Differential Revision: https://reviews.freebsd.org/D1062 Reviewed by: trasz	2015-05-24 15:30:52 +00:00
Dmitry Chagin	bdc379344a	Implement vdso - virtual dynamic shared object. Through vdso Linux exposes functions from kernel with proper DWARF CFI information so that it becomes easier to unwind through them. Using vdso is mandatory for thread cancellation && cleanup on a modern glibc. Differential Revision: https://reviews.freebsd.org/D1060	2015-05-24 15:28:17 +00:00
Dmitry Chagin	ae50b4d7b5	Implement pselect6() system call. Differential Revision: https://reviews.freebsd.org/D1051 Reviewed by: trasz	2015-05-24 15:21:25 +00:00
Dmitry Chagin	c3978c7bb1	Implement prlimit64() system call. Differential Revision: https://reviews.freebsd.org/D1050 Reviewed by: emaste, trasz	2015-05-24 15:18:19 +00:00
Dmitry Chagin	254a937ee5	Implement dup3() system call. Differential Revision: https://reviews.freebsd.org/D1049 Reviewed by: emaste	2015-05-24 15:14:51 +00:00
Dmitry Chagin	44e93b234f	Sched_rr_get_interval returns EINVAL in case when the invalid pid specified. This silences the ltp tests. Differential Revision: https://reviews.freebsd.org/D1048 Reviewed by: trasz	2015-05-24 15:13:56 +00:00
Dmitry Chagin	7ac9766db4	Implement rt_sigqueueinfo() system call. Differential Revision: https://reviews.freebsd.org/D1047 Reviewed by: trasz	2015-05-24 15:11:32 +00:00
Dmitry Chagin	e5fe4ccf59	Implement waitid() system call. Differential Revision: https://reviews.freebsd.org/D1046	2015-05-24 15:06:39 +00:00
Dmitry Chagin	001398c4c5	To reduce code duplication introduce linux_copyout_rusage() method. Use it in linux_wait4() system call and move linux_wait4() to the MI path. While here add a prototype for the static bsd_to_linux_rusage(). Differential Revision: https://reviews.freebsd.org/D2138 Reviewed by: trasz	2015-05-24 15:03:09 +00:00
Dmitry Chagin	a7ae3c557f	Add a function for converting wait options. Differential Revision: https://reviews.freebsd.org/D1045 Reviewed by: trasz	2015-05-24 15:00:27 +00:00
Dmitry Chagin	fe4ed1e768	Add a siginfo_t conversion function. Differential Revision: https://reviews.freebsd.org/D1044 Reviewed by: emaste, trasz	2015-05-24 14:58:30 +00:00
Dmitry Chagin	86bda7a02d	Remove a now unused define. Differential Revision: https://reviews.freebsd.org/D1043 Reviewed by: trasz	2015-05-24 14:57:39 +00:00
Dmitry Chagin	a6326909bb	Introduce LINUX_VERSION_STR, LINUX_VERSION_CODE macro for use instead of hardcoded pr_osrelease, pr_osrel values. This will be used later in the VDSO. Differential Revision: https://reviews.freebsd.org/D1042 Reviewed by: trasz	2015-05-24 14:56:21 +00:00
Dmitry Chagin	5e609834bd	pthread_join() caller do futex_wait on child_clear_tid. As a results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined wake up the one thread. Differential Revision: https://reviews.freebsd.org/D1040	2015-05-24 14:54:12 +00:00
Dmitry Chagin	81338031c4	Switch linuxulator to use the native 1:1 threads. The reasons: 1. Get rid of the stubs/quirks with process dethreading, process reparent when the process group leader exits and close to this problems on wait(), waitpid(), etc. 2. Reuse our kernel code instead of writing excessive thread management routines in Linuxulator. Implementation details: 1. The thread is created via kern_thr_new() in the clone() call with the CLONE_THREAD parameter. Thus, everything else is a process. 2. The test that the process has a threads is done via P_HADTHREADS bit p_flag of struct proc. 3. Per thread emulator state data structure is now located in the struct thread and freed in the thread_dtor() hook. Mandatory holding of the p_mtx required when referencing emuldata from the other threads. 4. PID mangling has changed. Now Linux pid is the native tid and Linux tgid is the native pid, with the exception of the first thread in the process where tid and pid are one and the same. Ugliness: In case when the Linux thread is the initial thread in the thread group thread id is equal to the process id. Glibc depends on this magic (assert in pthread_getattr_np.c). So for system calls that take thread id as a parameter we should use the special method to reference struct thread. Differential Revision: https://reviews.freebsd.org/D1039	2015-05-24 14:53:16 +00:00
Dmitry Chagin	91d1786f65	In preparation for switching linuxulator to the use the native 1:1 threads add a hook for cleaning thread resources before the thread die. Differential Revision: https://reviews.freebsd.org/D1038	2015-05-24 14:51:29 +00:00
Dmitry Chagin	2003907d45	Implement a Linux version of sched_getparam() && sched_setparam(). Temporarily use the first thread in proc. Differential Revision: https://reviews.freebsd.org/D1036 Reviewed by: trasz	2015-05-24 14:45:57 +00:00
Dmitry Chagin	1aa90eca33	In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator). Linuxulator temporarily uses first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz	2015-05-24 14:39:26 +00:00
Dmitry Chagin	161acbb670	In preparation for switching linuxulator to the use the native 1:1 threads introduce linux_exit() stub instead of sys_exit() call (which terminates process). In the new linuxulator exit() system call terminates the calling thread (not a whole process). Differential Revision: https://reviews.freebsd.org/D1027 Reviewed by: trasz	2015-05-24 14:33:19 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Konstantin Belousov	7b445033ff	On exec, single-threading must be enforced before arguments space is allocated from exec_map. If many threads try to perform execve(2) in parallel, the exec map is exhausted and some threads sleep uninterruptible waiting for the map space. Then, the thread which won the race for the space allocation, cannot single-thread the process, causing deadlock. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-10 09:00:40 +00:00
Peter Wemm	76cd25496f	Fix an error in r281551, part of the getfsstat() / kern_getfsstat() rework. The number of entries was supposed to be returned to the user, not used as a scratch variable. This broke RELENG_4 jails starting up on current systems.	2015-05-05 05:14:12 +00:00
Edward Tomasz Napierala	310e931198	Simplify linux_getcwd(), removing code that was no longer used. Differential Revision: https://reviews.freebsd.org/D2326 Reviewed by: dchagin@, kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-23 08:41:50 +00:00
Edward Tomasz Napierala	6289b482ec	Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024. Differential Revision: https://reviews.freebsd.org/D2335 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-21 13:55:24 +00:00
Edward Tomasz Napierala	565716e60e	Add back fdrop() missed in r281726. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-19 07:35:18 +00:00
Edward Tomasz Napierala	92f7441328	Optimize the O_NOCTTY handling hack in linux_common_open(). Differential Revision: https://reviews.freebsd.org/D2323 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-19 07:12:16 +00:00
Edward Tomasz Napierala	94d014f079	Remove unused code from linux_mount(), and make it possible to mount any kind of filesystem instead of hardcoded three. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-18 09:49:09 +00:00
Edward Tomasz Napierala	1c73bcab8e	Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This adds missing jail and MAC checks. Differential Revision: https://reviews.freebsd.org/D2193 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-15 09:13:11 +00:00
Mateusz Guzik	90f54cbfeb	fd: remove filedesc argument from fdclose Just accept a thread instead. This makes it consistent with fdalloc. No functional changes.	2015-04-11 15:40:28 +00:00
John Baldwin	dbee5c671a	Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h> and export them to userland. - Define __HAVE_REG32 on platforms that define a reg32 structure and check for this in <sys/procfs.h> to control when to export prstatus32, etc. - Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures. libbfd looks for these types, and having them fixes 'gcore' in gdb of a 32-bit process on a 64-bit platform. - Use the structure definitions from <sys/procfs.h> in gcore's elf32 core dump code instead of duplicating the definitions. Differential Revision: https://reviews.freebsd.org/D2142 Reviewed by: kib, nathanw (powerpc bits) MFC after: 1 week	2015-04-08 16:30:45 +00:00
Edward Tomasz Napierala	67caead165	Remove unused code. Differential Revision: https://reviews.freebsd.org/D2195 Reviewed by: kib@, imp@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-02 10:19:24 +00:00
Mateusz Guzik	daf63fd2f9	cred: add proc_set_cred helper. The goal here is to provide one place altering process credentials. This eases debugging and opens up possibilities to do additional work when such an action is performed.	2015-03-16 00:10:03 +00:00
Jilles Tjoelker	2b35e6a9f2	Run make sysent.	2015-01-23 21:08:24 +00:00
Jilles Tjoelker	2205e0d1bd	Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes	2015-01-23 21:07:08 +00:00
Konstantin Belousov	677258f7e7	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:13:11 +00:00
Konstantin Belousov	b53fc49cd4	fcntl F_O{GET,SET}LK take pointer as the arg, handle them properly for compat32. Reported and tested by: Alex Tutubalin <lexa@lexa.ru> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-15 10:43:58 +00:00
Dmitry Chagin	1beb1a8e13	Regen for r276654 (__getcwd()).	2015-01-04 10:40:23 +00:00
Dmitry Chagin	9f7a06f27e	Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week	2015-01-04 10:34:02 +00:00
Dmitry Chagin	9fa04b52ec	Cast *path to silence clang -Wpointer-sign warning. MFC after: 1 week	2015-01-02 19:29:32 +00:00
Dmitry Chagin	de90b09a79	Remove Giant from linux_getcwd() due to VFS is MPSAFE now. Discussed with: kib MFC after: 1 week	2015-01-02 18:36:08 +00:00
Dmitry Chagin	857ad5a31b	Fix Clang -Wpointer-sign warnings. MFC after: 1 week	2015-01-01 20:53:38 +00:00
Dmitry Chagin	5072ad67ae	Fix Clang warning: passing 'unsigned int *' to parameter of type 'int *' converts between pointers to integer types with different sign. MFC after: 1 week	2015-01-01 19:57:24 +00:00
Gleb Kurtsou	dde58752db	Adjust printf format specifiers for dev_t and ino_t in kernel. ino_t and dev_t are about to become uint64_t. Reviewed by: kib, mckusick	2014-12-17 07:27:19 +00:00
Konstantin Belousov	237623b028	Add a facility for non-init process to declare itself the reaper of the orphaned descendants. Base of the API is modelled after the same feature from the DragonFlyBSD. Requested by: bapt Reviewed by: jilles (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-15 12:01:42 +00:00
Konstantin Belousov	5c7bebf961	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-26 14:10:00 +00:00
John Baldwin	180e57e5c7	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks	2014-11-21 20:53:17 +00:00
Konstantin Belousov	6e646651d3	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
Dmitry Chagin	c28d9d0f9f	Regen for r274462.	2014-11-13 05:28:06 +00:00
Dmitry Chagin	186d9c3473	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
Gleb Smirnoff	efe28398f5	Fix build.	2014-11-11 22:08:18 +00:00
Gleb Smirnoff	0e87b36eaa	Remove SF_KQUEUE code. This code was developed at Netflix, but was not ever used. It didn't go into stable/10, neither was documented. It might be useful, but we collectively decided to remove it, rather leave it abandoned and unmaintained. It is removed in one single commit, so restoring it should be easy, if anyone wants to reopen this idea. Sponsored by: Netflix	2014-11-11 20:32:46 +00:00
Warner Losh	2736ae9f8c	These don't belong in the modules directory.	2014-11-06 16:52:51 +00:00
Konstantin Belousov	0a2c94b86e	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
Mateusz Guzik	e015b1ab0a	Avoid dynamic syscall overhead for statically compiled modules. The kernel tracks syscall users so that modules can safely unregister them. But if the module is not unloadable or was compiled into the kernel, there is no need to do this. Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC during kernel build and 0 otherwise. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-26 19:42:44 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Adrian Chadd	e77f9fed15	Update the ULE scheduler + thread and kinfo structs to use int for cpuid rather than u_char. To try and play nice with the ABI, the u_char CPU ID values are clamped at 254. The new fields now contain the full CPU ID, or -1 for no cpu. Differential Revision: D955 Reviewed by: jhb, kib Sponsored by: Norse Corp, Inc.	2014-10-18 19:36:11 +00:00
Marcel Moolenaar	2e7634503e	Regenerate after r272823: Move the SCTP syscalls to netinet with the rest of the SCTP code. Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:19:35 +00:00
Marcel Moolenaar	80b47aefa1	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout) Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:16:52 +00:00
Konstantin Belousov	f69261f2f9	Fix fcntl(2) compat32 after r270691. The copyin and copyout of the struct flock are done in the sys_fcntl(), which means that compat32 used direct access to userland pointers. Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which performs necessary userland memory accesses, and use it from both native and compat32 fcntl syscalls. Reported by: jhibbits Sponsored by: The FreeBSD Foundation MFC after: 3 days	2014-09-25 21:07:19 +00:00
Alexander Motin	6a9bcacfcf	Remake Linux' SOUND_MIXER_INFO IOCTL as a wrapper around new FreeBSD's one. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 3 days	2014-09-24 08:18:11 +00:00
Sean Bruno	d143d69857	Bump minimum linux compat version to support Centos6 ports updates for linux. Update linux compat minimum revision to match linux-c6 now in ports. This is a candidate for 10.1 R as it matches the current state of supported linux compat packages in the ports tree. PR: 187786 Reviewed by: xmj MFC after: 2 days Relnotes: yes	2014-09-22 17:26:07 +00:00
Gleb Smirnoff	1411ec550f	Fix build on 32-bit machines. Pointy hat to: glebius	2014-09-18 20:29:17 +00:00
Gleb Smirnoff	1e99b3f4e3	- Use if_get_counter() to fetch ifnet statistics. - Report IFCOUNTER_OQDROPS to linprocfs. Wasn't there before. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-18 16:44:28 +00:00
Bjoern A. Zeeb	0a041f3b47	Implement most of timer_{create,settime,gettime,getoverrun,delete} for amd64/linux32. Fix the entirely bogus (untested) version from r161310 for i386/linux using the same shared code in compat/linux. It is unclear to me if we could support more clock mappings but the current set allows me to successfully run commercial 32bit linux software under linuxolator on amd64. Reviewed by: jhb Differential Revision: D784 MFC after: 3 days Sponsored by: DARPA, AFRL	2014-09-18 08:36:45 +00:00
Mateusz Guzik	6662ce5aab	Add missing proctree locking to fill_kinfo_proc consumers. This fixes r270444. Pointy hat: mjg Reported by: many MFC after: 1 week	2014-08-30 03:10:55 +00:00
Mateusz Guzik	8b04bbef31	Return real parent pid in kinfo (used by e.g. ps) Add a separate field which exports tracer pid and add a new keyword ("tracer") for ps to display it. This is a follow up to r270444. Reviewed by: kib MFC after: 1 week Relnotes: yes	2014-08-28 08:41:11 +00:00
Konstantin Belousov	5aec07c73d	Regen.	2014-08-27 01:02:19 +00:00
Konstantin Belousov	8fbeebf590	Fix handling of the third argument for fcntl(2). The native syscall uses long for arg, which needs translation. Discussed with and tested by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-27 01:02:02 +00:00
Gleb Smirnoff	15c28f87b8	All mbuf external free functions never fail, so let them be void. Sponsored by: Nginx, Inc.	2014-07-11 13:58:48 +00:00
Marcel Moolenaar	e7d939bda2	Remove ia64. This includes: o All directories named ia64 o All files named ia64 o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan	2014-07-07 00:27:09 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Alexander Motin	94fe9f959c	- Add support for SG_GET_SG_TABLESIZE IOCTL to report that we don't support scatter/gather lists. - Return error for still unsupported SG 3.x API read/write calls. MFC after: 1 month	2014-06-04 12:05:47 +00:00
Alexander Motin	fcaf473cfc	Overhaul CAM SG driver IOCTL interfaces. Make it really work for native FreeBSD programs. Before this it was broken for years due to different number of pointer dereferences in Linux and FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs. This change breaks the driver FreeBSD IOCTL ABI, making it more strict, but since it was not working any way -- who bother. Add shims for 32-bit programs on 64-bit host, translating the argument of the SG_IO IOCTL for both FreeBSD and Linux ABIs. With this change I was able to run 32-bit Linux sg3_utils tools and simple 32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems. MFC after: 1 month	2014-06-02 19:53:53 +00:00
Dmitry Chagin	fb6bf8bba9	Glibc was switched to the FUTEX_WAIT_BITSET op and CLOCK_REALTIME flag has been added instead of FUTEX_WAIT to replace the FUTEX_WAIT logic which needs to do gettimeofday() calls before the futex syscall to convert the absolute timeout to a relative timeout. Before this the CLOCK_MONOTONIC used by the FUTEX_WAIT_BITSET op. When the FUTEX_CLOCK_REALTIME is specified the timeout is an absolute time, not a relative time. Rework futex_wait to handle this. On the side fix the futex leak in error case and remove useless parentheses. Properly calculate the timeout for the CLOCK_MONOTONIC case. MFC after: 3 days	2014-05-31 14:58:53 +00:00
Dmitry Chagin	32fd44657c	In r218101 I have not changed properly the futex syscall definition. Some Linux futex ops atomically verify that the futex address uaddr (uval) contains the value val. Comparing signed uval and unsigned val may lead to an unexpected result, mostly to a deadlock. So copyin uaddr to an unsigned int to compare the parameters correctly. While here change ktr records to print parameters in more readable format. Tested by eadler@ MFC after: 3 days	2014-05-28 05:57:35 +00:00
Marcel Moolenaar	0fa211be96	In freebsd32_sendmsg(), replace the call to sockargs() followed by a call to freebsd32_convert_msg_in() with freebsd32_copyin_control() to readin and convert in a single step. This makes it simpler to put all the control messages in a single mbuf or mbuf cluster as per the limitations imposed upon us by ip6_setpktopts(). The logic is as follows: 1. Go over the array of control messages to determine overall size and include extra padding for proper alignment as we go. 2. Get a mbuf or mbuf cluster as needed or fail if the overall (adjusted) size is larger than a cluster. 3. Go over the array of control messages again, but now copy them into kernel space and into aligned offsets. 4. Update the length of the control message to take padding between the header and the data into account (but not for padding added between one control message and the next). Obtained from: Juniper Networks, Inc. MFC after: 1 week	2014-04-05 18:56:01 +00:00
Warner Losh	8a27a339b6	Remove instances of variables that were set, but never used. gcc 4.9 warns about these by default.	2014-03-30 23:43:36 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Konstantin Belousov	88b124cede	Make the array pointed to by AT_PAGESIZES auxv properly aligned. Also, remove the expression which calculated the location of the strings for a new image and grown over the time to be non-comprehensible. Instead, calculate the offsets by steps, which also makes fixing the alignments much cleaner. Reported and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-19 12:35:04 +00:00
Attilio Rao	4f11a684ff	Regen per r263318. Sponsored by: EMC / Isilon storage division	2014-03-18 21:34:11 +00:00
Attilio Rao	ce42e79310	Remove dead code from umtx support: - Retire long time unused (basically always unused) sys__umtx_lock() and sys__umtx_unlock() syscalls - struct umtx and their supporting definitions - UMUTEX_ERROR_CHECK flag - Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall __FreeBSD_version is not bumped yet because it is expected that further breakages to the umtx interface will follow up in the next days. However there will be a final bump when necessary. Sponsored by: EMC / Isilon storage division Reviewed by: jhb	2014-03-18 21:32:03 +00:00
Ed Maste	0fcefb433d	Update NetBSD Foundation copyrights to 2-clause BSD The NetBSD Foundation states "Third parties are encouraged to change the license on any files which have a 4-clause license contributed to the NetBSD Foundation to a 2-clause license." This change removes clauses 3 and 4 from copyright / license blocks that list The NetBSD Foundation as the only copyright holder. Sponsored by: The FreeBSD Foundation	2014-03-18 01:40:25 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
John-Mark Gurney	6f2b769cac	change td_retval into a union w/ off_t, with defines to mask the change... This eliminates a cast, and also forces td_retval (often 2 32-bit registers) to be aligned so that off_t's can be stored there on arches with strict alignment requirements like armeb (AVILA)... On i386, this doesn't change alignment, and on amd64 it doesn't either, as register_t is already 64bits... This will also prevent future breakage due to people adding additional fields to the struct... This gets AVILA booting a bit farther... Reviewed by: bde	2014-03-16 00:53:40 +00:00
Gleb Smirnoff	b245f96c44	Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed. o Remove the if_baudrate_pf crutch. o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code. o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes. o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters. __FreeBSD_version bumped. Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-13 03:42:24 +00:00
Eitan Adler	9ace80105a	linprocfs: add support for /sys/kernel/random/uuid PR: kern/186187 Submitted by: Fernando <fernando.apesteguia@gmail.com> MFC After: 2 weeks	2014-02-27 00:43:10 +00:00
Konstantin Belousov	49d39308ba	The posix_madvise(3) and posix_fadvise(2) should return error on failure, same as posix_fallocate(2). Noted by: Bob Bishop <rb@gid.co.uk> Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-30 18:04:39 +00:00
Konstantin Belousov	2852de0489	The posix_fallocate(2) syscall should return error number on error, without modifying errno. Reported and tested by: Gennady Proskurin <gpr@mail.ru> Reviewed by: mdf PR: standards/186028 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-23 17:24:26 +00:00
Adrian Chadd	0cfea1c8fc	Implement a kqueue notification path for sendfile. This fires off a kqueue note (of type sendfile) to the configured kqfd when the sendfile transaction has completed and the relevant memory backing the transaction is no longer in use by this transaction. This is analogous to SF_SYNC waiting for the mbufs to complete - except now you don't have to wait. Both SF_SYNC and SF_KQUEUE should work together, even if it doesn't necessarily make any practical sense. This is designed for use by applications which use backing cache/store files (eg Varnish) or POSIX shared memory (not sure anything is using it yet!) to know when a region of memory is free for re-use. Note it doesn't mark the region as free overall - only free from this transaction. The application developer still needs to track which ranges are in the process of being recycled and wait until all pending transactions are completed. TODO: * documentation, as always Sponsored by: Netflix, Inc.	2014-01-17 05:26:55 +00:00
Adrian Chadd	a43caef195	Refactor out the common sendfile code from the do_sendfile() and the compat32 sendfile syscall. Sponsored by: Netflix, Inc.	2014-01-09 00:11:14 +00:00
Adrian Chadd	79750e3b36	Migrate the sendfile_sync structure into a public(ish) API in preparation for extending and reusing it. The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_() methods teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) is now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc.	2013-12-01 03:53:21 +00:00
Peter Wemm	b5019bc45b	jail_v0.ip_number was always in host byte order. This was handled in one of the many layers of indirection and shims through stable/7 in jail_handle_ips(). When it was cleaned up and unified through kern_jail() for 8.x, the byte order swap was lost. This only matters for ancient binaries that call jail(2) themselves internally.	2013-11-28 19:40:33 +00:00
Konstantin Belousov	80c3af4e80	Add a kinfo sysctl to retrieve signal trampoline location for the given process. Note that the correctness of the trampoline length returned for ABIs which do not use shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on an as-needed basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-26 19:47:09 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Adrian Chadd	7689abaedc	Fix the compat32 sendfile() to be in line with my recent changes. Reminded by: kib	2013-11-26 08:32:37 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Gleb Smirnoff	37968927c8	Fix build. Pointy hat to: glebius	2013-11-05 19:17:19 +00:00
Gleb Smirnoff	af50ea380f	Axe IFF_SMART. Fortunately this layering violating flag was never used, it was just declared.	2013-11-05 12:52:56 +00:00
Gleb Smirnoff	5fb009bda7	Drop support for historic ioctls and also undefine them, so that code that checks their presence via ifdef, won't use them. Bump __FreeBSD_version as safety measure.	2013-11-05 10:29:47 +00:00
Gleb Smirnoff	66e01d73cd	- Provide necessary includes. - Remove unnecessary includes. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-29 11:17:49 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Konstantin Belousov	3a2092bad0	Add padding to match the compat32 struct stat32 definition to the real struct stat on 32bit architectures. Debugged and tested by: bsam Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-04 22:05:23 +00:00
Mark Johnston	92c6196caa	Fix some typos that were causing probe argument types to show up as unknown. Reviewed by: rwatson (mac provider) Approved by: re (glebius) MFC after: 1 week	2013-10-01 15:40:27 +00:00
Mark Johnston	8d305ba0dc	Regenerate syscall argument strings after r255777. Approved by: re (gjb) MFC after: 1 week	2013-09-21 23:06:36 +00:00
John Baldwin	a566e8e3c5	Regen. Approved by: re (delphij)	2013-09-19 18:56:00 +00:00
John Baldwin	55648840de	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month	2013-09-19 18:53:42 +00:00
Roman Divacky	b12698e1a1	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)	2013-09-18 18:48:33 +00:00
Roman Divacky	253c75c0de	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>	2013-09-18 17:56:04 +00:00
Jilles Tjoelker	9fdb497cd0	Regenerate for freebsd32_cap_enter(). Approved by: re (hrs)	2013-09-17 20:49:05 +00:00
Jilles Tjoelker	529411c369	Disallow cap_enter() in freebsd32 compatibility mode. The freebsd32 compatibility mode (for running 32-bit binaries on 64-bit kernels) does not currently allow any system calls in capability mode, but still permits cap_enter(). As a result, 32-bit binaries on 64-bit kernels that use capability mode do not work (they crash after being disallowed to call sys_exit()). Affected binaries include dhclient and uniq. The latter's crashes cause obscure build failures. This commit makes freebsd32 cap_enter() fail with [ENOSYS], as if capability mode was not compiled in. Applications deal with this by doing their work without capability mode. This commit does not fix the uncommon situation where a 64-bit process enters capability mode and then executes a 32-bit binary using fexecve(). This commit should be reverted when allowing the necessary freebsd32 system calls in capability mode. Reviewed by: pjd Approved by: re (hrs)	2013-09-17 20:48:19 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00

2481 Commits