freebsd-nq

Author	SHA1	Message	Date
Mariusz Zaborski	85b0f9de11	capsicum: propagate rights on accept(2) Descriptor returned by accept(2) should inherit capability rights from the listening socket. PR: 201052 Reviewed by: emaste, jonathan Discussed with: many Differential Revision: https://reviews.freebsd.org/D7724	2016-09-22 09:58:46 +00:00
Ed Maste	df4336ddfa	Catch up to sys/capability.h rename to sys/capsicum.h in r263232 MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-09-19 18:44:43 +00:00
Dmitry Chagin	ede2869c4c	Implement BLKSSZGET ioctl for the Linuxulator. PR: 212700 Submitted by: Erik Cederstrand Reported by: Erik Cederstrand MFC after: 1 week	2016-09-17 08:10:01 +00:00
Ed Schouten	93d9ebd82e	Eliminate use of sys_fsync() and sys_fdatasync(). Make the kern_fsync() function public, so that it can be used by other parts of the kernel. Fix up existing consumers to make use of it. Requested by: kib	2016-08-15 20:11:52 +00:00
Dmitry Chagin	97d06da692	Fix a copy/paste bug introduced during X86_64 Linuxulator work. FreeBSD supports the NX bit on X86_64 processors out of the box; for i386 emulation use the READ_IMPLIES_EXEC flag, introduced in r302515. While here move the common part of the mmap() and mprotect() code to the files in compat/linux to reduce code duplication between Linuxulators. Reported by: Johannes Jost Meixner, Shawn Webb MFC after: 1 week XMFC with: r302515, r302516	2016-07-10 08:22:04 +00:00
Dmitry Chagin	23e8912c60	Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag. In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 set this flag automatically if the binary requires executable stack. READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.	2016-07-10 08:15:50 +00:00
Dmitry Chagin	3a49978f45	Fix a bug introduced in r283433. [1] Remove unneeded sockaddr conversion before kern_recvit() call as the from argument is used to record result (the source address of the received message) only. [2] In Linux the type of msg_namelen member of struct msghdr is signed but native msg_namelen has an unsigned type (socklen_t). So use the proper storage to fetch fromlen from userspace and then check the user supplied value and return EINVAL if it is less than 0, as Linux does. Reported by: Thomas Mueller <tmueller at sysgo dot com> [1] Reviewed by: kib@ Approved by: re (gjb, kib) MFC after: 3 days	2016-06-26 16:59:59 +00:00
Konstantin Belousov	5c2cf81845	Update comments for the MD functions managing contexts for new threads, to make it less confusing and using modern kernel terms. Rename the functions to reflect current use of the functions, instead of the historic KSE conventions: cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads) cpu_set_upcall -> cpu_copy_thread (for forks) cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation) Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:05:44 +00:00
Gleb Smirnoff	34e05ebe72	Fix kernel stack disclosures in the Linux and 4.3BSD compat layers. Submitted by: CTurt Security: SA-16:20 Security: SA-16:21	2016-05-31 16:56:30 +00:00
Dmitry Chagin	ab610366b5	Don't leak fp in case where fo_ioctl() returns an error. Reported by: C Turt <ecturt@gmail.com> MFC after: 1 week	2016-05-24 05:29:41 +00:00
Dmitry Chagin	df964aa45d	Convert proto family in both directions. The linux and native values for local and inet are identical, but for inet6 values differ. PR: 155040 Reported by: Simon Walton MFC after: 2 week	2016-05-22 19:08:29 +00:00
Dmitry Chagin	d56e689e7d	Add a missing errno translation for SO_ERROR optname. PR: 135458 Reported by: Stefan Schmidt @ stadtbuch.de MFC after: 1 week	2016-05-22 12:49:08 +00:00
Dmitry Chagin	f8d72f5312	For future use move futex timeout code to the separate function and switch to the high resolution sbintime_t. MFC after: 1 week	2016-05-22 12:37:40 +00:00
Dmitry Chagin	a03566dd95	Due to the lack of the priority propagation feature replace sx by mutex. With this commit NPTL tests end 1 minute faster. MFC after: 1 week	2016-05-22 12:35:50 +00:00
Dmitry Chagin	ea53658ed7	Add my copyright as I rewrote most of the futex code. Minor style(9) cleanup while here. MFC after: 1 week	2016-05-22 12:28:55 +00:00
Dmitry Chagin	56c4f83d2e	Minor style(9) cleanup, no functional changes. MFC after: 1 week	2016-05-22 12:26:03 +00:00
Konstantin Belousov	2a339d9e3d	Add implementation of robust mutexes, hopefully close enough to the intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013. A robust mutex is guaranteed to be cleared by the system upon either thread or process owner termination while the mutex is held. The next mutex locker is then notified about inconsistent mutex state and can execute (or abandon) corrective actions. The patch mostly consists of small changes here and there, adding necessary checks for the inconsistent and abandoned conditions into existing paths. Additionally, the thread exit handler was extended to iterate over the userspace-maintained list of owned robust mutexes, unlocking and marking as terminated each of them. The list of owned robust mutexes cannot be maintained atomically synchronous with the mutex lock state (it is possible in kernel, but is too expensive). Instead, for the duration of lock or unlock operation, the current mutex is remembered in a special slot that is also checked by the kernel at thread termination. Kernel must be aware about the per-thread location of the heads of robust mutex lists and the current active mutex slot. When a thread touches a robust mutex for the first time, a new umtx op syscall is issued which informs about location of lists heads. The umtx sleep queues for PP and PI mutexes are split between non-robust and robust. Somewhat unrelated changes in the patch: 1. Style. 2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared pi mutexes. 3. Removal of the userspace struct pthread_mutex m_owner field. 4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls the lifetime of the shared mutex associated with a vnode's page. Reviewed by: jilles (previous version, supposedly the objection was fixed) Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects) Tested by: pho Sponsored by: The FreeBSD Foundation	2016-05-17 09:56:22 +00:00
Pedro F. Giffuni	1ce4275dd2	sys/compat/linux*: spelling fixes. Mostly on comments but there are some user-visible messages as well. MFC after: 2 weeks	2016-04-30 00:53:10 +00:00
Conrad Meyer	aa90aec270	osd(9): Change array pointer to array pointer type from void* This is a minor follow-up to r297422, prompted by a Coverity warning. (It's not a real defect, just a code smell.) OSD slot array reservations are an array of pointers (void *) but were cast to void and back unnecessarily. Keep the correct type from reservation to use. osd.9 is updated to match, along with a few trivial igor fixes. Reported by: Coverity CID: 1353811 Sponsored by: EMC / Isilon Storage Division	2016-04-26 19:57:35 +00:00
Jamie Gritton	d56cf22d22	linux_map_osrel doesn't need to be checked in linux_prison_set, since it already was in linux_prison_check.	2016-04-25 06:08:45 +00:00
Pedro F. Giffuni	b66bb393f2	Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.	2016-04-22 16:57:42 +00:00
Pedro F. Giffuni	02abd40029	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
Pedro F. Giffuni	500ed14d6e	compat/linux: for pointers replace 0 with NULL. plvc is a pointer, no functional change. Found with devel/coccinelle.	2016-04-15 16:21:13 +00:00
Dmitry Chagin	5743aa47f5	More complete implementation of /proc/self/limits. Fix the way the code accesses process limits struct - pointed out by mjg@. PR: 207386 Reviewed by: no objection from des@ MFC after: 3 weeks	2016-04-10 07:11:29 +00:00
Pedro F. Giffuni	ae26eab161	Fix indentation oops.	2016-04-03 14:40:54 +00:00
Dmitry Chagin	8bc21bafba	Move Linux specific times tests up to guarantee the values are defined. CID: 1305178 Submitted by: pfg@ MFC after: 1 week	2016-04-03 06:33:16 +00:00
Jamie Gritton	7ab25e3d18	Use osd_reserve / osd_jail_set_reserved, which is known to succeed. Also don't work around nonexistent osd_register failure.	2016-03-30 17:05:04 +00:00
Dmitry Chagin	7c5982000d	Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET. Pointed out by: ae@	2016-03-27 10:09:10 +00:00
Dmitry Chagin	c826fcfe22	Convert Linux SOL_IPV6 level. MFC after: 1 week	2016-03-27 08:12:01 +00:00
Dmitry Chagin	e667ee63f6	Whitespaces and style(9) fix. No functional changes. MFC after: 1 week	2016-03-27 08:10:20 +00:00
Dmitry Chagin	09806d8e3e	When write(2) on eventfd object fails with the error EAGAIN do not return the number of bytes written. MFC after: 1 week	2016-03-26 19:16:53 +00:00
Dmitry Chagin	2bb14e7541	Implement O_NONBLOCK flag via fcntl(F_SETFL) for eventfd object. MFC after: 1 week	2016-03-26 19:15:23 +00:00
Dmitry Chagin	2ad0231309	Check bsd_to_linux_statfs() return value. Forgotten in r297070. MFC after: 1 week	2016-03-20 19:06:21 +00:00
Dmitry Chagin	525c9796c3	Return EOVERFLOW in case the actual statfs values are too large to fit into the 32-bit fields of a Linux struct statfs. PR: 181012 MFC after: 1 week	2016-03-20 18:31:30 +00:00
Dmitry Chagin	7958a34cb5	Whitespaces, style(9) fixes. No functional changes. MFC after: 1 week	2016-03-20 14:06:27 +00:00
Dmitry Chagin	99546279d6	Implement fstatfs64 system call. PR: 181012 Submitted by: John Wehle MFC after: 1 week	2016-03-20 13:21:20 +00:00
Dmitry Chagin	4525bb829f	Rework r296543: 1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer(). Assert that kern_setitimer() returns 0. Remove bogus cast of secs. Fix style(9) issues. 2. Increment the return value if the remaining tv_usec value is more than 500000, as Linux does. Pointed out by: [1] Bruce Evans MFC after: 1 week	2016-03-20 11:40:52 +00:00
Andrey V. Elsukov	86a9058b01	Add support for IPPROTO_IPV6 socket layer for getsockopt/setsockopt calls. Also add mapping for several options from RFC 3493 and 3542. Reviewed by: dchagin Tested by: Joe Love <joe at getsomwhere dot net> MFC after: 2 weeks	2016-03-09 09:12:40 +00:00
Dmitry Chagin	a87488d1e4	Better english. Submitted by: Kevin P. Neal MFC after: 1 week	2016-03-08 19:40:01 +00:00
Dmitry Chagin	649ca5e9dc	Put a commit message from r296502 about Linux alarm() system call behaviour to the source. Suggested by: emaste@ MFC after: 1 week	2016-03-08 19:20:57 +00:00
Dmitry Chagin	91f514e413	Do not leak fp. While here remove bogus cast of fp->f_data. MFC after: 1 week	2016-03-08 15:55:43 +00:00
Dmitry Chagin	fc4b98fb88	Linux accept() system call returns EOPNOTSUPP errno instead of EINVAL for UDP sockets. MFC after: 1 week	2016-03-08 15:15:34 +00:00
Dmitry Chagin	15c3b371e2	According to POSIX and the Linux implementation the alarm() system call is always successful. So, ignore any errors and return 0, as Linux does. XXX. Unlike POSIX, Linux always returns 0 when an invalid seconds value is specified, so in that case Linux does not return the proper remaining time. MFC after: 1 week	2016-03-08 15:12:49 +00:00
Dmitry Chagin	9f4e66afb9	Link the newly created process to the corresponding parent as if CLONE_PARENT is set, then the parent of the new process will be the same as that of the calling process. MFC after: 1 week	2016-03-08 15:08:22 +00:00
Svatopluk Kraus	35a0bc1260	As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no need to include it explicitly when <vm/vm_param.h> is already included. Suggested by: alc Reviewed by: alc Differential Revision: https://reviews.freebsd.org/D5379	2016-02-22 09:08:04 +00:00
Mateusz Guzik	33fd9b9a2b	fork: pass arguments to fork1 in a dedicated structure Suggested by: kib	2016-02-04 04:22:18 +00:00
Dmitry Chagin	67968b35a0	Prevent double free of control in common sendmsg path as sosend already freeing it.	2016-01-17 19:28:13 +00:00
Gleb Smirnoff	c8358c6e0d	Call crextend() before copying old credentials to the new credentials and replace crcopysafe by crcopy as crcopysafe is not intended to be safe in a threaded environment, it drops PROC_LOCK() in while() that can lead to unexpected results, such as overwrite kernel memory. In my POV crcopysafe() needs special attention. For now I do not see any problems with this function, but who knows. Submitted by: dchagin Found by: trinity Security: SA-16:04.linux	2016-01-14 10:16:25 +00:00
Gleb Smirnoff	037f750877	Change linux get_robust_list system call to match actual linux one. The set_robust_list system call requests the kernel to record the head of the list of robust futexes owned by the calling thread. The head argument is the list head to record. The get_robust_list system call should return the head of the robust list of the thread whose thread id is specified in pid argument. The list head should be stored in the location pointed to by head argument. In contrast, our implementation of get_robust_list system call copies the known portion of memory pointed by recorded in set_robust_list system call pointer to the head of the robust list to the location pointed by head argument. So, it is possible for a local attacker to read portions of kernel memory, which may result in a privilege escalation. Submitted by: mjg Security: SA-16:03.linux	2016-01-14 10:13:58 +00:00
Dmitry Chagin	6437b8e7d9	Unlock process lock when returning an error from getrobustlist call and add a forgotten dtrace probe when returning the same error. MFC after: 3 days XMFC with: r292743	2016-01-10 07:36:43 +00:00
Dmitry Chagin	bfb5568a3c	Return EINVAL in case of incorrect sigev_signo value specified instead of panicking.	2015-12-26 09:09:49 +00:00
Dmitry Chagin	6e5549717a	Do not allow access to emuldata for non Linux processes. Pointed out by: mjg@ Security: https://admbugs.freebsd.org/show_bug.cgi?id=679	2015-12-26 09:04:47 +00:00
Mark Johnston	3616095801	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week	2015-12-16 23:39:27 +00:00
Konstantin Belousov	cd0a26c53f	Fix build for the KTR-enabled kernels. Sponsored by: The FreeBSD Foundation	2015-10-23 11:41:55 +00:00
Bryan Drewery	a730673058	Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers. r161611 added some of the code from sys_vfork() directly into the Linux module wrappers since they use RFSTOPPED. In r232240, the RFFPWAIT handling was moved to syscallret(), thus this code in the Linux module is no longer needed as it will be called later. This also allows the Linux wrappers to benefit from the fix in r275616 for threads not getting suspended if their vforked child is stopped while they wait on them. Reviewed by: jhb, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3828	2015-10-07 19:10:38 +00:00
Andriy Gapon	2f2f522b5d	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days	2015-09-28 12:14:16 +00:00
Edward Tomasz Napierala	089d32934a	Fixes a panic triggered by threaded Linux applications when running with RACCT/RCTL enabled. Reviewed by: ngie@, ed@ Tested by: Larry Rosenman <ler@lerctr.org> MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3470	2015-09-02 14:04:13 +00:00
Ed Schouten	a2034cc98a	Allow the creation of kqueues with a restricted set of Capsicum rights. On CloudABI we want to create file descriptors with just the minimal set of Capsicum rights in place. The reason for this is that it makes it easier to obtain uniform behaviour across different operating systems. By explicitly whitelisting the operations, we can return consistent error codes, but also prevent applications from depending on OS-specific behaviour. Extend kern_kqueue() to take an additional struct filecaps that is passed on to falloc_caps(). Update the existing consumers to pass in NULL. Differential Revision: https://reviews.freebsd.org/D3259	2015-08-05 07:36:50 +00:00
Ed Schouten	367a13f905	Limit rights on process descriptors. On CloudABI, the rights bits returned by cap_rights_get() match up with the operations that you can actually perform on the file descriptor. Limiting the rights is good, because it makes it easier to get uniform behaviour across different operating systems. If process descriptors on FreeBSD would suddenly gain support for any new file operation, this wouldn't become exposed to CloudABI processes without first extending the rights. Extend fork1() to gain a 'struct filecaps' argument that allows you to construct process descriptors with custom rights. Use this in cloudabi_sys_proc_fork() to limit the rights to just fstat() and pdwait(). Obtained from: https://github.com/NuxiNL/freebsd	2015-07-31 10:21:58 +00:00
Ed Schouten	8328babdd0	Make pipes in CloudABI work. Summary: Pipes in CloudABI are unidirectional. The reason for this is that CloudABI attempts to provide a uniform runtime environment across different flavours of UNIX. Instead of implementing a custom pipe that is unidirectional, we can simply reuse Capsicum permission bits to support this. This is nice, because CloudABI already attempts to restrict permission bits to correspond with the operations that apply to a certain file descriptor. Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes a pair of filecaps. These filecaps are passed to the newly introduced falloc_caps() function that creates the descriptors with rights in place. Test Plan: CloudABI pipes seem to be created with proper rights in place: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44 Reviewers: jilles, mjg Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3236	2015-07-29 17:18:27 +00:00
Konstantin Belousov	b4490c6e93	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation	2015-07-18 09:02:50 +00:00
Mateusz Guzik	f131759f54	fd: make 'rights' a mandatory argument to fget* functions	2015-07-05 19:05:16 +00:00
Dmitry Chagin	3c91646b46	Add EPOLLRDHUP support. Tested by: abi at abinet dot ru	2015-06-20 05:40:35 +00:00
Mateusz Guzik	4da8456f0a	Replace struct filedesc argument in getvnode with struct thread This is a step towards removal of spurious arguments.	2015-06-16 13:09:18 +00:00
Mateusz Guzik	6871c7c3f1	linux: make sure to grab all cow structs when creating a thread This is a fixup for r284214. Reported and tested by: Ivan Klymenko <fidaj ukr.net>	2015-06-10 15:34:43 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Jung-uk Kim	1a01bdf906	Properly initialize flags for accept4(2) not to return spurious EINVAL. Note this fixes a Linuxulator regression introduced in r283490. PR: 200662	2015-06-08 20:03:15 +00:00
Dmitry Chagin	32ba368ba9	Finish r283544. In exec case properly detach threads from user space before suicide.	2015-06-06 06:12:14 +00:00
Dmitry Chagin	d707582f83	When I merged the lemul branch I missed kib@'s r282708 commit. This is not the final fix as I need to properly clean up thread resources before other threads suicide. Tested by: Ruslan Makhmatkhanov	2015-05-25 20:44:46 +00:00
Dmitry Chagin	5c2748d5e7	Linux nanosleep() and clock_nanosleep() system calls always write the remaining time into the structure pointed to by rmtp unless rmtp is NULL. The value of *rmtp can then be used to call nanosleep() again and complete the specified pause if the previous call was interrupted. Note. clock_nanosleep() with an absolute time value does not write the remaining time. While here fix whitespaces and typo in SDT_PROBE.	2015-05-24 18:14:38 +00:00
Dmitry Chagin	bbf392d5ef	Convert SCM_TIMESTAMP in recvmsg().	2015-05-24 18:13:21 +00:00
Dmitry Chagin	5989b75bdb	The latest cp tool is trying to use the btrfs clone operation that is implemented via ioctl interface. First of all return ENOTSUP for this operation as a cp fallback to usual method in that case. Secondly, do not print out the message about unimplemented operation.	2015-05-24 18:12:04 +00:00
Dmitry Chagin	4f65e9cff4	Fix an mbuf(9) leak in sendmsg() under failure condition and remove unneeded check for failed M_WAITOK allocation. Found by: Brainy Code Scanner Reported by: Maxime Villard	2015-05-24 18:10:07 +00:00
Dmitry Chagin	9802eb9ebc	Implement Linux specific syncfs() system call.	2015-05-24 18:08:01 +00:00
Dmitry Chagin	d9cbe8f0ef	Properly check tv_nsec value. The tv_nsec field can also be one of the special value UTIME_NOW or UTIME_OMIT.	2015-05-24 18:06:46 +00:00
Dmitry Chagin	4cf10e2934	Since FreeBSD supports SOCK_CLOEXEC & SOCK_NONBLOCK options remove its emulation via fcntl call from Linuxulator.	2015-05-24 18:06:12 +00:00
Dmitry Chagin	e1ff74c0f7	Implement recvmmsg() and sendmmsg() system calls.	2015-05-24 18:04:04 +00:00
Dmitry Chagin	b7aaa9fdb0	Reduce duplication between MD Linux code by moving msg related struct definitions out into the compat/linux/linux_socket.h	2015-05-24 18:03:14 +00:00
Dmitry Chagin	6e4c8004dc	Implement epoll_pwait() system call.	2015-05-24 18:00:14 +00:00
Dmitry Chagin	b7c4ebdb56	Convert signal number to native for VT_SETMODE ioctl and remove strange and invalid ISSIGVALID macro. The code has not been tested right way but it was originally broken.	2015-05-24 17:59:17 +00:00
Dmitry Chagin	19d8b461f4	Add utimensat() system call. The patch developed by Jilles Tjoelker and Andrew Wilcox and adopted for lemul branch by me.	2015-05-24 17:57:07 +00:00
Dmitry Chagin	5885e5ab29	Convert Linux signal number to the FreeBSD.	2015-05-24 17:49:09 +00:00
Dmitry Chagin	4ab7403bbd	Rework signal code to allow using it by other modules, like linprocfs: 1. Linux sigset always 64 bit on all platforms. In order to move Linux sigset code to the linux_common module define it as 64 bit int. Move Linux sigset manipulation routines to the MI path. 2. Move Linux signal number definitions to the MI path. In general, they are the same on all platforms except for a few signals. 3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion tables to avoid conversion errors. 4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside of allowed on Linux signal numbers. PR: 197216	2015-05-24 17:47:20 +00:00
Dmitry Chagin	a7ac457613	According to Linux man sigaltstack(3) shall return EINVAL if the ss argument is not a null pointer, and the ss_flags member pointed to by ss contains flags other than SS_DISABLE. However, in fact, Linux also allows SS_ONSTACK flag which is simply ignored. For buggy apps (at least mono) ignore flags other than SS_DISABLE, as Linux does. While here move MI part of sigaltstack code to the appropriate place. Reported by: abi at abinet dot ru	2015-05-24 17:44:08 +00:00
Dmitry Chagin	76672e1113	Add EPOLLERR flag handling to epoll. Tested by: abi at abinet dot ru	2015-05-24 17:42:45 +00:00
Dmitry Chagin	e2ff4b9864	As fo_fill_kinfo() does not check fo_fill_kinfo for NULL add a fo_fill_kinfo op to eventfdops. Reported by: trinity	2015-05-24 17:40:14 +00:00
Dmitry Chagin	b6aeb7d5dd	Add preliminary fallocate system call implementation to emulate posix_fallocate() function. Differential Revision: https://reviews.freebsd.org/D1523 Reviewed by: emaste	2015-05-24 17:33:21 +00:00
Dmitry Chagin	16ac71bc4f	Delete the duplicate of linux_to_native_clockid() function. Differential Revision: https://reviews.freebsd.org/D1521 Reviewed by: trasz	2015-05-24 17:30:31 +00:00
Dmitry Chagin	680982281b	Do not use struct l_timespec without conversion. While here move args->timeout handling before acquiring the futex key at FUTEX_WAIT path. Differential Revision: https://reviews.freebsd.org/D1520 Reviewed by: trasz	2015-05-24 17:29:18 +00:00
Dmitry Chagin	7e947ccc81	Add prototypes for static futex functions. Differential Revision: https://reviews.freebsd.org/D1519 Reviewed by: trasz	2015-05-24 17:27:59 +00:00
Dmitry Chagin	2166e4e0a5	As for now our tmpfs is no longer being considered "highly experimental" remove /dev/shm magic committed in r218497 and convert tmpfs type to an expected magic number. Differential Revision: https://reviews.freebsd.org/D1497 Reviewed by: emaste, trasz	2015-05-24 17:26:58 +00:00
Dmitry Chagin	5dd1d097f8	Print out unsupported futex operation message only once for the process. Differential Revision: https://reviews.freebsd.org/D1498	2015-05-24 17:25:57 +00:00
Dmitry Chagin	2711aba97e	Add some clock mappings used in glibc 2.20. Differential Revision: https://reviews.freebsd.org/D1465 Reviewed by: trasz	2015-05-24 17:23:08 +00:00
Dmitry Chagin	7d96520b25	Improve ktr(9) records in thread management code. Differential Revision: https://reviews.freebsd.org/D1464 Reviewed by: trasz	2015-05-24 17:09:07 +00:00
Dmitry Chagin	68cf0367e9	Use local struct proc * variable instead of dereferencing td->td_proc. Differential Revision: https://reviews.freebsd.org/D1463 Reviewed by: emaste	2015-05-24 17:08:25 +00:00
Dmitry Chagin	97cfa5c899	Avoid unnecessary em zeroing in non-exec path as it already zeroed by malloc with M_ZERO flag and move zeroing to the proper place in exec path. Differential Revision: https://reviews.freebsd.org/D1462 Reviewed by: trasz	2015-05-24 17:07:10 +00:00
Dmitry Chagin	e0327ddba0	Remove the unnecessary cast. Differential Revision: https://reviews.freebsd.org/D1461 Reviewed by: emaste	2015-05-24 17:05:59 +00:00
Dmitry Chagin	a6b40812ec	Implement ppoll() system call. Differential Revision: https://reviews.freebsd.org/D1105 Reviewed by: trasz	2015-05-24 16:59:25 +00:00
Dmitry Chagin	3d7b4b3720	td_sigmask of a newly created thread copied from td. Remove excess initialization of td_sigmask. Differential Revision: https://reviews.freebsd.org/D1128 Reviewed by: emaste	2015-05-24 16:56:32 +00:00
Dmitry Chagin	2c4f134b25	Update Linux compat revision to 32. Differential Revision: https://reviews.freebsd.org/D1122 Reviewed by: emaste	2015-05-24 16:55:32 +00:00
Dmitry Chagin	520e9c187d	Fix linux_common module build with KTR option. Differential Revision: https://reviews.freebsd.org/D1096 Reviewed by: trasz	2015-05-24 16:52:45 +00:00
Dmitry Chagin	a31d76867d	Implement eventfd system call. Differential Revision: https://reviews.freebsd.org/D1094 In collaboration with: Jilles Tjoelker	2015-05-24 16:49:14 +00:00
Dmitry Chagin	3e89b64168	Put the correct value for the abi_nfdbits parameter of kern_select() for all supported Linuxulators. Differential Revision: https://reviews.freebsd.org/D1093 Reviewed by: trasz	2015-05-24 16:47:13 +00:00
Dmitry Chagin	e16fe1c730	Implement epoll family system calls. This is a tiny wrapper around kqueue() to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data, so we keep user data in the proc emuldata. Initial patch developed by rdivacky@ in 2007, then extended by Yuri Victorovich @ r255672 and finished by me in collaboration with mjg@ and jilles@. Differential Revision: https://reviews.freebsd.org/D1092	2015-05-24 16:41:39 +00:00
Dmitry Chagin	d2b6dbc06f	Implement F_DUPFD_CLOEXEC fcntl flag. Differential Revision: https://reviews.freebsd.org/D1089 Reviewed by: trasz	2015-05-24 16:34:57 +00:00
Dmitry Chagin	bfa4d74baf	Add several fcntl flags. Differential Revision: https://reviews.freebsd.org/D1088 Reviewed by: trasz	2015-05-24 16:32:52 +00:00
Dmitry Chagin	4d0f380d87	To avoid code duplication move open/fcntl definitions to the MI header file. Differential Revision: https://reviews.freebsd.org/D1087 Reviewed by: trasz	2015-05-24 16:31:44 +00:00
Dmitry Chagin	26c68e1fe5	Use the BSD_TO_LINUX_SIGNAL() wherever there is no need to check the ABI as it is known. Differential Revision: https://reviews.freebsd.org/D1086	2015-05-24 16:30:23 +00:00
Dmitry Chagin	2245df381a	Convert Linux wait options to the FreeBSD. Check wait options as Linux does. Linux always set WEXITED option not a WUNTRACED|WNOHANG which is a strange bug. Differential Revision: https://reviews.freebsd.org/D1085 Reviewed by: trasz	2015-05-24 16:28:58 +00:00
Dmitry Chagin	7a7a6efc25	Set WIFCONTINUED to the wait status if needed. Differential Revision: https://reviews.freebsd.org/D1083 Reviewed by: trasz	2015-05-24 16:27:38 +00:00
Dmitry Chagin	9599b0ec3a	Rewrite linux_recvfrom. To avoid double conversion of sockaddr use kern_recvit() directly. And check fromlen parameter before sockaddr copyin and conversion. Differential Revision: https://reviews.freebsd.org/D1082	2015-05-24 16:26:55 +00:00
Dmitry Chagin	4048f59cd0	Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by glibc. At least since glibc version 2.16 using AT_RANDOM is mandatory. Differential Revision: https://reviews.freebsd.org/D1080	2015-05-24 16:24:24 +00:00
Dmitry Chagin	baa232bbfd	Change linux faccessat syscall definition to match actual linux one. The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented within the glibc wrapper function for faccessat(). If either of these flags are specified, then the wrapper function employs fstatat() to determine access permissions. Differential Revision: https://reviews.freebsd.org/D1078 Reviewed by: trasz	2015-05-24 16:18:03 +00:00
Dmitry Chagin	e0d3ea8c65	Where possible we will use M_LINUX malloc(9) type. Move M_FUTEX defines to the linux_common.ko. Differential Revision: https://reviews.freebsd.org/D1077 Reviewed by: emaste	2015-05-24 16:14:41 +00:00
Dmitry Chagin	0edc82b564	Move FEATURE macros for v4l and v4l2 to the common module. Differential Revision: https://reviews.freebsd.org/D1075 Reviewed by: emaste	2015-05-24 16:00:01 +00:00
Dmitry Chagin	bc27367760	Refund the proc emuldata struct for future use. For now move flags from thread emuldata to proc emuldata as it was originally intended. As we can have both 64 & 32 bit Linuxulator running any eventhandler can be called twice for us. To prevent this move eventhandlers code from linux_emul.c to the linux_common.ko module. Differential Revision: https://reviews.freebsd.org/D1073	2015-05-24 15:54:58 +00:00
Dmitry Chagin	67d3974849	Introduce a new module linux_common.ko which is intended for the following primary purposes: 1. Remove the dependency of linsysfs and linprocfs modules on linux.ko, which will be architecture specific on amd64. 2. Incorporate into linux_common.ko general code for platforms on which we'll support two Linuxulator modules (for both instruction sets - 32 & 64 bit). 3. Move malloc(9) declaration to linux_common.ko, to enable getting memory usage statistics properly. Currently linux_common.ko incorporates code from linux_mib.c and linux_util.c, and the linprocfs, linsysfs and linux kernel modules depend on linux_common.ko. Temporarily remove dtrace garbage from linux_mib.c and linux_util.c Differential Revision: https://reviews.freebsd.org/D1072 In collaboration with: Vassilis Laganakos. Reviewed by: trasz	2015-05-24 15:51:18 +00:00
Dmitry Chagin	606bcc1741	Add newfstatat system call for 64-bit Linuxulator. Differential Revision: https://reviews.freebsd.org/D1071 Reviewed by: trasz	2015-05-24 15:48:34 +00:00
Dmitry Chagin	4ca75bed31	Fix compilation with -DDEBUG option. Differential Revision: https://reviews.freebsd.org/D1070 Reviewed by: trasz	2015-05-24 15:47:15 +00:00
Dmitry Chagin	36204c3016	Add 64 bit support to the vdso. Differential Revision: https://reviews.freebsd.org/D1069 Reviewed by: trasz	2015-05-24 15:45:36 +00:00
Dmitry Chagin	31eb438886	x86_64 Linux do not use multiplexing on ipc system calls. Move struct ipc_perm definition to the MD path as it differs for 64 and 32 bit platform. Differential Revision: https://reviews.freebsd.org/D1068 Reviewed by: trasz	2015-05-24 15:44:41 +00:00
Dmitry Chagin	7f8f1d7f7a	Disable i386 call for x86-64 Linux. Differential Revision: https://reviews.freebsd.org/D1067 Reviewed by: trasz	2015-05-24 15:43:53 +00:00
Dmitry Chagin	a12b9b3d96	64-bit platforms, like x86_64, do not use multiplexing on socketcall system calls. Differential Revision: https://reviews.freebsd.org/D1065 Reviewed by: trasz	2015-05-24 15:41:27 +00:00
Dmitry Chagin	297f61cc01	Get ready to commit x86_64 Linux emulation. All fields of type l_int in struct statfs are defined as l_long on i386 and amd64. Differential Revision: https://reviews.freebsd.org/D1064 Reviewed by: trasz	2015-05-24 15:39:08 +00:00
Dmitry Chagin	0020bdf13a	Put linux_platform into the vdso to avoid copying it onto the stack at every exec. Differential Revision: https://reviews.freebsd.org/D1062 Reviewed by: trasz	2015-05-24 15:30:52 +00:00
Dmitry Chagin	bdc379344a	Implement vdso - virtual dynamic shared object. Through vdso Linux exposes functions from kernel with proper DWARF CFI information so that it becomes easier to unwind through them. Using vdso is a mandatory for a thread cancelation && cleanup on a modern glibc. Differential Revision: https://reviews.freebsd.org/D1060	2015-05-24 15:28:17 +00:00
Dmitry Chagin	ae50b4d7b5	Implement pselect6() system call. Differential Revision: https://reviews.freebsd.org/D1051 Reviewed by: trasz	2015-05-24 15:21:25 +00:00
Dmitry Chagin	c3978c7bb1	Implement prlimit64() system call. Differential Revision: https://reviews.freebsd.org/D1050 Reviewed by: emaste, trasz	2015-05-24 15:18:19 +00:00
Dmitry Chagin	254a937ee5	Implement dup3() system call. Differential Revision: https://reviews.freebsd.org/D1049 Reviewed by: emaste	2015-05-24 15:14:51 +00:00
Dmitry Chagin	44e93b234f	Sched_rr_get_interval returns EINVAL in case when the invalid pid specified. This silence the ltp tests. Differential Revision: https://reviews.freebsd.org/D1048 Reviewed by: trasz	2015-05-24 15:13:56 +00:00
Dmitry Chagin	7ac9766db4	Implement rt_sigqueueinfo() system call. Differential Revision: https://reviews.freebsd.org/D1047 Reviewed by: trasz	2015-05-24 15:11:32 +00:00
Dmitry Chagin	e5fe4ccf59	Implement waitid() system call. Differential Revision: https://reviews.freebsd.org/D1046	2015-05-24 15:06:39 +00:00
Dmitry Chagin	001398c4c5	To reduce code duplication introduce linux_copyout_rusage() method. Use it in linux_wait4() system call and move linux_wait4() to the MI path. While here add a prototype for the static bsd_to_linux_rusage(). Differential Revision: https://reviews.freebsd.org/D2138 Reviewed by: trasz	2015-05-24 15:03:09 +00:00
Dmitry Chagin	a7ae3c557f	Add a function for converting wait options. Differential Revision: https://reviews.freebsd.org/D1045 Reviewed by: trasz	2015-05-24 15:00:27 +00:00
Dmitry Chagin	fe4ed1e768	Add a siginfo_t conversion function. Differential Revision: https://reviews.freebsd.org/D1044 Reviewed by: emaste, trasz	2015-05-24 14:58:30 +00:00
Dmitry Chagin	86bda7a02d	Remove a now unused define. Differential Revision: https://reviews.freebsd.org/D1043 Reviewed by: trasz	2015-05-24 14:57:39 +00:00
Dmitry Chagin	a6326909bb	Introduce LINUX_VERSION_STR, LINUX_VERSION_CODE macro for use instead of hardcoded pr_osrelease, pr_osrel values. This will be used later in the VDSO. Differential Revision: https://reviews.freebsd.org/D1042 Reviewed by: trasz	2015-05-24 14:56:21 +00:00
Dmitry Chagin	5e609834bd	pthread_join() caller do futex_wait on child_clear_tid. As a results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined wake up the one thread. Differential Revision: https://reviews.freebsd.org/D1040	2015-05-24 14:54:12 +00:00
Dmitry Chagin	81338031c4	Switch linuxulator to use the native 1:1 threads. The reasons: 1. Get rid of the stubs/quirks with process dethreading, process reparent when the process group leader exits and close to this problems on wait(), waitpid(), etc. 2. Reuse our kernel code instead of writing excessive thread management routines in Linuxulator. Implementation details: 1. The thread is created via kern_thr_new() in the clone() call with the CLONE_THREAD parameter. Thus, everything else is a process. 2. The test that the process has a threads is done via P_HADTHREADS bit p_flag of struct proc. 3. Per thread emulator state data structure is now located in the struct thread and freed in the thread_dtor() hook. Mandatory holding of the p_mtx required when referencing emuldata from the other threads. 4. PID mangling has changed. Now Linux pid is the native tid and Linux tgid is the native pid, with the exception of the first thread in the process where tid and pid are one and the same. Ugliness: In case when the Linux thread is the initial thread in the thread group thread id is equal to the process id. Glibc depends on this magic (assert in pthread_getattr_np.c). So for system calls that take thread id as a parameter we should use the special method to reference struct thread. Differential Revision: https://reviews.freebsd.org/D1039	2015-05-24 14:53:16 +00:00
Dmitry Chagin	2003907d45	Implement a Linux version of sched_getparam() && sched_setparam(). Temporarily use the first thread in proc. Differential Revision: https://reviews.freebsd.org/D1036 Reviewed by: trasz	2015-05-24 14:45:57 +00:00
Dmitry Chagin	1aa90eca33	In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator). Linuxulator temporarily uses first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz	2015-05-24 14:39:26 +00:00
Dmitry Chagin	161acbb670	In preparation for switching linuxulator to the use the native 1:1 threads introduce linux_exit() stub instead of sys_exit() call (which terminates process). In the new linuxulator exit() system call terminates the calling thread (not a whole process). Differential Revision: https://reviews.freebsd.org/D1027 Reviewed by: trasz	2015-05-24 14:33:19 +00:00
Edward Tomasz Napierala	310e931198	Simplify linux_getcwd(), removing code that was no longer used. Differential Revision: https://reviews.freebsd.org/D2326 Reviewed by: dchagin@, kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-23 08:41:50 +00:00
Edward Tomasz Napierala	6289b482ec	Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024. Differential Revision: https://reviews.freebsd.org/D2335 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-21 13:55:24 +00:00
Edward Tomasz Napierala	565716e60e	Add back fdrop() missed in r281726. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-19 07:35:18 +00:00
Edward Tomasz Napierala	92f7441328	Optimize the O_NOCTTY handling hack in linux_common_open(). Differential Revision: https://reviews.freebsd.org/D2323 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-19 07:12:16 +00:00
Edward Tomasz Napierala	94d014f079	Remove unused code from linux_mount(), and make it possible to mount any kind of filesystem instead of hardcoded three. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-18 09:49:09 +00:00
Mateusz Guzik	daf63fd2f9	cred: add proc_set_cred helper The goal here is to provide one place altering process credentials. This eases debugging and opens up possibilities to do additional work when such an action is performed.	2015-03-16 00:10:03 +00:00
Dmitry Chagin	9f7a06f27e	Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week	2015-01-04 10:34:02 +00:00
Dmitry Chagin	9fa04b52ec	Cast *path to silence clang -Wpointer-sign warning. MFC after: 1 week	2015-01-02 19:29:32 +00:00
Dmitry Chagin	de90b09a79	Remove Giant from linux_getcwd() due to VFS is MPSAFE now. Discussed with: kib MFC after: 1 week	2015-01-02 18:36:08 +00:00
Dmitry Chagin	857ad5a31b	Fix Clang -Wpointer-sign warnings. MFC after: 1 week	2015-01-01 20:53:38 +00:00
Dmitry Chagin	5072ad67ae	Fix Clang warning: passing 'unsigned int *' to parameter of type 'int *' converts between pointers to integer types with different sign. MFC after: 1 week	2015-01-01 19:57:24 +00:00
Konstantin Belousov	5c7bebf961	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-26 14:10:00 +00:00
Konstantin Belousov	6e646651d3	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
Alexander Motin	6a9bcacfcf	Remake Linux' SOUND_MIXER_INFO IOCTL as a wrapper around new FreeBSD's one. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 3 days	2014-09-24 08:18:11 +00:00
Sean Bruno	d143d69857	Bump minimum linux compat version to support Centos6 ports updates for linux. Update linux compat minimum revision to match linux-c6 now in ports. This is a candidate for 10.1 R as it matches the current state of supported linux compat packages in the ports tree. PR: 187786 Reviewed by: xmj MFC after: 2 days Relnotes: yes	2014-09-22 17:26:07 +00:00
Bjoern A. Zeeb	0a041f3b47	Implement most of timer_{create,settime,gettime,getoverrun,delete} for amd64/linux32. Fix the entirely bogus (untested) version from r161310 for i386/linux using the same shared code in compat/linux. It is unclear to me if we could support more clock mappings but the current set allows me to successfully run commercial 32bit linux software under linuxolator on amd64. Reviewed by: jhb Differential Revision: D784 MFC after: 3 days Sponsored by: DARPA, AFRL	2014-09-18 08:36:45 +00:00
Alexander Motin	94fe9f959c	- Add support for SG_GET_SG_TABLESIZE IOCTL to report that we don't support scatter/gather lists. - Return error for still unsupported SG 3.x API read/write calls. MFC after: 1 month	2014-06-04 12:05:47 +00:00
Alexander Motin	fcaf473cfc	Overhaul CAM SG driver IOCTL interfaces. Make it really work for native FreeBSD programs. Before this it was broken for years due to different number of pointer dereferences in Linux and FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs. This change breaks the driver FreeBSD IOCTL ABI, making it more strict, but since it was not working any way -- who bother. Add shims for 32-bit programs on 64-bit host, translating the argument of the SG_IO IOCTL for both FreeBSD and Linux ABIs. With this change I was able to run 32-bit Linux sg3_utils tools and simple 32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems. MFC after: 1 month	2014-06-02 19:53:53 +00:00
Dmitry Chagin	fb6bf8bba9	Glibc was switched to the FUTEX_WAIT_BITSET op and CLOCK_REALTIME flag has been added instead of FUTEX_WAIT to replace the FUTEX_WAIT logic which needs to do gettimeofday() calls before the futex syscall to convert the absolute timeout to a relative timeout. Before this the CLOCK_MONOTONIC used by the FUTEX_WAIT_BITSET op. When the FUTEX_CLOCK_REALTIME is specified the timeout is an absolute time, not a relative time. Rework futex_wait to handle this. On the side fix the futex leak in error case and remove useless parentheses. Properly calculate the timeout for the CLOCK_MONOTONIC case. MFC after: 3 days	2014-05-31 14:58:53 +00:00
Dmitry Chagin	32fd44657c	In r218101 I have not changed properly the futex syscall definition. Some Linux futex ops atomically verifies that the futex address uaddr (uval) contains the value val. Comparing signed uval and unsigned val may lead to an unexpected result, mostly to a deadlock. So copyin uaddr to an unsigned int to compare the parameters correctly. While here change ktr records to print parameters in more readable format. Tested by eadler@ MFC after: 3 days	2014-05-28 05:57:35 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcpu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Gleb Smirnoff	af50ea380f	Axe IFF_SMART. Fortunately this layering violating flag was never used, it was just declared.	2013-11-05 12:52:56 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Mark Johnston	92c6196caa	Fix some typos that were causing probe argument types to show up as unknown. Reviewed by: rwatson (mac provider) Approved by: re (glebius) MFC after: 1 week	2013-10-01 15:40:27 +00:00
Roman Divacky	b12698e1a1	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)	2013-09-18 18:48:33 +00:00
Roman Divacky	253c75c0de	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>	2013-09-18 17:56:04 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Mark Johnston	1570438586	Remove a couple of unused macros. MFC after: 3 days	2013-08-17 21:53:37 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Hans Petter Selasky	a40a377cc7	Add some missing LIBUSB IOCTL conversion codes.	2013-07-14 10:13:01 +00:00
Alexander Leidinger	b85e1f7d05	- Move videodev headers from compat/linux to contrib/v4l (cp from vendor and apply diff to compat/linux versions). - The cp implies an update of videodev2.h to the linux kernel 2.6.34.14 one. The update makes video in skype v4 work on FreeBSD. Tested by: Artyom Mirgorodskiy <artyom.mirgorodsky@gmail.com> (update of header only)	2013-07-06 19:59:06 +00:00
Jilles Tjoelker	d289dc7b73	Rename do_pipe() to kern_pipe2() and declare it properly.	2013-03-31 17:42:54 +00:00
Eitan Adler	1eb9ea583b	Remove check for NULL prior to free(9) and m_freem(9). Approved by: cperciva (mentor)	2013-03-04 02:21:34 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
John Baldwin	d825ce0a5d	Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h by moving bits that are MI out into headers in compat/linux. Reviewed by: Chagin Dmitry dmitry | gmail MFC after: 2 weeks	2013-01-29 18:41:30 +00:00
Dmitry Chagin	4d04cf1d9e	Arithmetic on pointers takes into account the size of the type. Properly cast the pointer to avoid incorrect pointer scaling. MFC after: 1 Week	2013-01-25 14:40:54 +00:00
John Baldwin	fb709557a3	Don't assume that all Linux TCP-level socket options are identical to FreeBSD TCP-level socket options (only the first two are). Instead, using a mapping function and fail unsupported options as we do for other socket option levels. MFC after: 2 weeks	2013-01-23 21:44:48 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Colin Percival	43f13bea35	MFS security patches which seem to have accidentally not reached HEAD: Fix insufficient message length validation for EAP-TLS messages. Fix Linux compatibility layer input validation error. Security: FreeBSD-SA-12:07.hostapd Security: FreeBSD-SA-12:08.linux Security: CVE-2012-4445, CVE-2012-4576 With hat: so@	2012-11-23 01:48:31 +00:00
Konstantin Belousov	140dedb81c	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already have file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Calling the VOPs instead of directly accessing v_writecount provides the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks	2012-11-02 13:56:36 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Konstantin Belousov	877d24ac8a	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks	2012-09-28 11:25:02 +00:00
Kevin Lo	457a9cfbc1	Remove redundant check	2012-09-12 10:12:03 +00:00
Konstantin Belousov	c5c1199c83	Extend the KPI to lock and unlock f_offset member of struct file. It now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries(). Ensure that on 32bit architectures f_offset, which is 64bit quantity, always read and written under the mtxpool protection. This fixes apparently easy to trigger race when parallel lseek()s or lseek() and read/write could destroy file offset. The already broken ABI emulations, including iBCS and SysV, are not converted (yet). Tested by: pho No objections from: jhb MFC after: 3 weeks	2012-07-02 21:01:03 +00:00
Alexander Leidinger	19e252baeb	- >500 static DTrace probes for the linuxulator - DTrace scripts to check for errors, performance, ... they serve mostly as examples of what you can do with the static probes; with moderate load the scripts may be overwhelmed, excessive lock-tracing may influence program behavior (see the last design decision) Design decisions: - use "linuxulator" as the provider for the native bitsize; add the bitsize for the non-native emulation (e.g. "linuxulator32" on amd64) - Add probes only for locks which are acquired in one function and released in another function. Locks which are acquired and released in the same function should be easy to pair in the code, inter-function locking is easier to verify in DTrace. - Probes for locks should be fired after locking and before releasing to prevent races (to provide data/function stability in DTrace, see the man-page of "dtrace -v ..." and the corresponding DTrace docs).	2012-05-05 19:42:38 +00:00
Jung-uk Kim	d69a426fce	- Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27 but GNU libc used it without checking its kernel version, e. g., Fedora 10. - Move pipe(2) implementation for Linuxulator from MD files to MI file, sys/compat/linux/linux_file.c. There is no MD code for this syscall at all. - Correct an argument type for pipe() from l_ulong * to l_int *. Probably this was the source of MI/MD confusion. Reviewed by: emulation	2012-04-16 21:22:02 +00:00
Konstantin Belousov	3494f31ad2	Fix misuse of the kernel map in miscellaneous image activators. Vnode-backed mappings cannot be put into the kernel map, since it is a system map. Use exec_map for transient mappings, and remove the mappings with kmem_free_wakeup() to notify the waiters on available map space. Do not map the whole executable into KVA at all to copy it out into usermode. Directly use vn_rdwr() for the case of not page aligned binary. There is one place left where the potentially unbounded amount of data is mapped into exec_map, namely, in the COFF image activator enumeration of the needed shared libraries. Reviewed by: alc MFC after: 2 weeks	2012-02-17 23:47:16 +00:00
Ed Schouten	7870adb640	Remove direct access to si_name. Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly. MFC after: 2 weeks	2012-02-10 12:35:57 +00:00
Ulrich Spörlein	9a14aa017b	Convert files to UTF-8	2012-01-15 13:23:18 +00:00
Dimitry Andric	69ee3e2f52	In sys/compat/linux/linux_ioctl.c, work around a warning when a pointer is compared to an integer, by casting the pointer to l_uintptr_t. No functional difference on both i386 and amd64. Reviewed by: ed, jhb MFC after: 1 week	2012-01-03 18:49:39 +00:00
John Baldwin	dd01579cde	Implement linux_fadvise64() and linux_fadvise64_64() using kern_posix_fadvise(). Reviewed by: silence on emulation@ MFC after: 2 weeks	2011-12-29 15:34:59 +00:00
Ed Schouten	767a32641c	Make the Linux *at() calls a bit more complete. Properly support: - AT_EACCESS for faccessat(), - AT_SYMLINK_FOLLOW for linkat().	2011-11-19 07:19:37 +00:00
Ed Schouten	d3a993d46b	Improve access() parameter name consistency. The current code mixes the use of `flags' and `mode'. This is a bit confusing, since the faccessat() function has a `flag' parameter to store the AT_ flag. Make this less confusing by using the same name as used in the POSIX specification -- `amode'.	2011-11-19 06:35:15 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Christian Brueffer	796fa5e465	Add curly braces missed in r226247. Pointy hat to: brueffer Submitted by: many MFC after: 1 week	2011-10-11 13:40:37 +00:00
Christian Brueffer	05ad7ad667	Properly free linux_gidset in case of an error. CID: 4136 Found with: Coverity Prevent(tm) MFC after: 1 week	2011-10-11 10:32:23 +00:00
Jung-uk Kim	bf3a36cc7b	Use the calculated length instead of maximum length.	2011-10-06 21:55:05 +00:00
Jung-uk Kim	b6f96462ba	Remove a now-defunct variable.	2011-10-06 21:40:08 +00:00
Jung-uk Kim	3106d6704b	Use uint32_t instead of u_int32_t. Fix style(9) nits.	2011-10-06 21:17:46 +00:00
Jung-uk Kim	c02637c717	Make sure to ignore the leading NULL byte from Linux abstract namespace.	2011-10-06 21:09:28 +00:00
Jung-uk Kim	f05531a392	Restore the original socket address length if it was not really AF_INET6.	2011-10-06 20:48:23 +00:00
Jung-uk Kim	43399111a7	Return more appropriate errno when Linux path name is too long.	2011-10-06 20:28:08 +00:00
Jung-uk Kim	0007f669ca	Inline do_sa_get() function and remove an unused return value.	2011-10-06 20:20:30 +00:00
Jung-uk Kim	c15cdbf2f3	Unroll inlined strnlen(9) and make it easier to read. No functional change.	2011-10-06 19:59:14 +00:00
Colin Percival	5da3eb94fc	Fix a bug in UNIX socket handling in the linux emulator which was exposed by the security fix in FreeBSD-SA-11:05.unix. Approved by: so (cperciva) Approved by: re (kib) Security: Related to FreeBSD-SA-11:05.unix, but not actually a security fix.	2011-10-04 19:07:38 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Jonathan Anderson	cfb5f76865	Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-08-18 22:51:30 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
Bjoern A. Zeeb	74d7a2539e	Remove the 'either' from the comment as it'll be less obvious that we removed semmap in a bit of time from now. Re-wrap. Suggested by: jhb	2011-07-17 05:33:22 +00:00
Bjoern A. Zeeb	1080a2c85d	Remove semaphore map entry count "semmap" field and its tuning option that is highly recommended to be adjusted in too much documentation while doing nothing in FreeBSD since r2729 (rev 1.1). ipcs(1) needs to be recompiled as it is accessing _KERNEL private variables. Reviewed by: jhb (before comment change on linux code) Sponsored by: Sandvine Incorporated	2011-07-14 14:18:14 +00:00
Alexander Leidinger	f4cb7c85e6	Commit the missing linux_videodev2_compat.h (lost somewhere between commit tree patch generation -> successful compile tree build test -> commit). Pointy hat to: netchild	2011-05-04 13:09:20 +00:00
Alexander Leidinger	60c6d23685	Add FEATURE macros for v4l and v4l2 to the linuxulator. Suggested by: ae	2011-05-04 09:52:34 +00:00
Alexander Leidinger	15bf9014c9	This is v4l2 support for the linuxulator. This allows to access FreeBSD native devices which support the v4l2 API from processes running within the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd or multimedia/webcamd supplied drivers. Submitted by: nox MFC after: 1 month	2011-05-04 09:05:39 +00:00
Alexander Leidinger	4c94038794	Fix typo in comment, improve comment.	2011-05-04 08:42:31 +00:00
Alexander Leidinger	d0f5ca6d40	Add explanation about the use-permission and FreeBSDify it.	2011-05-04 08:41:55 +00:00
Alexander Leidinger	41ebeb8e6f	Copy the v4l2 header unchanged from the vendor branch.	2011-05-04 08:31:58 +00:00
Edward Tomasz Napierala	1ba5ad4210	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
Andriy Gapon	a930718af1	Revert r220032:linux compat: add SO_PASSCRED option with basic handling I have not properly thought through the commit. After r220031 (linux compat: improve and fix sendmsg/recvmsg compatibility) the basic handling for SO_PASSCRED is not sufficient as it breaks recvmsg functionality for SCM_CREDS messages because now we would need to handle sockcred data in addition to cmsgcred. And that is not implemented yet. Pointyhat to: avg	2011-03-31 08:14:51 +00:00
Andriy Gapon	01a9e1a11b	linux compat: add SO_PASSCRED option with basic handling This seems to have been a part of a bigger patch by dchagin that either hasn't been committed or was committed partially. Submitted by: dchagin, nox MFC after: 2 weeks	2011-03-26 11:25:36 +00:00
Andriy Gapon	605da56bc3	linux compat: improve and fix sendmsg/recvmsg compatibility - implement basic stubs for capget, capset, prctl PR_GET_KEEPCAPS and prctl PR_SET_KEEPCAPS. - add SCM_CREDS support to sendmsg and recvmsg - modify sendmsg to ignore control messages if not using UNIX domain sockets This should allow linux pulse audio daemon and client work on FreeBSD and interoperate with native counter-parts modulo the differences in pulseaudio versions. PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 11:05:53 +00:00
Alexander Leidinger	0d7b5e545c	Staticize functions which are not used somewhere else, move the corresponding prototypes from the header to the code file.	2011-03-15 13:40:47 +00:00
Dmitry Chagin	31f7ad1545	Style(9) fixes. No functional changes. MFC after: 2 Week	2011-03-12 07:47:05 +00:00
John Baldwin	c28a98e948	Remove now-obsolete comment. Submitted by: netchild MFC after: 1 week	2011-03-10 19:50:12 +00:00
Dmitry Chagin	a2cd91cf28	Indeed, remove bogus since r219405 check of the Linux ABI. Pointed out: jhb MFC after: 2 Week	2011-03-09 05:59:33 +00:00
Dmitry Chagin	e5d81ef1b5	Extend struct sysvec with new method sv_schedtail, which is used for an explicit process at fork trampoline path instead of eventhandler(schedtail) invocation for each child process. Remove eventhandler(schedtail) code and change linux ABI to use newly added sysvec method. While here replace explicit comparing of module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 Week	2011-03-08 19:01:45 +00:00
Dmitry Chagin	3a4bc25691	Print out shared flag for debug purpose. MFC after: 1 Week	2011-03-03 18:29:55 +00:00
Dmitry Chagin	815cb72a0c	Switch PROCESS_SHARE to AUTO_SHARE (as umtx does). Even for SHARED, if page mapped MAP_ANON linux uses private algorithm too. Discussed with: jhb MFC after: 3 Days	2011-03-03 18:19:10 +00:00
John Baldwin	21f8f506fb	Use umtx_key objects to uniquely identify futexes. Private futexes in different processes that happen to use the same user address in the separate processes will now be treated as distinct futexes rather than the same futex. We can now honor shared futexes properly by mapping them to a PROCESS_SHARED umtx_key. Private futexes use THREAD_SHARED umtx_key objects. In conjunction with: dchagin Reviewed by: kib MFC after: 1 week	2011-02-23 13:23:28 +00:00
Dmitry Chagin	f9e66923e5	Do not clobber %rdx. Before calling the vfork() syscall, the linux user-space stores the current PID in %rdx and restores it when the parent process leaves the kernel.	2011-02-20 07:58:30 +00:00
Dmitry Chagin	09d6cb0a23	For realtime signals fill the sigval value.	2011-02-15 21:46:36 +00:00
Dmitry Chagin	f3481dd9ab	Make the linux_rt_sigtimedwait() system call actually work. 1) Translate the native signal number in the appropriate Linux signal. 2) Remove bogus code, which can lead to a panic as it calls kern_sigtimedwait with same ksiginfo. 3) Return the corresponding signal number.	2011-02-15 21:42:48 +00:00
Dmitry Chagin	8c50c56206	Style(9) fix. Wrap long lines in linux_rt_sigtimedwait().	2011-02-15 21:24:50 +00:00
Dmitry Chagin	d207e753da	Put the macro declaration in the relevant include file for future use.	2011-02-15 21:22:09 +00:00
Dmitry Chagin	e2ef00a426	Style(9) fix. Do not initialize variables in the declarations.	2011-02-14 17:24:58 +00:00
Dmitry Chagin	49fa1a745e	Sort include files in the alphabetical order.	2011-02-13 20:07:48 +00:00
Dmitry Chagin	4ca49f41ec	Remove comment about 'ftlk' LOR.	2011-02-13 18:46:34 +00:00
Dmitry Chagin	890c582fe5	Stop printing the LOR, as this is expected behavior.	2011-02-13 18:41:40 +00:00
Dmitry Chagin	7c3b05b99c	The bitset field of freshly created futex should be initialized explicitly. Otherwise, REQUEUE operations fail.	2011-02-13 17:56:22 +00:00
Dmitry Chagin	d14cc07d07	Rename used_requeue and use it as bitwise field to store more flags. Reimplement used_requeue logic with LINUX_XDEPR_REQUEUEOP flag.	2011-02-12 20:58:59 +00:00
Dmitry Chagin	cfa57401b0	Slightly rewrite linux_fork: 1) Remove bogus error checking. 2) A new process exits from the kernel through fork_trampoline(), so remove bogus check.	2011-02-12 20:16:25 +00:00
Dmitry Chagin	9588e04dde	Remove bogus include <machine/frame.h>	2011-02-12 19:14:57 +00:00
Dmitry Chagin	222198ab0b	Move linux_clone(), linux_fork(), linux_vfork() to a MI path.	2011-02-12 18:17:12 +00:00

1213 Commits