freebsd-nq

Author	SHA1	Message	Date
Edward Tomasz Napierala	5db72ef2e4	Fix linux_getppid() to debug the actual parent, even it was reparented by debugger. Reviewed by: dchagin@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9361	2017-01-31 15:22:51 +00:00
Dmitry Chagin	23e8912c60	Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag. In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 set this flag automatically if the binary requires executable stack. READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.	2016-07-10 08:15:50 +00:00
Gleb Smirnoff	34e05ebe72	Fix kernel stack disclosures in the Linux and 4.3BSD compat layers. Submitted by: CTurt Security: SA-16:20 Security: SA-16:21	2016-05-31 16:56:30 +00:00
Pedro F. Giffuni	1ce4275dd2	sys/compat/linux*: spelling fixes. Mostly on comments but there are some user-visible messages as well. MFC after: 2 weeks	2016-04-30 00:53:10 +00:00
Pedro F. Giffuni	ae26eab161	Fix indentation oops.	2016-04-03 14:40:54 +00:00
Dmitry Chagin	8bc21bafba	Move Linux specific times tests up to guarantee the values are defined. CID: 1305178 Submitted by: pfg@ MFC after: 1 week	2016-04-03 06:33:16 +00:00
Dmitry Chagin	4525bb829f	Rework r296543: 1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer(). Assert that kern_setitimer() returns 0. Remove bogus cast of secs. Fix style(9) issues. 2. Increment the return value if the remaining tv_usec value more than 500000 as a Linux does. Pointed out by: [1] Bruce Evans MFC after: 1 week	2016-03-20 11:40:52 +00:00
Dmitry Chagin	a87488d1e4	Better english. Submitted by: Kevin P. Neal MFC after: 1 week	2016-03-08 19:40:01 +00:00
Dmitry Chagin	649ca5e9dc	Put a commit message from r296502 about Linux alarm() system call behaviour to the source. Suggested by: emaste@ MFC after: 1 week	2016-03-08 19:20:57 +00:00
Dmitry Chagin	15c3b371e2	According to POSIX and Linux implementation the alarm() system call is always successfull. So, ignore any errors and return 0 as a Linux do. XXX. Unlike POSIX, Linux in case when the invalid seconds value specified always return 0, so in that case Linux does not return proper remining time. MFC after: 1 week	2016-03-08 15:12:49 +00:00
Gleb Smirnoff	c8358c6e0d	Call crextend() before copying old credentials to the new credentials and replace crcopysafe by crcopy as crcopysafe is is not intended to be safe in a threaded environment, it drops PROC_LOCK() in while() that can lead to unexpected results, such as overwrite kernel memory. In my POV crcopysafe() needs special attention. For now I do not see any problems with this function, but who knows. Submitted by: dchagin Found by: trinity Security: SA-16:04.linux	2016-01-14 10:16:25 +00:00
Konstantin Belousov	cd0a26c53f	Fix build for the KTR-enabled kernels. Sponsored by: The FreeBSD Foundation	2015-10-23 11:41:55 +00:00
Konstantin Belousov	b4490c6e93	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation	2015-07-18 09:02:50 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Dmitry Chagin	d9cbe8f0ef	Properly check tv_nsec value. The tv_nsec field can also be one of the special value UTIME_NOW or UTIME_OMIT.	2015-05-24 18:06:46 +00:00
Dmitry Chagin	19d8b461f4	Add utimensat() system call. The patch developed by Jilles Tjoelker and Andrew Wilcox and adopted for lemul branch by me.	2015-05-24 17:57:07 +00:00
Dmitry Chagin	4ab7403bbd	Rework signal code to allow using it by other modules, like linprocfs: 1. Linux sigset always 64 bit on all platforms. In order to move Linux sigset code to the linux_common module define it as 64 bit int. Move Linux sigset manipulation routines to the MI path. 2. Move Linux signal number definitions to the MI path. In general, they are the same on all platforms except for a few signals. 3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion tables to avoid conversion errors. 4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside of allowed on Linux signal numbers. PR: 197216	2015-05-24 17:47:20 +00:00
Dmitry Chagin	680982281b	Do not use struct l_timespec without conversion. While here move args->timeout handling before acquiring the futex key at FUTEX_WAIT path. Differential Revision: https://reviews.freebsd.org/D1520 Reviewed by: trasz	2015-05-24 17:29:18 +00:00
Dmitry Chagin	a6b40812ec	Implement ppoll() system call. Differential Revision: https://reviews.freebsd.org/D1105 Reviewed by: trasz	2015-05-24 16:59:25 +00:00
Dmitry Chagin	3e89b64168	Put the correct value for the abi_nfdbits parameter of kern_select() for all supported Linuxulators. Differential Revision: https://reviews.freebsd.org/D1093 Reviewed by: trasz	2015-05-24 16:47:13 +00:00
Dmitry Chagin	2245df381a	Convert Linux wait options to the FreeBSD. Check wait options as a Linux do. Linux always set WEXITED option not a WUNTRACED\|WNOHANG which is a strange bug. Differential Revision: https://reviews.freebsd.org/D1085 Reviewed by: trasz	2015-05-24 16:28:58 +00:00
Dmitry Chagin	7a7a6efc25	Set WIFCONTINUED to the wait status if needed. Differential Revision: https://reviews.freebsd.org/D1083 Reviewed by: trasz	2015-05-24 16:27:38 +00:00
Dmitry Chagin	e0d3ea8c65	Where possible we will use M_LINUX malloc(9) type. Move M_FUTEX defines to the linux_common.ko. Differential Revision: https://reviews.freebsd.org/D1077 Reviewed by: emaste	2015-05-24 16:14:41 +00:00
Dmitry Chagin	bc27367760	Refund the proc emuldata struct for future use. For now move flags from thread emuldata to proc emuldata as it was originally intended. As we can have both 64 & 32 bit Linuxulator running any eventhandler can be called twice for us. To prevent this move eventhandlers code from linux_emul.c to the linux_common.ko module. Differential Revision: https://reviews.freebsd.org/D1073	2015-05-24 15:54:58 +00:00
Dmitry Chagin	67d3974849	Introduce a new module linux_common.ko which is intended for the following primary purposes: 1. Remove the dependency of linsysfs and linprocfs modules from linux.ko, which will be architecture specific on amd64. 2. Incorporate into linux_common.ko general code for platforms on which we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit). 3. Move malloc(9) declaration to linux_common.ko, to enable getting memory usage statistics properly. Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko. Temporarily remove dtrace garbage from linux_mib.c and linux_util.c Differential Revision: https://reviews.freebsd.org/D1072 In collaboration with: Vassilis Laganakos. Reviewed by: trasz	2015-05-24 15:51:18 +00:00
Dmitry Chagin	4ca75bed31	Fix compilation with -DDEBUG option. Differential Revision: https://reviews.freebsd.org/D1070 Reviewed by: trasz	2015-05-24 15:47:15 +00:00
Dmitry Chagin	7f8f1d7f7a	Disable i386 call for x86-64 Linux. Differential Revision: https://reviews.freebsd.org/D1067 Reviewed by: trasz	2015-05-24 15:43:53 +00:00
Dmitry Chagin	0020bdf13a	Put linux_platform into the vdso to avoid copying it onto the stack at every exec. Differential Revision: https://reviews.freebsd.org/D1062 Reviewed by: trasz	2015-05-24 15:30:52 +00:00
Dmitry Chagin	ae50b4d7b5	Implement pselect6() system call. Differential Revision: https://reviews.freebsd.org/D1051 Reviewed by: trasz	2015-05-24 15:21:25 +00:00
Dmitry Chagin	c3978c7bb1	Implement prlimit64() system call. Differential Revision: https://reviews.freebsd.org/D1050 Reviewed by: emaste, trasz	2015-05-24 15:18:19 +00:00
Dmitry Chagin	44e93b234f	Sched_rr_get_interval returns EINVAL in case when the invalid pid specified. This silence the ltp tests. Differential Revision: https://reviews.freebsd.org/D1048 Reviewed by: trasz	2015-05-24 15:13:56 +00:00
Dmitry Chagin	e5fe4ccf59	Implement waitid() system call. Differential Revision: https://reviews.freebsd.org/D1046	2015-05-24 15:06:39 +00:00
Dmitry Chagin	001398c4c5	To reduce code duplication introduce linux_copyout_rusage() method. Use it in linux_wait4() system call and move linux_wait4() to the MI path. While here add a prototype for the static bsd_to_linux_rusage(). Differential Revision: https://reviews.freebsd.org/D2138 Reviewed by: trasz	2015-05-24 15:03:09 +00:00
Dmitry Chagin	a7ae3c557f	Add a function for converting wait options. Differential Revision: https://reviews.freebsd.org/D1045 Reviewed by: trasz	2015-05-24 15:00:27 +00:00
Dmitry Chagin	81338031c4	Switch linuxulator to use the native 1:1 threads. The reasons: 1. Get rid of the stubs/quirks with process dethreading, process reparent when the process group leader exits and close to this problems on wait(), waitpid(), etc. 2. Reuse our kernel code instead of writing excessive thread managment routines in Linuxulator. Implementation details: 1. The thread is created via kern_thr_new() in the clone() call with the CLONE_THREAD parameter. Thus, everything else is a process. 2. The test that the process has a threads is done via P_HADTHREADS bit p_flag of struct proc. 3. Per thread emulator state data structure is now located in the struct thread and freed in the thread_dtor() hook. Mandatory holdig of the p_mtx required when referencing emuldata from the other threads. 4. PID mangling has changed. Now Linux pid is the native tid and Linux tgid is the native pid, with the exception of the first thread in the process where tid and pid are one and the same. Ugliness: In case when the Linux thread is the initial thread in the thread group thread id is equal to the process id. Glibc depends on this magic (assert in pthread_getattr_np.c). So for system calls that take thread id as a parameter we should use the special method to reference struct thread. Differential Revision: https://reviews.freebsd.org/D1039	2015-05-24 14:53:16 +00:00
Dmitry Chagin	2003907d45	Implement a Linux version of sched_getparam() && sched_setparam(). Temporarily use the first thread in proc. Differential Revision: https://reviews.freebsd.org/D1036 Reviewed by: trasz	2015-05-24 14:45:57 +00:00
Dmitry Chagin	1aa90eca33	In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator). Linuxulator temporarily uses first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz	2015-05-24 14:39:26 +00:00
Mateusz Guzik	daf63fd2f9	cred: add proc_set_cred helper The goal here is to provide one place altering process credentials. This eases debugging and opens up posibilities to do additional work when such an action is performed.	2015-03-16 00:10:03 +00:00
Konstantin Belousov	5c7bebf961	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-26 14:10:00 +00:00
Konstantin Belousov	6e646651d3	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
Bryan Drewery	44f1c91610	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Konstantin Belousov	140dedb81c	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks	2012-11-02 13:56:36 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Konstantin Belousov	877d24ac8a	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks	2012-09-28 11:25:02 +00:00
Alexander Leidinger	19e252baeb	- >500 static DTrace probes for the linuxulator - DTrace scripts to check for errors, performance, ... they serve mostly as examples of what you can do with the static probe;s with moderate load the scripts may be overwhelmed, excessive lock-tracing may influence program behavior (see the last design decission) Design decissions: - use "linuxulator" as the provider for the native bitsize; add the bitsize for the non-native emulation (e.g. "linuxuator32" on amd64) - Add probes only for locks which are acquired in one function and released in another function. Locks which are aquired and released in the same function should be easy to pair in the code, inter-function locking is more easy to verify in DTrace. - Probes for locks should be fired after locking and before releasing to prevent races (to provide data/function stability in DTrace, see the man-page of "dtrace -v ..." and the corresponding DTrace docs).	2012-05-05 19:42:38 +00:00
Konstantin Belousov	3494f31ad2	Fix misuse of the kernel map in miscellaneous image activators. Vnode-backed mappings cannot be put into the kernel map, since it is a system map. Use exec_map for transient mappings, and remove the mappings with kmem_free_wakeup() to notify the waiters on available map space. Do not map the whole executable into KVA at all to copy it out into usermode. Directly use vn_rdwr() for the case of not page aligned binary. There is one place left where the potentially unbounded amount of data is mapped into exec_map, namely, in the COFF image activator enumeration of the needed shared libraries. Reviewed by: alc MFC after: 2 weeks	2012-02-17 23:47:16 +00:00
Ulrich Spörlein	9a14aa017b	Convert files to UTF-8	2012-01-15 13:23:18 +00:00

1 2 3 4 5 ...

300 Commits