freebsd-skq

Author	SHA1	Message	Date
dchagin	73fcf6f585	Add EPOLLERR flag handling to epoll. Tested by: abi at abinet dot ru	2015-05-24 17:42:45 +00:00
dchagin	65a010b8d9	As fo_fill_kinfo() does not check fo_fill_kinfo for NULL, add a fo_fill_kinfo op to eventfdops. Reported by: trinity	2015-05-24 17:40:14 +00:00
dchagin	a346bc7dc8	Add preliminary fallocate system call implementation to emulate posix_fallocate() function. Differential Revision: https://reviews.freebsd.org/D1523 Reviewed by: emaste	2015-05-24 17:33:21 +00:00
dchagin	34bea3bccb	Delete the duplicate of linux_to_native_clockid() function. Differential Revision: https://reviews.freebsd.org/D1521 Reviewed by: trasz	2015-05-24 17:30:31 +00:00
dchagin	0a71d3c7fa	Do not use struct l_timespec without conversion. While here move args->timeout handling before acquiring the futex key at FUTEX_WAIT path. Differential Revision: https://reviews.freebsd.org/D1520 Reviewed by: trasz	2015-05-24 17:29:18 +00:00
dchagin	a97ef012ca	Add prototypes for static futex functions. Differential Revision: https://reviews.freebsd.org/D1519 Reviewed by: trasz	2015-05-24 17:27:59 +00:00
dchagin	12e0f2e21b	As for now our tmpfs is no longer being considered "highly experimental" remove /dev/shm magic committed in r218497 and convert tmpfs type to an expected magic number. Differential Revision: https://reviews.freebsd.org/D1497 Reviewed by: emaste, trasz	2015-05-24 17:26:58 +00:00
dchagin	5a13ec4c64	Print out unsupported futex operation message only once for the process. Differential Revision: https://reviews.freebsd.org/D1498	2015-05-24 17:25:57 +00:00
dchagin	8cc36a7686	Add some clock mappings used in glibc 2.20. Differential Revision: https://reviews.freebsd.org/D1465 Reviewed by: trasz	2015-05-24 17:23:08 +00:00
dchagin	e05fc818bd	Improve ktr(9) records in thread management code. Differential Revision: https://reviews.freebsd.org/D1464 Reviewed by: trasz	2015-05-24 17:09:07 +00:00
dchagin	b71a54d0db	Use local struct proc * variable instead of dereferencing td->td_proc. Differential Revision: https://reviews.freebsd.org/D1463 Reviewed by: emaste	2015-05-24 17:08:25 +00:00
dchagin	ce8b8e15a8	Avoid unnecessary em zeroing in non-exec path as it is already zeroed by malloc with M_ZERO flag and move zeroing to the proper place in exec path. Differential Revision: https://reviews.freebsd.org/D1462 Reviewed by: trasz	2015-05-24 17:07:10 +00:00
dchagin	b63988daf7	Remove the unnecessary cast. Differential Revision: https://reviews.freebsd.org/D1461 Reviewed by: emaste	2015-05-24 17:05:59 +00:00
dchagin	b37f23e513	Implement ppoll() system call. Differential Revision: https://reviews.freebsd.org/D1105 Reviewed by: trasz	2015-05-24 16:59:25 +00:00
dchagin	0e434375e1	td_sigmask of a newly created thread copied from td. Remove excess initialization of td_sigmask. Differential Revision: https://reviews.freebsd.org/D1128 Reviewed by: emaste	2015-05-24 16:56:32 +00:00
dchagin	b8a4106305	Update Linux compat revision to 32. Differential Revision: https://reviews.freebsd.org/D1122 Reviewed by: emaste	2015-05-24 16:55:32 +00:00
dchagin	c88c933164	Fix linux_common module build with KTR option. Differential Revision: https://reviews.freebsd.org/D1096 Reviewed by: trasz	2015-05-24 16:52:45 +00:00
dchagin	d7e47c502a	Implement eventfd system call. Differential Revision: https://reviews.freebsd.org/D1094 In collaboration with: Jilles Tjoelker	2015-05-24 16:49:14 +00:00
dchagin	2336f8bb01	Put the correct value for the abi_nfdbits parameter of kern_select() for all supported Linuxulators. Differential Revision: https://reviews.freebsd.org/D1093 Reviewed by: trasz	2015-05-24 16:47:13 +00:00
dchagin	5f069937a3	Implement epoll family system calls. This is a tiny wrapper around kqueue() to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data, so we keep user data in the proc emuldata. Initial patch developed by rdivacky@ in 2007, then extended by Yuri Victorovich @ r255672 and finished by me in collaboration with mjg@ and jillies@. Differential Revision: https://reviews.freebsd.org/D1092	2015-05-24 16:41:39 +00:00
dchagin	afc381ebf7	Implement F_DUPFD_CLOEXEC fcntl flag. Differential Revision: https://reviews.freebsd.org/D1089 Reviewed by: trasz	2015-05-24 16:34:57 +00:00
dchagin	c7d9312243	Add several fcntl flags. Differential Revision: https://reviews.freebsd.org/D1088 Reviewed by: trasz	2015-05-24 16:32:52 +00:00
dchagin	db8a000521	To avoid code duplication move open/fcntl definitions to the MI header file. Differential Revision: https://reviews.freebsd.org/D1087 Reviewed by: trasz	2015-05-24 16:31:44 +00:00
dchagin	d1ecbe4998	Use the BSD_TO_LINUX_SIGNAL() wherever there is no need to check the ABI as it is known. Differential Revision: https://reviews.freebsd.org/D1086	2015-05-24 16:30:23 +00:00
dchagin	c04276b109	Convert Linux wait options to their FreeBSD equivalents. Check wait options as Linux does. Linux always sets the WEXITED option, not WUNTRACED|WNOHANG, which is a strange bug. Differential Revision: https://reviews.freebsd.org/D1085 Reviewed by: trasz	2015-05-24 16:28:58 +00:00
dchagin	7ee506a86b	Set WIFCONTINUED to the wait status if needed. Differential Revision: https://reviews.freebsd.org/D1083 Reviewed by: trasz	2015-05-24 16:27:38 +00:00
dchagin	c6387d07c9	Rewrite linux_recvfrom. To avoid double conversion of sockaddr use kern_recvit() directly. And check fromlen parameter before sockaddr copyin and conversion. Differential Revision: https://reviews.freebsd.org/D1082	2015-05-24 16:26:55 +00:00
dchagin	eb881eec7e	Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by glibc. At least since glibc version 2.16, using AT_RANDOM is mandatory. Differential Revision: https://reviews.freebsd.org/D1080	2015-05-24 16:24:24 +00:00
dchagin	bb042fb0da	Change linux faccessat syscall definition to match actual linux one. The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented within the glibc wrapper function for faccessat(). If either of these flags are specified, then the wrapper function employs fstatat() to determine access permissions. Differential Revision: https://reviews.freebsd.org/D1078 Reviewed by: trasz	2015-05-24 16:18:03 +00:00
dchagin	f5eca4c957	Where possible we will use M_LINUX malloc(9) type. Move M_FUTEX defines to the linux_common.ko. Differential Revision: https://reviews.freebsd.org/D1077 Reviewed by: emaste	2015-05-24 16:14:41 +00:00
dchagin	5b7bd42ffe	Move FEATURE macros for v4l and v4l2 to the common module. Differential Revision: https://reviews.freebsd.org/D1075 Reviewed by: emaste	2015-05-24 16:00:01 +00:00
dchagin	f18b3d51fa	Refund the proc emuldata struct for future use. For now move flags from thread emuldata to proc emuldata as it was originally intended. As we can have both 64 & 32 bit Linuxulator running any eventhandler can be called twice for us. To prevent this move eventhandlers code from linux_emul.c to the linux_common.ko module. Differential Revision: https://reviews.freebsd.org/D1073	2015-05-24 15:54:58 +00:00
dchagin	b08f3f43f9	Introduce a new module linux_common.ko which is intended for the following primary purposes: 1. Remove the dependency of linsysfs and linprocfs modules from linux.ko, which will be architecture specific on amd64. 2. Incorporate into linux_common.ko general code for platforms on which we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit). 3. Move malloc(9) declaration to linux_common.ko, to enable getting memory usage statistics properly. Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko. Temporarily remove dtrace garbage from linux_mib.c and linux_util.c Differential Revision: https://reviews.freebsd.org/D1072 In collaboration with: Vassilis Laganakos. Reviewed by: trasz	2015-05-24 15:51:18 +00:00
dchagin	9e320cb48d	Add newfstatat system call for 64-bit Linuxulator. Differential Revision: https://reviews.freebsd.org/D1071 Reviewed by: trasz	2015-05-24 15:48:34 +00:00
dchagin	c714299c49	Fix compilation with -DDEBUG option. Differential Revision: https://reviews.freebsd.org/D1070 Reviewed by: trasz	2015-05-24 15:47:15 +00:00
dchagin	4dc96a6552	Add 64 bit support to the vdso. Differential Revision: https://reviews.freebsd.org/D1069 Reviewed by: trasz	2015-05-24 15:45:36 +00:00
dchagin	dc4523e6a4	x86_64 Linux does not use multiplexing on ipc system calls. Move struct ipc_perm definition to the MD path as it differs for 64 and 32 bit platform. Differential Revision: https://reviews.freebsd.org/D1068 Reviewed by: trasz	2015-05-24 15:44:41 +00:00
dchagin	a92a30f54d	Disable i386 call for x86-64 Linux. Differential Revision: https://reviews.freebsd.org/D1067 Reviewed by: trasz	2015-05-24 15:43:53 +00:00
dchagin	49e67f3383	Print out proper procmap entry for 64 bit binaries. Differential Revision: https://reviews.freebsd.org/D1066 Reviewed by: trasz	2015-05-24 15:42:36 +00:00
dchagin	c0ca16d4f0	64-bit platforms, like x86_64, do not use multiplexing on socketcall system calls. Differential Revision: https://reviews.freebsd.org/D1065 Reviewed by: trasz	2015-05-24 15:41:27 +00:00
dchagin	8a707fd4ea	Get ready to commit x86_64 Linux emulation. All fields of type l_int in struct statfs are defined as l_long on i386 and amd64. Differential Revision: https://reviews.freebsd.org/D1064 Reviewed by: trasz	2015-05-24 15:39:08 +00:00
dchagin	4178f554e5	Put linux_platform into the vdso to avoid copying it onto the stack at every exec. Differential Revision: https://reviews.freebsd.org/D1062 Reviewed by: trasz	2015-05-24 15:30:52 +00:00
dchagin	cd614289ee	Implement vdso - virtual dynamic shared object. Through vdso Linux exposes functions from kernel with proper DWARF CFI information so that it becomes easier to unwind through them. Using vdso is mandatory for thread cancelation && cleanup on a modern glibc. Differential Revision: https://reviews.freebsd.org/D1060	2015-05-24 15:28:17 +00:00
dchagin	ed4f44fbe7	Implement pselect6() system call. Differential Revision: https://reviews.freebsd.org/D1051 Reviewed by: trasz	2015-05-24 15:21:25 +00:00
dchagin	98b4e8b812	Implement prlimit64() system call. Differential Revision: https://reviews.freebsd.org/D1050 Reviewed by: emaste, trasz	2015-05-24 15:18:19 +00:00
dchagin	1ec9e6445a	Implement dup3() system call. Differential Revision: https://reviews.freebsd.org/D1049 Reviewed by: emaste	2015-05-24 15:14:51 +00:00
dchagin	0922240f49	Sched_rr_get_interval returns EINVAL when an invalid pid is specified. This silences the ltp tests. Differential Revision: https://reviews.freebsd.org/D1048 Reviewed by: trasz	2015-05-24 15:13:56 +00:00
dchagin	e29d24e3d8	Implement rt_sigqueueinfo() system call. Differential Revision: https://reviews.freebsd.org/D1047 Reviewed by: trasz	2015-05-24 15:11:32 +00:00
dchagin	912ea57deb	Implement waitid() system call. Differential Revision: https://reviews.freebsd.org/D1046	2015-05-24 15:06:39 +00:00
dchagin	e28b659be1	To reduce code duplication introduce linux_copyout_rusage() method. Use it in linux_wait4() system call and move linux_wait4() to the MI path. While here add a prototype for the static bsd_to_linux_rusage(). Differential Revision: https://reviews.freebsd.org/D2138 Reviewed by: trasz	2015-05-24 15:03:09 +00:00
dchagin	3445f2f9f2	Add a function for converting wait options. Differential Revision: https://reviews.freebsd.org/D1045 Reviewed by: trasz	2015-05-24 15:00:27 +00:00
dchagin	88e62ac1d2	Add a siginfo_t conversion function. Differential Revision: https://reviews.freebsd.org/D1044 Reviewed by: emaste, trasz	2015-05-24 14:58:30 +00:00
dchagin	1b7da2f539	Remove a now unused define. Differential Revision: https://reviews.freebsd.org/D1043 Reviewed by: trasz	2015-05-24 14:57:39 +00:00
dchagin	f446253cf0	Introduce LINUX_VERSION_STR, LINUX_VERSION_CODE macro for use instead of hardcoded pr_osrelease, pr_osrel values. This will be used later in the VDSO. Differential Revision: https://reviews.freebsd.org/D1042 Reviewed by: trasz	2015-05-24 14:56:21 +00:00
dchagin	e2a8386972	pthread_join() caller does futex_wait on child_clear_tid. As the results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined, wake up the one thread. Differential Revision: https://reviews.freebsd.org/D1040	2015-05-24 14:54:12 +00:00
dchagin	50155e11cc	Switch linuxulator to use the native 1:1 threads. The reasons: 1. Get rid of the stubs/quirks with process dethreading, process reparent when the process group leader exits and close to this problems on wait(), waitpid(), etc. 2. Reuse our kernel code instead of writing excessive thread management routines in Linuxulator. Implementation details: 1. The thread is created via kern_thr_new() in the clone() call with the CLONE_THREAD parameter. Thus, everything else is a process. 2. The test that the process has threads is done via P_HADTHREADS bit p_flag of struct proc. 3. Per thread emulator state data structure is now located in the struct thread and freed in the thread_dtor() hook. Mandatory holding of the p_mtx required when referencing emuldata from the other threads. 4. PID mangling has changed. Now Linux pid is the native tid and Linux tgid is the native pid, with the exception of the first thread in the process where tid and pid are one and the same. Ugliness: In case when the Linux thread is the initial thread in the thread group thread id is equal to the process id. Glibc depends on this magic (assert in pthread_getattr_np.c). So for system calls that take thread id as a parameter we should use the special method to reference struct thread. Differential Revision: https://reviews.freebsd.org/D1039	2015-05-24 14:53:16 +00:00
dchagin	ca0fda4077	In preparation for switching linuxulator to the use the native 1:1 threads add a hook for cleaning thread resources before the thread die. Differential Revision: https://reviews.freebsd.org/D1038	2015-05-24 14:51:29 +00:00
dchagin	77ba567613	Implement a Linux version of sched_getparam() && sched_setparam(). Temporarily use the first thread in proc. Differential Revision: https://reviews.freebsd.org/D1036 Reviewed by: trasz	2015-05-24 14:45:57 +00:00
dchagin	ca1941958a	In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator). Linuxulator temporarily uses first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz	2015-05-24 14:39:26 +00:00
dchagin	cd30334c97	In preparation for switching linuxulator to the use the native 1:1 threads introduce linux_exit() stub instead of sys_exit() call (which terminates process). In the new linuxulator exit() system call terminates the calling thread (not a whole process). Differential Revision: https://reviews.freebsd.org/D1027 Reviewed by: trasz	2015-05-24 14:33:19 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
kib	f371322983	On exec, single-threading must be enforced before arguments space is allocated from exec_map. If many threads try to perform execve(2) in parallel, the exec map is exhausted and some threads sleep uninterruptible waiting for the map space. Then, the thread which won the race for the space allocation, cannot single-thread the process, causing deadlock. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-10 09:00:40 +00:00
peter	ec163c3dc9	Fix an error in r281551, part of the getfsstat() / kern_getfsstat() rework. The number of entries was supposed to be returned to the user, not used as a scratch variable. This broke RELENG_4 jails starting up on current systems.	2015-05-05 05:14:12 +00:00
trasz	9b3d9f0645	Simplify linux_getcwd(), removing code that was no longer used. Differential Revision: https://reviews.freebsd.org/D2326 Reviewed by: dchagin@, kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-23 08:41:50 +00:00
trasz	598f70c8b4	Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024. Differential Revision: https://reviews.freebsd.org/D2335 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-21 13:55:24 +00:00
trasz	de45bc6d3c	Add back fdrop() missed in r281726. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-19 07:35:18 +00:00
trasz	babf640f62	Optimize the O_NOCTTY handling hack in linux_common_open(). Differential Revision: https://reviews.freebsd.org/D2323 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-19 07:12:16 +00:00
trasz	befe1c29cd	Remove unused code from linux_mount(), and make it possible to mount any kind of filesystem instead of hardcoded three. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-18 09:49:09 +00:00
trasz	009c656eaf	Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This adds missing jail and MAC checks. Differential Revision: https://reviews.freebsd.org/D2193 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-15 09:13:11 +00:00
mjg	22da590f11	fd: remove filedesc argument from fdclose Just accept a thread instead. This makes it consistent with fdalloc. No functional changes.	2015-04-11 15:40:28 +00:00
jhb	148355cbb6	Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h> and export them to userland. - Define __HAVE_REG32 on platforms that define a reg32 structure and check for this in <sys/procfs.h> to control when to export prstatus32, etc. - Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures. libbfd looks for these types, and having them fixes 'gcore' in gdb of a 32-bit process on a 64-bit platform. - Use the structure definitions from <sys/procfs.h> in gcore's elf32 core dump code instead of duplicating the definitions. Differential Revision: https://reviews.freebsd.org/D2142 Reviewed by: kib, nathanw (powerpc bits) MFC after: 1 week	2015-04-08 16:30:45 +00:00
trasz	48a3f6fc28	Remove unused code. Differential Revision: https://reviews.freebsd.org/D2195 Reviewed by: kib@, imp@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-02 10:19:24 +00:00
mjg	054f9cab59	cred: add proc_set_cred helper The goal here is to provide one place altering process credentials. This eases debugging and opens up possibilities to do additional work when such an action is performed.	2015-03-16 00:10:03 +00:00
jilles	6ad32c1c79	Run make sysent.	2015-01-23 21:08:24 +00:00
jilles	67db24d0f2	Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes	2015-01-23 21:07:08 +00:00
kib	aa0ac99391	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:13:11 +00:00
kib	631f2ce1dd	fcntl F_O{GET,SET}LK take pointer as the arg, handle them properly for compat32. Reported and tested by: Alex Tutubalin <lexa@lexa.ru> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-15 10:43:58 +00:00
dchagin	6996603344	Regen for r276654 (__getcwd()).	2015-01-04 10:40:23 +00:00
dchagin	e777b160fa	Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week	2015-01-04 10:34:02 +00:00
dchagin	09e6c22e46	Cast *path to silence clang -Wpointer-sign warning. MFC after: 1 week	2015-01-02 19:29:32 +00:00
dchagin	49eaa137e5	Remove Giant from linux_getcwd() due to VFS is MPSAFE now. Discussed with: kib MFC after: 1 week	2015-01-02 18:36:08 +00:00
dchagin	236e47c874	Fix Clang -Wpointer-sign warnings. MFC after: 1 week	2015-01-01 20:53:38 +00:00
dchagin	2e64d69349	Fix Clang warning: passing 'unsigned int *' to parameter of type 'int *' converts between pointers to integer types with different sign. MFC after: 1 week	2015-01-01 19:57:24 +00:00
gleb	5c99f46b3b	Adjust printf format specifiers for dev_t and ino_t in kernel. ino_t and dev_t are about to become uint64_t. Reviewed by: kib, mckusick	2014-12-17 07:27:19 +00:00
kib	c014fd46ec	Add a facility for non-init process to declare itself the reaper of the orphaned descendants. Base of the API is modelled after the same feature from the DragonFlyBSD. Requested by: bapt Reviewed by: jilles (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-15 12:01:42 +00:00
kib	11cee2ecf7	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-26 14:10:00 +00:00
jhb	1671ac9155	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks	2014-11-21 20:53:17 +00:00
kib	b4ef709604	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
dchagin	c0a51053a4	Regen for r274462.	2014-11-13 05:28:06 +00:00
dchagin	162012051b	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
glebius	b8af75c693	Fix build.	2014-11-11 22:08:18 +00:00
glebius	53273c84d0	Remove SF_KQUEUE code. This code was developed at Netflix, but was not ever used. It didn't go into stable/10, neither was documented. It might be useful, but we collectively decided to remove it, rather than leave it abandoned and unmaintained. It is removed in one single commit, so restoring it should be easy, if anyone wants to reopen this idea. Sponsored by: Netflix	2014-11-11 20:32:46 +00:00
imp	162751703e	These don't belong in the modules directory.	2014-11-06 16:52:51 +00:00
kib	ad7bf17db7	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
mjg	a9faac8f4b	Avoid dynamic syscall overhead for statically compiled modules. The kernel tracks syscall users so that modules can safely unregister them. But if the module is not unloadable or was compiled into the kernel, there is no need to do this. Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC during kernel build and 0 otherwise. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-26 19:42:44 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
adrian	9b44fe556b	Update the ULE scheduler + thread and kinfo structs to use int for cpuid rather than u_char. To try and play nice with the ABI, the u_char CPU ID values are clamped at 254. The new fields now contain the full CPU ID, or -1 for no cpu. Differential Revision: D955 Reviewed by: jhb, kib Sponsored by: Norse Corp, Inc.	2014-10-18 19:36:11 +00:00
marcel	005c9e3ebe	Regenerate after r272823: Move the SCTP syscalls to netinet with the rest of the SCTP code. Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:19:35 +00:00
marcel	42d9d5479e	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout) Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:16:52 +00:00
kib	d972eee1e7	Fix fcntl(2) compat32 after r270691. The copyin and copyout of the struct flock are done in the sys_fcntl(), which means that compat32 used direct access to userland pointers. Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which performs necessary userland memory accesses, and use it from both native and compat32 fcntl syscalls. Reported by: jhibbits Sponsored by: The FreeBSD Foundation MFC after: 3 days	2014-09-25 21:07:19 +00:00
mav	1d2330e15a	Remake Linux' SOUND_MIXER_INFO IOCTL as a wrapper around new FreeBSD's one. Submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> MFC after: 3 days	2014-09-24 08:18:11 +00:00
sbruno	c738e0d253	Bump minimum linux compat version to support Centos6 ports updates for linux. Update linux compat minimum revision to match linux-c6 now in ports. This is a candidate for 10.1 R as it matches the current state of supported linux compat packages in the ports tree. PR: 187786 Reviewed by: xmj MFC after: 2 days Relnotes: yes	2014-09-22 17:26:07 +00:00
glebius	94b048a2d0	Fix build on 32-bit machines. Pointy hat to: glebius	2014-09-18 20:29:17 +00:00
glebius	42586cabb8	- Use if_get_counter() to fetch ifnet statistics. - Report IFCOUNTER_OQDROPS to linprocfs. Wasn't there before. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-18 16:44:28 +00:00
bz	122003e2ff	Implement most of timer_{create,settime,gettime,getoverrun,delete} for amd64/linux32. Fix the entirely bogus (untested) version from r161310 for i386/linux using the same shared code in compat/linux. It is unclear to me if we could support more clock mappings but the current set allows me to successfully run commercial 32bit linux software under linuxolator on amd64. Reviewed by: jhb Differential Revision: D784 MFC after: 3 days Sponsored by: DARPA, AFRL	2014-09-18 08:36:45 +00:00
mjg	4cf719a9ee	Add missing proctree locking to fill_kinfo_proc consumers. This fixes r270444. Pointy hat: mjg Reported by: many MFC after: 1 week	2014-08-30 03:10:55 +00:00
mjg	ec92f2e61c	Return real parent pid in kinfo (used by e.g. ps) Add a separate field which exports tracer pid and add a new keyword ("tracer") for ps to display it. This is a follow up to r270444. Reviewed by: kib MFC after: 1 week Relnotes: yes	2014-08-28 08:41:11 +00:00
kib	1e11271b7e	Regen.	2014-08-27 01:02:19 +00:00
kib	75f74b437b	Fix handling of the third argument for fcntl(2). The native syscall uses long for arg, which needs translation. Discussed with and tested by: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-27 01:02:02 +00:00
glebius	0236597739	All mbuf external free functions never fail, so let them be void. Sponsored by: Nginx, Inc.	2014-07-11 13:58:48 +00:00
marcel	9f28abd980	Remove ia64. This includes: o All directories named ia64 o All files named ia64 o All ia64-specific code guarded by __ia64__ o All ia64-specific makefile logic o Mention of ia64 in comments and documentation This excludes: o Everything under contrib/ o Everything under crypto/ o sys/xen/interface o sys/sys/elf_common.h Discussed at: BSDcan	2014-07-07 00:27:09 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating the children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
mav	797781309b	- Add support for SG_GET_SG_TABLESIZE IOCTL to report that we don't support scatter/gather lists. - Return error for still unsupported SG 3.x API read/write calls. MFC after: 1 month	2014-06-04 12:05:47 +00:00
mav	cb97f6171b	Overhaul CAM SG driver IOCTL interfaces. Make it really work for native FreeBSD programs. Before this it was broken for years due to different number of pointer dereferences in Linux and FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs. This change breaks the driver FreeBSD IOCTL ABI, making it more strict, but since it was not working any way -- who bother. Add shims for 32-bit programs on 64-bit host, translating the argument of the SG_IO IOCTL for both FreeBSD and Linux ABIs. With this change I was able to run 32-bit Linux sg3_utils tools and simple 32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems. MFC after: 1 month	2014-06-02 19:53:53 +00:00
dchagin	f38753151f	Glibc was switched to the FUTEX_WAIT_BITSET op and CLOCK_REALTIME flag has been added instead of FUTEX_WAIT to replace the FUTEX_WAIT logic which needs to do gettimeofday() calls before the futex syscall to convert the absolute timeout to a relative timeout. Before this the CLOCK_MONOTONIC used by the FUTEX_WAIT_BITSET op. When the FUTEX_CLOCK_REALTIME is specified the timeout is an absolute time, not a relative time. Rework futex_wait to handle this. On the side fix the futex leak in error case and remove useless parentheses. Properly calculate the timeout for the CLOCK_MONOTONIC case. MFC after: 3 days	2014-05-31 14:58:53 +00:00
dchagin	1e378c6cc6	In r218101 I have not changed properly the futex syscall definition. Some Linux futex ops atomically verifies that the futex address uaddr (uval) contains the value val. Comparing signed uval and unsigned val may lead to an unexpected result, mostly to a deadlock. So copyin uaddr to an unsigned int to compare the parameters correctly. While here change ktr records to print parameters in more readable format. Tested by eadler@ MFC after: 3 days	2014-05-28 05:57:35 +00:00
marcel	60f16a83d9	In freebsd32_sendmsg(), replace the call to sockargs() followed by a call to freebsd32_convert_msg_in() with freebsd32_copyin_control() to readin and convert in a single step. This makes it simpler to put all the control messages in a single mbuf or mbuf cluster as per the limitations imposed upon us by ip6_setpktopts(). The logic is as follows: 1. Go over the array of control messages to determine overall size and include extra padding for proper alignment as we go. 2. Get a mbuf or mbuf cluster as needed or fail if the overall (adjusted) size is larger than a cluster. 3. Go over the array of control messages again, but now copy them into kernel space and into aligned offsets. 4. Update the length of the control message to take padding between the header and the data into account (but not for padding added between one control message and the next). Obtained from: Juniper Networks, Inc. MFC after: 1 week	2014-04-05 18:56:01 +00:00
imp	eebc91c3f0	Remove instances of variables that were set, but never used. gcc 4.9 warns about these by default.	2014-03-30 23:43:36 +00:00
bdrewery	6fcf6199a4	Rename global cnt to vm_cnt to avoid shadowing. To reduce the diff struct pcpu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division	2014-03-22 10:26:09 +00:00
kib	b236080eb1	Make the array pointed to by AT_PAGESIZES auxv properly aligned. Also, remove the expression which calculated the location of the strings for a new image and grown over the time to be non-comprehensible. Instead, calculate the offsets by steps, which also makes fixing the alignments much cleaner. Reported and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-03-19 12:35:04 +00:00
attilio	f931c33558	Regen per r263318. Sponsored by: EMC / Isilon storage division	2014-03-18 21:34:11 +00:00
attilio	25d02685fb	Remove dead code from umtx support: - Retire long time unused (basically always unused) sys__umtx_lock() and sys__umtx_unlock() syscalls - struct umtx and their supporting definitions - UMUTEX_ERROR_CHECK flag - Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall __FreeBSD_version is not bumped yet because it is expected that further breakages to the umtx interface will follow up in the next days. However there will be a final bump when necessary. Sponsored by: EMC / Isilon storage division Reviewed by: jhb	2014-03-18 21:32:03 +00:00
emaste	dfd2dcdc01	Update NetBSD Foundation copyrights to 2-clause BSD The NetBSD Foundation states "Third parties are encouraged to change the license on any files which have a 4-clause license contributed to the NetBSD Foundation to a 2-clause license." This change removes clauses 3 and 4 from copyright / license blocks that list The NetBSD Foundation as the only copyright holder. Sponsored by: The FreeBSD Foundation	2014-03-18 01:40:25 +00:00
rwatson	33fdc14c0c	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
jmg	b66f059b49	change td_retval into a union w/ off_t, with defines to mask the change... This eliminates a cast, and also forces td_retval (often 2 32-bit registers) to be aligned so that off_t's can be stored there on arches with strict alignment requirements like armeb (AVILA)... On i386, this doesn't change alignment, and on amd64 it doesn't either, as register_t is already 64bits... This will also prevent future breakage due to people adding additional fields to the struct... This gets AVILA booting a bit farther... Reviewed by: bde	2014-03-16 00:53:40 +00:00
glebius	b38edcd355	Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed. o Remove the if_baudrate_pf crutch. o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code. o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes. o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters. __FreeBSD_version bumped. Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-13 03:42:24 +00:00
eadler	ea79ffcf4b	linprocfs: add support for /sys/kernel/random/uuid PR: kern/186187 Submitted by: Fernando <fernando.apesteguia@gmail.com> MFC After: 2 weeks	2014-02-27 00:43:10 +00:00
kib	f0cb8e7d88	The posix_madvise(3) and posix_fadvise(2) should return error on failure, same as posix_fallocate(2). Noted by: Bob Bishop <rb@gid.co.uk> Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-30 18:04:39 +00:00
kib	05b9ae7031	The posix_fallocate(2) syscall should return error number on error, without modifying errno. Reported and tested by: Gennady Proskurin <gpr@mail.ru> Reviewed by: mdf PR: standards/186028 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-01-23 17:24:26 +00:00
adrian	c46f73c7ae	Implement a kqueue notification path for sendfile. This fires off a kqueue note (of type sendfile) to the configured kqfd when the sendfile transaction has completed and the relevant memory backing the transaction is no longer in use by this transaction. This is analogous to SF_SYNC waiting for the mbufs to complete - except now you don't have to wait. Both SF_SYNC and SF_KQUEUE should work together, even if it doesn't necessarily make any practical sense. This is designed for use by applications which use backing cache/store files (eg Varnish) or POSIX shared memory (not sure anything is using it yet!) to know when a region of memory is free for re-use. Note it doesn't mark the region as free overall - only free from this transaction. The application developer still needs to track which ranges are in the process of being recycled and wait until all pending transactions are completed. TODO: * documentation, as always Sponsored by: Netflix, Inc.	2014-01-17 05:26:55 +00:00
adrian	19f7055283	Refactor out the common sendfile code from the do_sendfile() and the compat32 sendfile syscall. Sponsored by: Netflix, Inc.	2014-01-09 00:11:14 +00:00
adrian	86274dd213	Migrate the sendfile_sync structure into a public(ish) API in preparation for extending and reusing it. The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_*() methods * teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) is now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc.	2013-12-01 03:53:21 +00:00
peter	ac40be45fb	jail_v0.ip_number was always in host byte order. This was handled in one of the many layers of indirection and shims through stable/7 in jail_handle_ips(). When it was cleaned up and unified through kern_jail() for 8.x, the byte order swap was lost. This only matters for ancient binaries that call jail(2) themselves internally.	2013-11-28 19:40:33 +00:00
kib	0123c14853	Add a kinfo sysctl to retrieve signal trampoline location for the given process. Note that the correctness of the trampoline length returned for ABIs which do not use shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on an as-needed basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-26 19:47:09 +00:00
avg	71889a5eff	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
adrian	f447b2ef43	Fix the compat32 sendfile() to be in line with my recent changes. Reminded by: kib	2013-11-26 08:32:37 +00:00
attilio	7ee4e910ce	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
glebius	af0db43f1c	Fix build. Pointy hat to: glebius	2013-11-05 19:17:19 +00:00
glebius	cb6df3f35c	Axe IFF_SMART. Fortunately this layering violating flag was never used, it was just declared.	2013-11-05 12:52:56 +00:00
glebius	3b6f8b896c	Drop support for historic ioctls and also undefine them, so that code that checks their presence via ifdef, won't use them. Bump __FreeBSD_version as safety measure.	2013-11-05 10:29:47 +00:00
glebius	9e01f79e97	- Provide necessary includes. - Remove unnecessary includes. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-29 11:17:49 +00:00
glebius	f469ae1d45	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
glebius	2c1ec831c9	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
glebius	ff6e113f1b	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
kib	677c1f8ce9	Add padding to match the compat32 struct stat32 definition to the real struct stat on 32bit architectures. Debugged and tested by: bsam Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-04 22:05:23 +00:00
markj	8ecd1f0d70	Fix some typos that were causing probe argument types to show up as unknown. Reviewed by: rwatson (mac provider) Approved by: re (glebius) MFC after: 1 week	2013-10-01 15:40:27 +00:00
markj	d3af58e0a0	Regenerate syscall argument strings after r255777. Approved by: re (gjb) MFC after: 1 week	2013-09-21 23:06:36 +00:00
jhb	c6151e30b1	Regen. Approved by: re (delphij)	2013-09-19 18:56:00 +00:00
jhb	d3ef75b6c7	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month	2013-09-19 18:53:42 +00:00
rdivacky	fae003a069	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)	2013-09-18 18:48:33 +00:00
rdivacky	d57db3eead	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>	2013-09-18 17:56:04 +00:00
jilles	d252cd0ae3	Regenerate for freebsd32_cap_enter(). Approved by: re (hrs)	2013-09-17 20:49:05 +00:00
jilles	5faad32e2c	Disallow cap_enter() in freebsd32 compatibility mode. The freebsd32 compatibility mode (for running 32-bit binaries on 64-bit kernels) does not currently allow any system calls in capability mode, but still permits cap_enter(). As a result, 32-bit binaries on 64-bit kernels that use capability mode do not work (they crash after being disallowed to call sys_exit()). Affected binaries include dhclient and uniq. The latter's crashes cause obscure build failures. This commit makes freebsd32 cap_enter() fail with [ENOSYS], as if capability mode was not compiled in. Applications deal with this by doing their work without capability mode. This commit does not fix the uncommon situation where a 64-bit process enters capability mode and then executes a 32-bit binary using fexecve(). This commit should be reverted when allowing the necessary freebsd32 system calls in capability mode. Reviewed by: pjd Approved by: re (hrs)	2013-09-17 20:48:19 +00:00
jhb	04bb6e10cd	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
pjd	d1a65cb7ef	Regenerate after r255219. Sponsored by: The FreeBSD Foundation	2013-09-05 00:11:59 +00:00
pjd	029a6f5d92	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
will	a52b9ca1d3	Add the ability to display the default FIB number for a process to the ps(1) utility, e.g. "ps -O fib". bin/ps/keyword.c: Add the "fib" keyword and default its column name to "FIB". bin/ps/ps.1: Add "fib" as a supported keyword. sys/compat/freebsd32/freebsd32.h: sys/kern/kern_proc.c: sys/sys/user.h: Add the default fib number for a process (p->p_fibnum) to the user land accessible process data of struct kinfo_proc. Submitted by: Oliver Fromme <olli@fromme.com>, gibbs	2013-08-26 23:48:21 +00:00
andre	6c0efad132	Give (*ext_free) an int return value allowing for very sophisticated external mbuf buffer management capabilities in the future. For now only EXT_FREE_OK is defined with current legacy behavior. Sponsored by: The FreeBSD Foundation	2013-08-25 10:57:09 +00:00
pjd	50f2a13249	Regenerate after r254491.	2013-08-18 13:38:39 +00:00
pjd	a1397643f3	The cap_rights_limit(2) system call needs a wrapper for 32bit binaries running under 64bit kernels as the 'rights' argument has to be split into two registers or half of the rights will disappear. Reported by: jilles Sponsored by: The FreeBSD Foundation	2013-08-18 13:37:54 +00:00
pjd	999f81be5b	Move the PAIR32TO64() macro and the RETVAL_HI/RETVAL_LO defines to a header file for use by other .c files. Sponsored by: The FreeBSD Foundation	2013-08-18 13:34:11 +00:00
pjd	6e04ef92e4	Regenerate after r254481.	2013-08-18 10:31:30 +00:00
pjd	3014e000ae	Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2) system calls as unsigned longs have different size on i386 and amd64. Reported by: jilles Sponsored by: The FreeBSD Foundation	2013-08-18 10:30:41 +00:00
markj	8e4bf18907	Remove a couple of unused macros. MFC after: 3 days	2013-08-17 21:53:37 +00:00
pjd	8919b4e852	Regenerate after r254447. Sponsored by: The FreeBSD Foundation	2013-08-17 14:18:41 +00:00
pjd	b3f1c95907	Make pdfork(2), pdkill(2) and pdgetpid(2) syscalls available for 32bit binaries running under 64bit kernel. Sponsored by: The FreeBSD Foundation	2013-08-17 14:17:13 +00:00
glebius	722a1a5e5d	Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix	2013-08-15 07:54:31 +00:00
jeff	de4ecca213	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
kib	445bcc30a7	Regenerate.	2013-07-21 19:44:53 +00:00
kib	a7dacef5ab	Implement compat32 wrappers for the ktimer_* syscalls. Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:43:52 +00:00
kib	e9d8b81db7	Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32 argument. Reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:40:30 +00:00
kib	5dee91b64a	The freebsd32_lio_listio() compat syscall takes the struct sigevent32. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:36:53 +00:00
kib	97d40396c6	Move the convert_sigevent32() utility function into freebsd32_misc.c for consumption outside the vfs_aio.c. For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods, also copy in the sigev_value, since librt event pumping loop compares note generation number with the value passed through sigev_value. Tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:33:48 +00:00
kib	c43c8949f9	Cosmetic change, use the same union name on the left and right sides of the conversion. Tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:17:46 +00:00
kib	2287ae0451	Regenerate	2013-07-20 13:40:03 +00:00
kib	82f12b6237	id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2). Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180652 Sponsored by: The FreeBSD Foundation	2013-07-20 13:39:41 +00:00
hselasky	6308c49781	Add some missing LIBUSB IOCTL conversion codes.	2013-07-14 10:13:01 +00:00
netchild	32d31c91c0	- Move videodev headers from compat/linux to contrib/v4l (cp from vendor and apply diff to compat/linux versions). - The cp implies an update of videodev2.h to the linux kernel 2.6.34.14 one. The update makes video in skype v4 work on FreeBSD. Tested by: Artyom Mirgorodskiy <artyom.mirgorodsky@gmail.com> (update of header only)	2013-07-06 19:59:06 +00:00
glebius	85cf0e083f	aio_mlock() added: - Regen for r251526. - Bump __FreeBSD_version.	2013-06-08 13:30:13 +00:00
glebius	9a02f3097d	Add new system call - aio_mlock(). The name speaks for itself. It allows to perform the mlock(2) operation, which can consume a lot of time, under control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
alc	b4fae70474	Relax the vm object locking. Use a read lock. Sponsored by: EMC / Isilon Storage Division	2013-06-05 17:00:10 +00:00
obrien	0c92a49de4	Add a "kern.features" MIB for 32bit support under a 64bit kernel.	2013-05-31 21:43:17 +00:00
kib	19425ad923	Regenerate.	2013-05-21 11:41:08 +00:00
kib	ad28c68314	Fix the wait6(2) on 32bit architectures and for the compat32, by using the right type for the argument in syscalls.master. Also fix the posix_fallocate(2) and posix_fadvise(2) compat32 syscalls on the architectures which require padding of the 64bit argument. Noted and reviewed by: jhb Pointy hat to: kib MFC after: 1 week	2013-05-21 11:40:16 +00:00
jilles	49a5937b77	Regenerate files for pipe2().	2013-05-01 22:45:04 +00:00
jilles	16772c421d	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib	2013-05-01 22:42:42 +00:00
jilles	66a7a3379b	Regenerate files for accept4().	2013-05-01 20:12:58 +00:00
jilles	299afd25fd	Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.	2013-05-01 20:10:21 +00:00
mdf	a3d624db5a	Regen. MFC after: 1 week	2013-04-02 05:30:52 +00:00
mdf	da578c6492	Fix return type of extattr_set_* and fix rmextattr(8) utility. extattr_set_{fd,file,link} is logically a write(2)-like operation and should return ssize_t, just like extattr_get_*. Also, the user-space utility was using an int for the return value of extattr_get_* and extattr_list_*, both of which return an ssize_t. MFC after: 1 week	2013-04-02 05:30:41 +00:00
jilles	9d8a3c5c3b	Rename do_pipe() to kern_pipe2() and declare it properly.	2013-03-31 17:42:54 +00:00
pjd	f44b21d5e5	Regenerate after r248599. Sponsored by: The FreeBSD Foundation	2013-03-21 23:02:19 +00:00
pjd	635dbe90f2	Implement chflagsat(2) system call, similar to fchmodat(2), but operates on file flags. Reviewed by: kib, jilles Sponsored by: The FreeBSD Foundation	2013-03-21 22:59:01 +00:00
pjd	5fc1bac315	Regenerate after r248597. Sponsored by: The FreeBSD Foundation	2013-03-21 22:47:03 +00:00
pjd	2a3cf7f364	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation	2013-03-21 22:44:33 +00:00
glebius	b37af62b9e	Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.	2013-03-15 12:55:30 +00:00
attilio	bf1dc90446	MFC	2013-03-08 00:03:07 +00:00
eadler	a0bd41720a	Remove check for NULL prior to free(9) and m_freem(9). Approved by: cperciva (mentor)	2013-03-04 02:21:34 +00:00
pjd	369ed4d4ad	Regen after r247667.	2013-03-02 21:12:54 +00:00
pjd	702516e70b	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen); which allow binding and connecting, respectively, to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that have to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
attilio	e98f58faf6	MFC	2013-03-02 14:48:41 +00:00
pjd	48e0f13795	Regen after r247602.	2013-03-02 00:55:09 +00:00
pjd	f07ebb8888	Merge Capsicum overhaul: - Capability is no longer a separate descriptor type. Now every descriptor has a set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. 
- Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNOD to CAP_MKNODAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). 
Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
delphij	b1482c7ae7	Fix wrong assignment. Submitted by: Sascha Wildner <saw online de> Obtained from: DragonFly rev 9568dd07a22a136e380e6c19a8ea188eb92976d5 MFC after: 2 weeks	2013-03-01 23:21:18 +00:00
attilio	15bf891afe	Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions. Sponsored by: EMC / Isilon storage division	2013-02-20 12:03:20 +00:00
jhb	2617d9f095	Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h by moving bits that are MI out into headers in compat/linux. Reviewed by: Chagin Dmitry dmitry | gmail MFC after: 2 weeks	2013-01-29 18:41:30 +00:00
dchagin	fa34eceef7	Arithmetic on pointers takes into account the size of the type. Properly cast the pointer to avoid incorrect pointer scaling. MFC after: 1 Week	2013-01-25 14:40:54 +00:00
jhb	af17a55dfd	Don't assume that all Linux TCP-level socket options are identical to FreeBSD TCP-level socket options (only the first two are). Instead, use a mapping function and fail unsupported options as we do for other socket option levels. MFC after: 2 weeks	2013-01-23 21:44:48 +00:00
glebius	8e20fa5ae9	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
cperciva	748c98fc62	MFS security patches which seem to have accidentally not reached HEAD: Fix insufficient message length validation for EAP-TLS messages. Fix Linux compatibility layer input validation error. Security: FreeBSD-SA-12:07.hostapd Security: FreeBSD-SA-12:08.linux Security: CVE-2012-4445, CVE-2012-4576 With hat: so@	2012-11-23 01:48:31 +00:00
kib	de90907af2	Style fixes for r242958. Reported and reviewed by: bde MFC after: 28 days	2012-11-16 06:22:14 +00:00
kib	63c9e066e5	Regen	2012-11-13 12:53:41 +00:00
kib	1409e8df20	Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month	2012-11-13 12:52:31 +00:00
kib	f16ea99007	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already have file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Calling the VOPs instead of directly accessing v_writecount provides the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks	2012-11-02 13:56:36 +00:00
kib	560aa751e0	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
kevlo	ceb08698f2	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
kevlo	8747a46991	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
kib	8f845e475e	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks	2012-09-28 11:25:02 +00:00
kevlo	ce4326d001	Remove redundant check	2012-09-12 10:12:03 +00:00
jhb	e8b429e1c0	Remove some more NetBSD compat shims and other unused bits from these drivers: - Remove scsi_low_pisa.*, they were unused. - Remove <compat/netbsd/physio_proc.h> and calls to the stubs in that header. They were empty nops. - Retire sl_xname and use device_get_nameunit() and device_printf() with the underlying device_t instead. - Remove unused {ct,ncv,nsp,stg}print() functions. - Remove empty SOFT_INTR_REQUIRED() macro and the unused sl_irq member.	2012-09-10 18:49:49 +00:00
davidxu	b788233f5b	regen.	2012-08-17 02:47:16 +00:00
davidxu	3f0806aa1f	Implement syscall clock_getcpuclockid2, so we can get a clock id for process, thread or others we want to support. Use the syscall to implement POSIX API clock_getcpuclockid and pthread_getcpuclockid. PR: 168417	2012-08-17 02:26:31 +00:00
kib	9f82bec8aa	Regenerate.	2012-08-15 15:18:20 +00:00
kib	b2d9fb2c07	Provide 32bit compat for truncate(2) and ftruncate(2). MFC after: 1 week	2012-08-15 15:17:56 +00:00
kib	876d70b046	Regenerate.	2012-08-14 12:09:36 +00:00
kib	bc5359fb02	Implement the old mmap syscall for compat32, when COMPAT_43 option is enabled. The syscall is used by FreeBSD 1.1.5.1 dynamic linker. MFC after: 1 week	2012-08-14 12:09:09 +00:00
kib	6253150288	Cosmetics: define FREEBSD32_MINUSER and AOUT32_MINUSER for struct sysentvec .sv_minuser. Also improve style. Submitted by: Oliver Pinter <oliver.pinter@gmail.com> MFC after: 1 week	2012-07-22 13:41:45 +00:00
kib	53224f018a	Extend the KPI to lock and unlock f_offset member of struct file. It now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries(). Ensure that on 32bit architectures f_offset, which is a 64bit quantity, is always read and written under the mtxpool protection. This fixes an apparently easy-to-trigger race where parallel lseek()s or lseek() and read/write could destroy the file offset. The already broken ABI emulations, including iBCS and SysV, are not converted (yet). Tested by: pho No objections from: jhb MFC after: 3 weeks	2012-07-02 21:01:03 +00:00
kevlo	07ebfe1b9c	Make sure that each va_start has one and only one matching va_end, especially in error cases.	2012-05-29 01:48:06 +00:00
kib	cae6484163	Fix ki_cow for compat32 binaries. MFC after: 3 days	2012-05-27 05:24:53 +00:00
ed	0d9131d0d0	Regenerate system call tables.	2012-05-25 21:52:57 +00:00
ed	55e4d6365d	Remove use of non-ISO-C integer types from system call tables. These files already use ISO-C-style integer types, so make them less inconsistent by preferring the standard types.	2012-05-25 21:50:48 +00:00
gleb	3c7243df78	Add kern_fhstat(), adjust sys_fhstat() to use it. Extend kern_getdirentries() to accept uio segflag and optionally return buffer residue. Sponsored by: Google Summer of Code 2011	2012-05-24 08:00:26 +00:00
netchild	9895b5ca9d	- >500 static DTrace probes for the linuxulator - DTrace scripts to check for errors, performance, ... they serve mostly as examples of what you can do with the static probes; with moderate load the scripts may be overwhelmed, excessive lock-tracing may influence program behavior (see the last design decision) Design decisions: - use "linuxulator" as the provider for the native bitsize; add the bitsize for the non-native emulation (e.g. "linuxulator32" on amd64) - Add probes only for locks which are acquired in one function and released in another function. Locks which are acquired and released in the same function should be easy to pair in the code, inter-function locking is easier to verify in DTrace. - Probes for locks should be fired after locking and before releasing to prevent races (to provide data/function stability in DTrace, see the man-page of "dtrace -v ..." and the corresponding DTrace docs).	2012-05-05 19:42:38 +00:00
jkim	e210f689a8	- Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27 but GNU libc used it without checking its kernel version, e. g., Fedora 10. - Move pipe(2) implementation for Linuxulator from MD files to MI file, sys/compat/linux/linux_file.c. There is no MD code for this syscall at all. - Correct an argument type for pipe() from l_ulong * to l_int *. Probably this was the source of MI/MD confusion. Reviewed by: emulation	2012-04-16 21:22:02 +00:00
tijl	212d562cf9	Remove some unnecessary includes.	2012-03-18 19:15:11 +00:00
tijl	35c7447060	Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h. Reviewed by: kib	2012-03-18 19:12:11 +00:00
tijl	2bf580ea66	Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98 reg.h with stubs. The tREGISTER macros are only made visible on i386. These macros are deprecated and should not be available on amd64. The i386 and amd64 versions of struct reg have been renamed to struct __reg32 and struct __reg64. During compilation either __reg32 or __reg64 is defined as reg depending on the machine architecture. On amd64 the i386 struct is also available as struct reg32 which is used in COMPAT_FREEBSD32 code. Most of compat/ia32/ia32_reg.h is now IA64 only. Reviewed by: kib (previous version)	2012-03-18 19:06:38 +00:00
tijl	9c671fcaca	Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h. Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed. Create machine/npx.h on amd64 to allow compiling i386 code that uses this header. The original npx.h and fpu.h define struct envxmm differently. Both definitions have been included in the new x86 header as struct __envxmm32 and struct __envxmm64. During compilation either __envxmm32 or __envxmm64 is defined as envxmm depending on machine architecture. On amd64 the i386 struct is also available as struct envxmm32. Reviewed by: kib	2012-03-16 20:24:30 +00:00
brucec	88520fea69	Fix race condition in KfRaiseIrql(). After getting the current irql, if the kthread gets preempted and subsequently runs on a different CPU, the saved irql could be wrong. Also, correct the panic string. PR: kern/165630 Submitted by: Vladislav Movchan <vladislav.movchan at gmail.com>	2012-03-04 17:08:43 +00:00
jmallett	6485e73b87	On MIPS, _ALIGN always aligns to 8 bytes, even for 32-bit binaries. This might not be ideal, but is the ABI we've shipped so far. Fix macros which reflect the results of _ALIGN on 32-bit MIPS to use the right alignment. This fixes sendmsg under COMPAT_FREEBSD32 on n64 MIPS kernels.	2012-03-03 21:39:12 +00:00
jmallett	50c253779f	o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands using the o32 ABI. This mostly follows nwhitehorn's lead in implementing COMPAT_FREEBSD32 on powerpc64. o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the 32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit ABIs use a 64-bit time_t. o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS with 32-bit compatibility, then, disable some code which assumes otherwise wrongly when built for MIPS. A more general macro to check in this case would seem like a good idea eventually. If someone adds support for using n32 userland with n64 kernels on MIPS, then they will have to add a variety of flags related to each piece of the ABI that can vary. That's probably the right time to generalize further. o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the freebsd32 compat code. Probably this should be generalized at some point. Reviewed by: gonzo	2012-03-03 08:19:18 +00:00
mm	77766742e1	Add procfs to jail-mountable filesystems. Reviewed by: jamie MFC after: 1 week	2012-02-29 00:30:18 +00:00
kib	80ae8fe82c	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month	2012-02-21 01:05:12 +00:00
kib	abd1094f17	Fix misuse of the kernel map in miscellaneous image activators. Vnode-backed mappings cannot be put into the kernel map, since it is a system map. Use exec_map for transient mappings, and remove the mappings with kmem_free_wakeup() to notify the waiters on available map space. Do not map the whole executable into KVA at all to copy it out into usermode. Directly use vn_rdwr() for the case of not page aligned binary. There is one place left where the potentially unbounded amount of data is mapped into exec_map, namely, in the COFF image activator enumeration of the needed shared libraries. Reviewed by: alc MFC after: 2 weeks	2012-02-17 23:47:16 +00:00
ed	28b4a002d6	Remove direct access to si_name. Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly. MFC after: 2 weeks	2012-02-10 12:35:57 +00:00
davidxu	c4909ace45	Add 32-bit compat code for AIO kevent flags introduced in revision 230857.	2012-02-05 04:49:31 +00:00
kib	361bfae5c2	Add support for the extended FPU states on amd64, both for native 64bit and 32bit ABIs. As a side-effect, it enables AVX on capable CPUs. In particular: - Query the CPU support for XSAVE, list of the supported extensions and the required size of FPU save area. The hw.use_xsave tunable is provided for disabling XSAVE, and hw.xsave_mask may be used to select the enabled extensions. - Remove the FPU save area from PCB and dynamically allocate the (run-time sized) user save area on the top of the kernel stack, right above the PCB. Reorganize the thread0 PCB initialization to postpone it after BSP is queried for save area size. - The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as well. FPU state is only useful for suspend, where it is saved in dynamically allocated suspfpusave area. - Use XSAVE and XRSTOR to save/restore FPU state, if supported and enabled. - Define new mcontext_t flag _MC_HASFPXSTATE, indicating that mcontext_t has a valid pointer to out-of-struct extended FPU state. Signal handlers are supplied with stack-allocated fpu state. The sigreturn(2) and setcontext(2) syscall honour the flag, allowing the signal handlers to inspect and manipulate extended state in the interrupted context. - The getcontext(2) never returns extended state, since there is no place in the fixed-sized mcontext_t to place variable-sized save area. And, since mcontext_t is embedded into ucontext_t, makes it impossible to fix in a reasonable way. Instead of extending getcontext(2) syscall, provide a sysarch(2) facility to query extended FPU state. - Add ptrace(2) support for getting and setting extended state; while there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries. - Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to consumers, making it opaque. Internally, struct fpu_kern_ctx now contains a space for the extended state. Convert in-kernel consumers of fpu_kern KPI both on i386 and amd64. 
First version of the support for AVX was submitted by Tim Bird <tim.bird am sony com> on behalf of Sony. This version was written from scratch. Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org> MFC after: 1 month	2012-01-21 17:45:27 +00:00

2389 Commits