Commit Graph

754 Commits

Author SHA1 Message Date
Dmitry Chagin
26cf41d6ca Remove stale comment about a signal trampoline which
is moved to the shared page at r219609.

Differential Revision:	https://reviews.freebsd.org/D1063
Reviewed by:	trasz
2015-05-24 15:32:52 +00:00
Dmitry Chagin
0020bdf13a Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision:	https://reviews.freebsd.org/D1062
Reviewed by:	trasz
2015-05-24 15:30:52 +00:00
Dmitry Chagin
32084836c0 Eliminate a now unused global declaration of elf_linux_sysvec.
Differential Revision:	https://reviews.freebsd.org/D1061
Reviewed by:	trasz
2015-05-24 15:29:20 +00:00
Dmitry Chagin
bdc379344a Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.

Differential Revision:	https://reviews.freebsd.org/D1060
2015-05-24 15:28:17 +00:00
Dmitry Chagin
b2e0aad9e5 Regen for r283403. 2015-05-24 15:22:33 +00:00
Dmitry Chagin
ae50b4d7b5 Implement pselect6() system call.
Differential Revision:	https://reviews.freebsd.org/D1051
Reviewed by:	trasz
2015-05-24 15:21:25 +00:00
Dmitry Chagin
e7fa9de6eb Regen for r283401. 2015-05-24 15:19:44 +00:00
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
737325a46d Regen for r283399. 2015-05-24 15:15:46 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
f680d990e8 Regen for r283396. 2015-05-24 15:12:38 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e4454275a5 Regen for r283394. 2015-05-24 15:08:25 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
18599d43e9 Regen for r283392. 2015-05-24 15:05:22 +00:00
Dmitry Chagin
28ca9c9e0b struct l_rusage does not defined for i386 Linuxulator due to it's nature.
Differential Revision:	https://reviews.freebsd.org/D2139
2015-05-24 15:04:12 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
af682d487b Some style(9) && whitespaces fixes. No functional changes.
Differential Revision:	https://reviews.freebsd.org/D1041
Reviewed by:	emaste
2015-05-24 14:55:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
64cfe4dc38 Regen for r283379. 2015-05-24 14:47:00 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00
Dmitry Chagin
37588665e4 Regen for r283375. 2015-05-24 14:43:06 +00:00
Dmitry Chagin
e7b198ab02 In preparation for switching linuxulator to the use the native 1:1
threads use MI linux_sched_rr_get_interval() in i386.

Differential Revision:	https://reviews.freebsd.org/D1033
Reviewed by:	trasz
2015-05-24 14:40:41 +00:00
Dmitry Chagin
8c744294fe Regen for r283370. 2015-05-24 14:34:46 +00:00
Dmitry Chagin
161acbb670 In preparation for switching linuxulator to the use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).

Differential Revision:	https://reviews.freebsd.org/D1027
Reviewed by:	trasz
2015-05-24 14:33:19 +00:00
Dmitry Chagin
1d80c8a8f0 In preparation for switching linuxulator to the use the native 1:1
threads print the thread id in addition to the pid in debug messages.
2015-05-24 14:29:35 +00:00
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Dmitry Chagin
7b15ee61fc Regen after r276508, r276509. 2015-01-01 18:43:31 +00:00
Dmitry Chagin
0db7526f14 Correct an argument status of wait4 syscall for Linuxulator.
Forgot about i386.

MFC after:	1 week
2015-01-01 18:41:34 +00:00
John Baldwin
824fc46089 MFamd64: Add support for extended FPU states on i386. This includes
support for AVX on i386.
- Similar to amd64, move the FPU save area out of the PCB and instead
  store saved FPU state in a variable-sized buffer after the PCB on the
  stack.
- To support the variable PCB location, alter the locore code to only use
  the bottom-most page of proc0stack for init386().  init386() returns
  the correct stack pointer to locore which adjusts the stack for thread0
  before calling mi_startup().
- Don't bother setting cr3 in thread0's pcb in locore before calling
  init386().  It wasn't used (init386() overwrote it at the end) and
  it doesn't work with the variable-sized FPU save area.
- Remove the new-bus attachment from npx.  This was only ever useful for
  external co-processors using IRQ13, but those have not been supported
  for several years.  npxinit() is now called much earlier during boot
  (init386()) similar to amd64.
- Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE.
- npxsave() is now only called from context switch contexts so it can
  use XSAVEOPT.

Differential Revision:	https://reviews.freebsd.org/D1058
Reviewed by:	kib
Tested on:	FreeBSD/i386 VM under bhyve on Intel i5-2520
2014-11-02 22:58:30 +00:00
John Baldwin
716718932f MFamd64: Move extern declaration of _ucodesel and _udatasel to
<machine/md_var.h>
2014-11-02 21:40:32 +00:00
Bjoern A. Zeeb
581980cff8 Re-gen after r271743 implementing most of
timer_{create,settime,gettime,getoverrun,delete}.

MFC after:		3 days
Sponsored by:		DARPA, AFRL
2014-09-18 08:40:00 +00:00
Bjoern A. Zeeb
0a041f3b47 Implement most of timer_{create,settime,gettime,getoverrun,delete}
for amd64/linux32.  Fix the entirely bogus (untested) version from
r161310 for i386/linux using the same shared code in compat/linux.

It is unclear to me if we could support more clock mappings but
the current set allows me to successfully run commercial
32bit linux software under linuxolator on amd64.

Reviewed by:		jhb
Differential Revision:	D784
MFC after:		3 days
Sponsored by:		DARPA, AFRL
2014-09-18 08:36:45 +00:00
Dmitry Chagin
2dedc1281a Revert r266925 as it can lead to instant panic at fexecve():
To allow to run the interpreter itself add a new ELF branding type.

Pointed out by:	kib, mjg
2014-06-17 05:29:18 +00:00
Dmitry Chagin
5f56da1891 To allow to run the interpreter itself add a new ELF branding type.
Allow Linux ABI to run ELF interpreter.

MFC after:	3 days
2014-05-31 15:01:51 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Ed Maste
3d271aaab0 x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)
Debuggers may need to change PSL_RF.  Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could be
modified via sigreturn, so this change should not provide any new ability
to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by:	jhb, kib
Sponsored by:	DARPA, AFRL
2013-11-14 15:37:20 +00:00
Roman Divacky
69d912af45 Regen.
Approved by:	re (delphij)
2013-09-18 18:49:26 +00:00
Roman Divacky
b12698e1a1 Revert r255672, it has some serious flaws, leaking file references etc.
Approved by:	re (delphij)
2013-09-18 18:48:33 +00:00
Roman Divacky
70ccaaf58e Regen.
Approved by:    re (delphij)
2013-09-18 17:58:03 +00:00
Roman Divacky
253c75c0de Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by:    re (delphij)
Sponsored by:   Google Summer of Code
Submitted by:   Yuri Victorovich <yuri at rawbw dot com>
Tested by:      Yuri Victorovich <yuri at rawbw dot com>
2013-09-18 17:56:04 +00:00
John Baldwin
edb572a38c Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
John Baldwin
d825ce0a5d Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by:	Chagin Dmitry  dmitry | gmail
MFC after:	2 weeks
2013-01-29 18:41:30 +00:00
John Baldwin
fb709557a3 Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are).  Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after:	2 weeks
2013-01-23 21:44:48 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
John Baldwin
bb07b39fa1 Regen. 2012-07-30 20:45:17 +00:00
John Baldwin
e6e0554a98 The linux_lstat() system call accepts a pointer to a 'struct l_stat', not a
'struct ostat'.
2012-07-30 20:44:45 +00:00