Commit Graph

7049 Commits

Author SHA1 Message Date
Dmitry Chagin
baa232bbfd Change linux faccessat syscall definition to match actual linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat().  If either of these
flags are specified, then the wrapper function employs fstatat() to
determine access permissions.

Differential Revision:	https://reviews.freebsd.org/D1078
Reviewed by:	trasz
2015-05-24 16:18:03 +00:00
Dmitry Chagin
523be40fe4 Regen for r283424. 2015-05-24 16:11:21 +00:00
Dmitry Chagin
b2f587918d Add preliminary support for x86-64 Linux binaries.
Differential Revision:	https://reviews.freebsd.org/D1076
2015-05-24 16:07:11 +00:00
Dmitry Chagin
bc27367760 Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision:	https://reviews.freebsd.org/D1073
2015-05-24 15:54:58 +00:00
Dmitry Chagin
67d3974849 Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision:	https://reviews.freebsd.org/D1072
In collaboration with:	Vassilis Laganakos.

Reviewed by:	trasz
2015-05-24 15:51:18 +00:00
Dmitry Chagin
31eb438886 x86_64 Linux do not use multiplexing on ipc system calls.
Move struct ipc_perm definition to the MD path as it differs for 64 and
32 bit platform.

Differential Revision:	https://reviews.freebsd.org/D1068
Reviewed by:	trasz
2015-05-24 15:44:41 +00:00
Dmitry Chagin
26cf41d6ca Remove stale comment about a signal trampoline which
is moved to the shared page at r219609.

Differential Revision:	https://reviews.freebsd.org/D1063
Reviewed by:	trasz
2015-05-24 15:32:52 +00:00
Dmitry Chagin
0020bdf13a Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision:	https://reviews.freebsd.org/D1062
Reviewed by:	trasz
2015-05-24 15:30:52 +00:00
Dmitry Chagin
32084836c0 Eliminate a now unused global declaration of elf_linux_sysvec.
Differential Revision:	https://reviews.freebsd.org/D1061
Reviewed by:	trasz
2015-05-24 15:29:20 +00:00
Dmitry Chagin
bdc379344a Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.

Differential Revision:	https://reviews.freebsd.org/D1060
2015-05-24 15:28:17 +00:00
Dmitry Chagin
b2e0aad9e5 Regen for r283403. 2015-05-24 15:22:33 +00:00
Dmitry Chagin
ae50b4d7b5 Implement pselect6() system call.
Differential Revision:	https://reviews.freebsd.org/D1051
Reviewed by:	trasz
2015-05-24 15:21:25 +00:00
Dmitry Chagin
e7fa9de6eb Regen for r283401. 2015-05-24 15:19:44 +00:00
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
737325a46d Regen for r283399. 2015-05-24 15:15:46 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
f680d990e8 Regen for r283396. 2015-05-24 15:12:38 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e4454275a5 Regen for r283394. 2015-05-24 15:08:25 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
af682d487b Some style(9) && whitespaces fixes. No functional changes.
Differential Revision:	https://reviews.freebsd.org/D1041
Reviewed by:	emaste
2015-05-24 14:55:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
91d1786f65 In preparation for switching linuxulator to the use the native 1:1
threads add a hook for cleaning thread resources before the thread die.

Differential Revision:	https://reviews.freebsd.org/D1038
2015-05-24 14:51:29 +00:00
Dmitry Chagin
64cfe4dc38 Regen for r283379. 2015-05-24 14:47:00 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00
Dmitry Chagin
111c86e3d1 Remove a now unused include.
Differential Revision:	https://reviews.freebsd.org/D1035
Reviewed by:	trasz
2015-05-24 14:44:57 +00:00
Dmitry Chagin
1aa90eca33 In preparation for switching linuxulator to the use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision:	https://reviews.freebsd.org/D1032
Reviewed by:	trasz
2015-05-24 14:39:26 +00:00
Dmitry Chagin
8c744294fe Regen for r283370. 2015-05-24 14:34:46 +00:00
Dmitry Chagin
161acbb670 In preparation for switching linuxulator to the use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).

Differential Revision:	https://reviews.freebsd.org/D1027
Reviewed by:	trasz
2015-05-24 14:33:19 +00:00
Dmitry Chagin
1d80c8a8f0 In preparation for switching linuxulator to the use the native 1:1
threads print the thread id in addition to the pid in debug messages.
2015-05-24 14:29:35 +00:00
Neel Natu
47b9935d9b Exceptions don't deliver an error code in real mode.
MFC after:	1 week
2015-05-23 01:17:50 +00:00
Neel Natu
f149ce540e Remove the verification of instruction length after instruction decode. The
check has been bogus since r273375.

MFC after:	1 week
2015-05-22 21:09:11 +00:00
Neel Natu
1c73ea3ef8 Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation. This is not needed by the
instruction decoding code because it also has to work with AMD/SVM that
does not provide a valid instruction length on a Nested Page Fault.

In collaboration with:	Leon Dang (ldang@nahannisys.com)
Discussed with:		grehan
MFC after:		1 week
2015-05-22 17:34:22 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Neel Natu
b32d1908d5 Emulate the "CMP r/m, reg" instruction (opcode 39H).
Reported and tested by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-21 18:23:37 +00:00
Pedro F. Giffuni
cd508278c1 ddb: finish converting boolean values.
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool.  This also involved cleaning some type
mismatches and ansifying old C function declarations.

Pointed out by:	bde
Discussed with:	bde, ian, jhb
2015-05-21 15:16:18 +00:00
Konstantin Belousov
100ac78be1 On amd64, make proc0 pmap initialization slightly more correct. In
particular, switch to the proc0 pmap to have expected %cr3 and PCID
for the thread0 during initialization, and the up to date pm_active
mask.

pmap_pinit0() should be done after proc0->p_vmspace is assigned so
that the amd64 pmap_activate() find the correct curproc pmap.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 08:30:29 +00:00
Konstantin Belousov
f83e0dcb3a Implement the support for PCID in UP kernels.
Requested by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 07:57:47 +00:00
Jim Harris
6e3471bd0b Add nvme and nvd drivers to GENERIC for amd64 and i386.
MFC after:	3 days
Sponsored by:	Intel
2015-05-14 20:19:22 +00:00
Edward Tomasz Napierala
ba8f0eb8fc Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Differential Revision:	https://reviews.freebsd.org/D2407
Reviewed by:	emaste@, wblock@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-05-14 14:03:55 +00:00
Konstantin Belousov
f116422f38 Initialize pcids array for the proc0 pmap.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:09:07 +00:00
Konstantin Belousov
78ac908e9b Tweak assert to also print the thread address.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:05:57 +00:00
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Konstantin Belousov
b40a715c11 Correct the assertion. We should compare the pmap' curcpu pcid value
against 0, not the pmap.

Noted by:	Oliver Pinter <oliver.pinter@hardenedbsd.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 21:36:44 +00:00
Konstantin Belousov
a546448b8d Rewrite amd64 PCID implementation to follow an algorithm described in
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms".  The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated.  If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated.  On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid.  The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.

The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs.  Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 19:11:01 +00:00
Konstantin Belousov
6841f70168 Remove unused define.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-05-09 18:38:35 +00:00
Konstantin Belousov
b57a73f8e7 If x86 CPU implementation of the MWAIT instruction reasonably
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states.  Support C1 "I/O then halt" mode.  See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.

Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.

In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.

Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated.  Linux does not rely on the
ACPI to provide correct tables describing Cx modes.  Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.

Tested by:    pho (previous versions)
Sponsored by:	The FreeBSD Foundation
2015-05-09 12:28:48 +00:00
Neel Natu
ede0403309 Check 'td_owepreempt' and yield the vcpu thread if it is set.
This is done explicitly because a vcpu thread can be in a critical section
for the entire time slice alloted to it. This in turn can delay the handling
of the 'td_owepreempt'.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D2430
2015-05-06 23:40:24 +00:00
Neel Natu
9c4d547896 Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by:	tychon
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2428
2015-05-06 16:25:20 +00:00