Commit Graph

7142 Commits

Author SHA1 Message Date
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Mateusz Guzik
4ea6a9a28f Generalised support for copy-on-write structures shared by threads.
Thread credentials are maintained as follows: each thread has a pointer to
creds and a reference on them. The pointer is compared with proc's creds on
userspace<->kernel boundary and updated if needed.

This patch introduces a counter which can be compared instead, so that more
structures can use this scheme without adding more comparisons on the boundary.
2015-06-10 10:43:59 +00:00
Alan Cox
65a9768f62 Account for superpage mappings that are created by pmap_copy(). 2015-06-09 18:04:28 +00:00
Tycho Nightingale
277bdd9950 Support guest writes to the TSC by enabling the "use TSC offsetting"
execution control and writing the difference between the host TSC and
the guest TSC into the TSC offset in the VMCS upon encountering a
write.

Reviewed by:	neel
2015-06-09 00:14:47 +00:00
Dmitry Chagin
c2bc5b15eb Futex is an aligned 32-bit integer. Use the proper instruction and
operand when dereferencing futex pointer.
2015-06-08 17:39:25 +00:00
Alan Cox
966272ca33 Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages.
Differential Revision:	https://reviews.freebsd.org/D2712
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-06-08 04:59:32 +00:00
Konstantin Belousov
32a1e9e4a5 Update print_INTEL_TLB() by the tag values from the Intel SDM
rev. 55.  The modern CPUs cache and TLB descriptions looked quite
questionable without the update, e.g. Haswell i7 4770S reported:
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
After the update, the report is:
	Data TLB: 1 GByte pages, 4-way set associative, 4 entries
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	Instruction TLB: 2M/4M pages, fully associative, 8 entries
	Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
	64-Byte prefetching
	Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
Some tags were apparently removed from the table 3-21, Vol. 2A.  Keep
them around, but add a comment stating the removal.

Update the format line for cpu_stdext_feature according to the bits
from the SDM rev.55.  It appears that Haswells do not store %cs and
%ds values in the FPU save area.

Store content of the %ecx register from the CPUID leaf 0x7
subleaf 0 as cpu_stdext_feature2 and print defined bits from it,
again acording to SDM rev. 55.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-06 22:03:24 +00:00
Neel Natu
647c87825c The 'verify_gla()' function is used to ensure that the effective address
after decoding the instruction matches the one provided by hardware.

Prior to r283293 'vie->num_valid' used to contain the actual length of
the instruction whereas now it contains the maximum instruction length
possible. This introduced a bug when calculating a RIP-relative base address.

Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the
length of the emulated instruction.

Reported and tested by:	tychon
MFC after:	1 week
2015-06-05 21:22:26 +00:00
Neel Natu
b14bd6ac9d Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware.

Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by
the hypervisor.

MFC after:	1 week
2015-06-04 02:12:23 +00:00
Dimitry Andric
38954a1d1c Remove unneeded NULL checks in amd64's trap_fatal().
Since td_name is an array member of struct thread, it can never be NULL,
so the check can be removed.  In addition, curproc can never be NULL,
so remove the if statement, and splice the two printfs() together.

While here, remove the u_long cast, and use the correct printf format
specifier curproc->p_pid.

Reviewed by:	kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D2695
2015-06-01 06:50:39 +00:00
Konstantin Belousov
69baeadc31 Remove several write-only variables, all reported by the gcc 4.9
buildkernel run.

Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros.  It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.

Review:	https://reviews.freebsd.org/D2665
Reviewed by:	rodrigc
Looked at by:	bjk
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-29 13:24:17 +00:00
Neel Natu
248e6799e9 Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.

MFC after:      2 weeks
2015-05-28 17:37:01 +00:00
Konstantin Belousov
1c5f423183 Enabled rewritten PCID support by default.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2015-05-27 09:50:18 +00:00
Dmitry Chagin
d707582f83 When I merged the lemul branch I missied kib@'s r282708 commit.
This is not the final fix as I need properly cleanup thread resources
before other threads suicide.

Tested by:	Ruslan Makhmatkhanov
2015-05-25 20:44:46 +00:00
Dmitry Chagin
5c5aac2d45 Regen for r283492. 2015-05-24 18:09:01 +00:00
Dmitry Chagin
9802eb9ebc Implement Linux specific syncfs() system call. 2015-05-24 18:08:01 +00:00
Dmitry Chagin
c532a88cfc Regen for r283488. 2015-05-24 18:05:21 +00:00
Dmitry Chagin
e1ff74c0f7 Implement recvmmsg() and sendmmsg() system calls. 2015-05-24 18:04:04 +00:00
Dmitry Chagin
b7aaa9fdb0 Reduce duplication between MD Linux code by moving msg related
struct definitions out into the compat/linux/linux_socket.h
2015-05-24 18:03:14 +00:00
Dmitry Chagin
3ce05165b1 Regen for r283484. 2015-05-24 18:02:17 +00:00
Dmitry Chagin
6e4c8004dc Implement epoll_pwait() system call. 2015-05-24 18:00:14 +00:00
Dmitry Chagin
ca045164a7 Regen for r283480. 2015-05-24 17:58:24 +00:00
Dmitry Chagin
19d8b461f4 Add utimensat() system call.
The patch developed by Jilles Tjoelker and Andrew Wilcox and
adopted for lemul branch by me.
2015-05-24 17:57:07 +00:00
Dmitry Chagin
0c38abc250 The kernel sends signals to the processes via ABI specific sv_sendsig method.
Native ABI do not need signal conversion, only emulators may want this. Usually
emulators implements its own sv_sendsig method. For now only ibcs2 emulator does
not have own sv_sendsig implementation and depends on native sendsig() method.
So, remove any extra attempts to convert signal numbers from native sendsig()
methods except from i386 where ibsc2 is living.
2015-05-24 17:56:02 +00:00
Dmitry Chagin
4ab7403bbd Rework signal code to allow using it by other modules, like linprocfs:
1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.

PR:		197216
2015-05-24 17:47:20 +00:00
Dmitry Chagin
a7ac457613 According to Linux man sigaltstack(3) shall return EINVAL if the ss
argument is not a null pointer, and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows SS_ONSTACK flag which is simply ignored.

For buggy apps (at least mono) ignore other than SS_DISABLE
flags as a Linux do.

While here move MI part of sigaltstack code to the appropriate place.

Reported by:	abi at abinet dot ru
2015-05-24 17:44:08 +00:00
Dmitry Chagin
657100de57 Regen for r283467. 2015-05-24 17:39:18 +00:00
Dmitry Chagin
fcdffc03f8 Call nosys in case when the incorrect syscall number is specified.
Reported by:	trinity
2015-05-24 17:38:02 +00:00
Dmitry Chagin
8d939ad405 Regen for r283465. 2015-05-24 17:35:42 +00:00
Dmitry Chagin
b6aeb7d5dd Add preliminary fallocate system call implementation
to emulate posix_fallocate() function.

Differential Revision:	https://reviews.freebsd.org/D1523
Reviewed by:	emaste
2015-05-24 17:33:21 +00:00
Dmitry Chagin
274d2df2e5 Regen for r283451. 2015-05-24 17:00:43 +00:00
Dmitry Chagin
a6b40812ec Implement ppoll() system call.
Differential Revision:	https://reviews.freebsd.org/D1105
Reviewed by:	trasz
2015-05-24 16:59:25 +00:00
Dmitry Chagin
e8b026b37e Include opt_compat.h, so that COMPAT_LINUX32 is defined, and we can
access to the semop structs and functions.

Submitted by:	cognet@

Differential Revision:	https://reviews.freebsd.org/D1095
Reviewed by:	trasz
2015-05-24 16:51:04 +00:00
Dmitry Chagin
22f3dfdc12 Regen for r283444. 2015-05-24 16:50:17 +00:00
Dmitry Chagin
a31d76867d Implement eventfd system call.
Differential Revision:	https://reviews.freebsd.org/D1094
In collaboration with:	Jilles Tjoelker
2015-05-24 16:49:14 +00:00
Dmitry Chagin
3e89b64168 Put the correct value for the abi_nfdbits parameter of kern_select() for
all supported Linuxulators.

Differential Revision:	https://reviews.freebsd.org/D1093
Reviewed by:	trasz
2015-05-24 16:47:13 +00:00
Dmitry Chagin
28fb55359b Regen for r283441. 2015-05-24 16:42:49 +00:00
Dmitry Chagin
e16fe1c730 Implement epoll family system calls. This is a tiny wrapper
around kqueue() to implement epoll subset of functionality.
The kqueue user data are 32bit on i386 which is not enough for
epoll user data, so we keep user data in the proc emuldata.

Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.

Differential Revision:	https://reviews.freebsd.org/D1092
2015-05-24 16:41:39 +00:00
Dmitry Chagin
4d0f380d87 To avoid code duplication move open/fcntl definitions to the MI
header file.

Differential Revision:	https://reviews.freebsd.org/D1087
Reviewed by:	trasz
2015-05-24 16:31:44 +00:00
Dmitry Chagin
26c68e1fe5 Use the BSD_TO_LINUX_SIGNAL() wherever there is no need
to check the ABI as it is known.

Differential Revision:	https://reviews.freebsd.org/D1086
2015-05-24 16:30:23 +00:00
Dmitry Chagin
437c43c1cb Being exported through vdso the note.Linux section used by glibc
to determine the kernel version (this saves one uname call).
Temporarily disable the export of a note.Linux section until I figured
out how to change the kernel version in the note.Linux on the fly.

Differential Revision:	https://reviews.freebsd.org/D1081
Reviewed by:	trasz
2015-05-24 16:25:44 +00:00
Dmitry Chagin
4048f59cd0 Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by
glibc. At list since glibc version 2.16 using AT_RANDOM is mandatory.

Differential Revision:	https://reviews.freebsd.org/D1080
2015-05-24 16:24:24 +00:00
Dmitry Chagin
0a1884d768 Regen for r283428. 2015-05-24 16:19:57 +00:00
Dmitry Chagin
baa232bbfd Change linux faccessat syscall definition to match actual linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat().  If either of these
flags are specified, then the wrapper function employs fstatat() to
determine access permissions.

Differential Revision:	https://reviews.freebsd.org/D1078
Reviewed by:	trasz
2015-05-24 16:18:03 +00:00
Dmitry Chagin
523be40fe4 Regen for r283424. 2015-05-24 16:11:21 +00:00
Dmitry Chagin
b2f587918d Add preliminary support for x86-64 Linux binaries.
Differential Revision:	https://reviews.freebsd.org/D1076
2015-05-24 16:07:11 +00:00
Dmitry Chagin
bc27367760 Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision:	https://reviews.freebsd.org/D1073
2015-05-24 15:54:58 +00:00
Dmitry Chagin
67d3974849 Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision:	https://reviews.freebsd.org/D1072
In collaboration with:	Vassilis Laganakos.

Reviewed by:	trasz
2015-05-24 15:51:18 +00:00
Dmitry Chagin
31eb438886 x86_64 Linux do not use multiplexing on ipc system calls.
Move struct ipc_perm definition to the MD path as it differs for 64 and
32 bit platform.

Differential Revision:	https://reviews.freebsd.org/D1068
Reviewed by:	trasz
2015-05-24 15:44:41 +00:00
Dmitry Chagin
26cf41d6ca Remove stale comment about a signal trampoline which
is moved to the shared page at r219609.

Differential Revision:	https://reviews.freebsd.org/D1063
Reviewed by:	trasz
2015-05-24 15:32:52 +00:00
Dmitry Chagin
0020bdf13a Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision:	https://reviews.freebsd.org/D1062
Reviewed by:	trasz
2015-05-24 15:30:52 +00:00
Dmitry Chagin
32084836c0 Eliminate a now unused global declaration of elf_linux_sysvec.
Differential Revision:	https://reviews.freebsd.org/D1061
Reviewed by:	trasz
2015-05-24 15:29:20 +00:00
Dmitry Chagin
bdc379344a Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.

Differential Revision:	https://reviews.freebsd.org/D1060
2015-05-24 15:28:17 +00:00
Dmitry Chagin
b2e0aad9e5 Regen for r283403. 2015-05-24 15:22:33 +00:00
Dmitry Chagin
ae50b4d7b5 Implement pselect6() system call.
Differential Revision:	https://reviews.freebsd.org/D1051
Reviewed by:	trasz
2015-05-24 15:21:25 +00:00
Dmitry Chagin
e7fa9de6eb Regen for r283401. 2015-05-24 15:19:44 +00:00
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
737325a46d Regen for r283399. 2015-05-24 15:15:46 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
f680d990e8 Regen for r283396. 2015-05-24 15:12:38 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e4454275a5 Regen for r283394. 2015-05-24 15:08:25 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
af682d487b Some style(9) && whitespaces fixes. No functional changes.
Differential Revision:	https://reviews.freebsd.org/D1041
Reviewed by:	emaste
2015-05-24 14:55:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
91d1786f65 In preparation for switching linuxulator to the use the native 1:1
threads add a hook for cleaning thread resources before the thread die.

Differential Revision:	https://reviews.freebsd.org/D1038
2015-05-24 14:51:29 +00:00
Dmitry Chagin
64cfe4dc38 Regen for r283379. 2015-05-24 14:47:00 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00
Dmitry Chagin
111c86e3d1 Remove a now unused include.
Differential Revision:	https://reviews.freebsd.org/D1035
Reviewed by:	trasz
2015-05-24 14:44:57 +00:00
Dmitry Chagin
1aa90eca33 In preparation for switching linuxulator to the use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision:	https://reviews.freebsd.org/D1032
Reviewed by:	trasz
2015-05-24 14:39:26 +00:00
Dmitry Chagin
8c744294fe Regen for r283370. 2015-05-24 14:34:46 +00:00
Dmitry Chagin
161acbb670 In preparation for switching linuxulator to the use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).

Differential Revision:	https://reviews.freebsd.org/D1027
Reviewed by:	trasz
2015-05-24 14:33:19 +00:00
Dmitry Chagin
1d80c8a8f0 In preparation for switching linuxulator to the use the native 1:1
threads print the thread id in addition to the pid in debug messages.
2015-05-24 14:29:35 +00:00
Neel Natu
47b9935d9b Exceptions don't deliver an error code in real mode.
MFC after:	1 week
2015-05-23 01:17:50 +00:00
Neel Natu
f149ce540e Remove the verification of instruction length after instruction decode. The
check has been bogus since r273375.

MFC after:	1 week
2015-05-22 21:09:11 +00:00
Neel Natu
1c73ea3ef8 Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation. This is not needed by the
instruction decoding code because it also has to work with AMD/SVM that
does not provide a valid instruction length on a Nested Page Fault.

In collaboration with:	Leon Dang (ldang@nahannisys.com)
Discussed with:		grehan
MFC after:		1 week
2015-05-22 17:34:22 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Neel Natu
b32d1908d5 Emulate the "CMP r/m, reg" instruction (opcode 39H).
Reported and tested by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-21 18:23:37 +00:00
Pedro F. Giffuni
cd508278c1 ddb: finish converting boolean values.
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool.  This also involved cleaning some type
mismatches and ansifying old C function declarations.

Pointed out by:	bde
Discussed with:	bde, ian, jhb
2015-05-21 15:16:18 +00:00
Konstantin Belousov
100ac78be1 On amd64, make proc0 pmap initialization slightly more correct. In
particular, switch to the proc0 pmap to have expected %cr3 and PCID
for the thread0 during initialization, and the up to date pm_active
mask.

pmap_pinit0() should be done after proc0->p_vmspace is assigned so
that the amd64 pmap_activate() find the correct curproc pmap.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 08:30:29 +00:00
Konstantin Belousov
f83e0dcb3a Implement the support for PCID in UP kernels.
Requested by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 07:57:47 +00:00
Jim Harris
6e3471bd0b Add nvme and nvd drivers to GENERIC for amd64 and i386.
MFC after:	3 days
Sponsored by:	Intel
2015-05-14 20:19:22 +00:00
Edward Tomasz Napierala
ba8f0eb8fc Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Differential Revision:	https://reviews.freebsd.org/D2407
Reviewed by:	emaste@, wblock@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-05-14 14:03:55 +00:00
Konstantin Belousov
f116422f38 Initialize pcids array for the proc0 pmap.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:09:07 +00:00
Konstantin Belousov
78ac908e9b Tweak assert to also print the thread address.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:05:57 +00:00
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Konstantin Belousov
b40a715c11 Correct the assertion. We should compare the pmap' curcpu pcid value
against 0, not the pmap.

Noted by:	Oliver Pinter <oliver.pinter@hardenedbsd.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 21:36:44 +00:00
Konstantin Belousov
a546448b8d Rewrite amd64 PCID implementation to follow an algorithm described in
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms".  The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated.  If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated.  On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid.  The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.

The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs.  Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 19:11:01 +00:00
Konstantin Belousov
6841f70168 Remove unused define.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-05-09 18:38:35 +00:00
Konstantin Belousov
b57a73f8e7 If x86 CPU implementation of the MWAIT instruction reasonably
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states.  Support C1 "I/O then halt" mode.  See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.

Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.

In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.

Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated.  Linux does not rely on the
ACPI to provide correct tables describing Cx modes.  Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.

Tested by:    pho (previous versions)
Sponsored by:	The FreeBSD Foundation
2015-05-09 12:28:48 +00:00
Neel Natu
ede0403309 Check 'td_owepreempt' and yield the vcpu thread if it is set.
This is done explicitly because a vcpu thread can be in a critical section
for the entire time slice alloted to it. This in turn can delay the handling
of the 'td_owepreempt'.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D2430
2015-05-06 23:40:24 +00:00
Neel Natu
9c4d547896 Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by:	tychon
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2428
2015-05-06 16:25:20 +00:00
Neel Natu
ea91ca92ba Do a proper emulation of guest writes to MSR_EFER.
- Must-Be-Zero bits cannot be set.
- EFER_LME and EFER_LMA should respect the long mode consistency checks.
- EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
- Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce
  segment limits in 64-bit mode.

MFC after:	2 weeks
2015-05-06 05:40:20 +00:00
Neel Natu
6a273d5ef7 Emulate the 'CMP r/m8, imm8' instruction encountered when booting a Windows
Vista guest.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-04 04:27:23 +00:00
Neel Natu
317080849e Don't advertise the Intel SMX capability to the guest.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-02 19:07:49 +00:00
Neel Natu
1d29bfc149 Emulate machine check related MSRs to allow guest OSes like Windows to boot.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-02 04:19:11 +00:00
Neel Natu
44e2f0fea9 r281630 relaxed the limits on the vectors that can be asserted in the IRRs.
Do the same when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-01 16:00:29 +00:00
Neel Natu
fe22991fb8 Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC after:	2 weeks
2015-05-01 05:11:14 +00:00
Neel Natu
8325ce5c7e Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.
Only a subset of source files that include <machine/vmm.h> need to use the
APIs that require the inclusion of <sys/cpuset.h>.

MFC after:	1 week
2015-04-30 22:23:22 +00:00