Commit Graph

7036 Commits

Author SHA1 Message Date
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
737325a46d Regen for r283399. 2015-05-24 15:15:46 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
f680d990e8 Regen for r283396. 2015-05-24 15:12:38 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e4454275a5 Regen for r283394. 2015-05-24 15:08:25 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
af682d487b Some style(9) && whitespaces fixes. No functional changes.
Differential Revision:	https://reviews.freebsd.org/D1041
Reviewed by:	emaste
2015-05-24 14:55:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
91d1786f65 In preparation for switching linuxulator to the use the native 1:1
threads add a hook for cleaning thread resources before the thread die.

Differential Revision:	https://reviews.freebsd.org/D1038
2015-05-24 14:51:29 +00:00
Dmitry Chagin
64cfe4dc38 Regen for r283379. 2015-05-24 14:47:00 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00
Dmitry Chagin
111c86e3d1 Remove a now unused include.
Differential Revision:	https://reviews.freebsd.org/D1035
Reviewed by:	trasz
2015-05-24 14:44:57 +00:00
Dmitry Chagin
1aa90eca33 In preparation for switching linuxulator to the use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision:	https://reviews.freebsd.org/D1032
Reviewed by:	trasz
2015-05-24 14:39:26 +00:00
Dmitry Chagin
8c744294fe Regen for r283370. 2015-05-24 14:34:46 +00:00
Dmitry Chagin
161acbb670 In preparation for switching linuxulator to the use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).

Differential Revision:	https://reviews.freebsd.org/D1027
Reviewed by:	trasz
2015-05-24 14:33:19 +00:00
Dmitry Chagin
1d80c8a8f0 In preparation for switching linuxulator to the use the native 1:1
threads print the thread id in addition to the pid in debug messages.
2015-05-24 14:29:35 +00:00
Neel Natu
47b9935d9b Exceptions don't deliver an error code in real mode.
MFC after:	1 week
2015-05-23 01:17:50 +00:00
Neel Natu
f149ce540e Remove the verification of instruction length after instruction decode. The
check has been bogus since r273375.

MFC after:	1 week
2015-05-22 21:09:11 +00:00
Neel Natu
1c73ea3ef8 Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation. This is not needed by the
instruction decoding code because it also has to work with AMD/SVM that
does not provide a valid instruction length on a Nested Page Fault.

In collaboration with:	Leon Dang (ldang@nahannisys.com)
Discussed with:		grehan
MFC after:		1 week
2015-05-22 17:34:22 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Neel Natu
b32d1908d5 Emulate the "CMP r/m, reg" instruction (opcode 39H).
Reported and tested by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-21 18:23:37 +00:00
Pedro F. Giffuni
cd508278c1 ddb: finish converting boolean values.
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool.  This also involved cleaning some type
mismatches and ansifying old C function declarations.

Pointed out by:	bde
Discussed with:	bde, ian, jhb
2015-05-21 15:16:18 +00:00
Konstantin Belousov
100ac78be1 On amd64, make proc0 pmap initialization slightly more correct. In
particular, switch to the proc0 pmap to have expected %cr3 and PCID
for the thread0 during initialization, and the up to date pm_active
mask.

pmap_pinit0() should be done after proc0->p_vmspace is assigned so
that the amd64 pmap_activate() find the correct curproc pmap.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 08:30:29 +00:00
Konstantin Belousov
f83e0dcb3a Implement the support for PCID in UP kernels.
Requested by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 07:57:47 +00:00
Jim Harris
6e3471bd0b Add nvme and nvd drivers to GENERIC for amd64 and i386.
MFC after:	3 days
Sponsored by:	Intel
2015-05-14 20:19:22 +00:00
Edward Tomasz Napierala
ba8f0eb8fc Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Differential Revision:	https://reviews.freebsd.org/D2407
Reviewed by:	emaste@, wblock@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-05-14 14:03:55 +00:00
Konstantin Belousov
f116422f38 Initialize pcids array for the proc0 pmap.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:09:07 +00:00
Konstantin Belousov
78ac908e9b Tweak assert to also print the thread address.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:05:57 +00:00
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Konstantin Belousov
b40a715c11 Correct the assertion. We should compare the pmap' curcpu pcid value
against 0, not the pmap.

Noted by:	Oliver Pinter <oliver.pinter@hardenedbsd.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 21:36:44 +00:00
Konstantin Belousov
a546448b8d Rewrite amd64 PCID implementation to follow an algorithm described in
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms".  The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated.  If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated.  On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid.  The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.

The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs.  Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 19:11:01 +00:00
Konstantin Belousov
6841f70168 Remove unused define.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-05-09 18:38:35 +00:00
Konstantin Belousov
b57a73f8e7 If x86 CPU implementation of the MWAIT instruction reasonably
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states.  Support C1 "I/O then halt" mode.  See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.

Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.

In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.

Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated.  Linux does not rely on the
ACPI to provide correct tables describing Cx modes.  Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.

Tested by:    pho (previous versions)
Sponsored by:	The FreeBSD Foundation
2015-05-09 12:28:48 +00:00
Neel Natu
ede0403309 Check 'td_owepreempt' and yield the vcpu thread if it is set.
This is done explicitly because a vcpu thread can be in a critical section
for the entire time slice alloted to it. This in turn can delay the handling
of the 'td_owepreempt'.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D2430
2015-05-06 23:40:24 +00:00
Neel Natu
9c4d547896 Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by:	tychon
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2428
2015-05-06 16:25:20 +00:00
Neel Natu
ea91ca92ba Do a proper emulation of guest writes to MSR_EFER.
- Must-Be-Zero bits cannot be set.
- EFER_LME and EFER_LMA should respect the long mode consistency checks.
- EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
- Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce
  segment limits in 64-bit mode.

MFC after:	2 weeks
2015-05-06 05:40:20 +00:00
Neel Natu
6a273d5ef7 Emulate the 'CMP r/m8, imm8' instruction encountered when booting a Windows
Vista guest.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-04 04:27:23 +00:00
Neel Natu
317080849e Don't advertise the Intel SMX capability to the guest.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-02 19:07:49 +00:00
Neel Natu
1d29bfc149 Emulate machine check related MSRs to allow guest OSes like Windows to boot.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-02 04:19:11 +00:00
Neel Natu
44e2f0fea9 r281630 relaxed the limits on the vectors that can be asserted in the IRRs.
Do the same when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-01 16:00:29 +00:00
Neel Natu
fe22991fb8 Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC after:	2 weeks
2015-05-01 05:11:14 +00:00
Neel Natu
8325ce5c7e Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.
Only a subset of source files that include <machine/vmm.h> need to use the
APIs that require the inclusion of <sys/cpuset.h>.

MFC after:	1 week
2015-04-30 22:23:22 +00:00
Neel Natu
c07a0648ec When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

Requested by:	grehan
MFC after:	1 week
2015-04-30 21:00:47 +00:00
Neel Natu
7d786ee2a9 Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.
This is required for booting Windows guests.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-30 19:23:50 +00:00
John Baldwin
ed95805e90 Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
Neel Natu
787fb3d026 Re-implement RTC current time calculation to eliminate the possibility of
losing time.

The problem with the earlier implementation was that the uptime value
used by 'vrtc_curtime()' could be different than the uptime value when
'vrtc_time_update()' actually updated 'base_uptime'.

Fix this by calculating and updating the (rtctime, uptime) tuple together.

MFC after:	2 weeks
2015-04-29 23:44:28 +00:00
Wei Hu
da2f98a1cf Microsoft vmbus, storage and other related driver enhancements for HyperV.
- Vmbus multi channel support.
    - Vector interrupt support.
    - Signal optimization.
    - Storvsc driver performance improvement.
    - Scatter and gather support for storvsc driver.
    - Minor bug fix for KVP driver.
Thanks royger, jhb and delphij from FreeBSD community for the reviews
and comments. Also thanks Hovy Xu from NetApp for the contributions to
the storvsc driver.

PR:     195238
Submitted by:   whu
Reviewed by:    royger, jhb, delphij
Approved by:    royger
MFC after:      2 weeks
Relnotes:       yes
Sponsored by:   Microsoft OSTC
2015-04-29 10:12:34 +00:00
Neel Natu
b8070ef5b1 Emulate the 'bit test' instruction. Windows 7 uses 'bit test' to check the
'Delivery Status' bit in APIC ICR register.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-29 02:01:46 +00:00