freebsd-skq

Author	SHA1	Message	Date
Dmitry Chagin	3e89b64168	Put the correct value for the abi_nfdbits parameter of kern_select() for all supported Linuxulators. Differential Revision: https://reviews.freebsd.org/D1093 Reviewed by: trasz	2015-05-24 16:47:13 +00:00
Dmitry Chagin	28fb55359b	Regen for r283441.	2015-05-24 16:42:49 +00:00
Dmitry Chagin	e16fe1c730	Implement epoll family system calls. This is a tiny wrapper around kqueue() to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data, so we keep user data in the proc emuldata. Initial patch developed by rdivacky@ in 2007, then extended by Yuri Victorovich @ r255672 and finished by me in collaboration with mjg@ and jillies@. Differential Revision: https://reviews.freebsd.org/D1092	2015-05-24 16:41:39 +00:00
Dmitry Chagin	4d0f380d87	To avoid code duplication move open/fcntl definitions to the MI header file. Differential Revision: https://reviews.freebsd.org/D1087 Reviewed by: trasz	2015-05-24 16:31:44 +00:00
Dmitry Chagin	26c68e1fe5	Use the BSD_TO_LINUX_SIGNAL() wherever there is no need to check the ABI as it is known. Differential Revision: https://reviews.freebsd.org/D1086	2015-05-24 16:30:23 +00:00
Dmitry Chagin	437c43c1cb	The note.Linux section, exported through the vdso, is used by glibc to determine the kernel version (this saves one uname call). Temporarily disable the export of the note.Linux section until I figure out how to change the kernel version in note.Linux on the fly. Differential Revision: https://reviews.freebsd.org/D1081 Reviewed by: trasz	2015-05-24 16:25:44 +00:00
Dmitry Chagin	4048f59cd0	Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by glibc. At least since glibc version 2.16 using AT_RANDOM is mandatory. Differential Revision: https://reviews.freebsd.org/D1080	2015-05-24 16:24:24 +00:00
Dmitry Chagin	0a1884d768	Regen for r283428.	2015-05-24 16:19:57 +00:00
Dmitry Chagin	baa232bbfd	Change linux faccessat syscall definition to match actual linux one. The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented within the glibc wrapper function for faccessat(). If either of these flags are specified, then the wrapper function employs fstatat() to determine access permissions. Differential Revision: https://reviews.freebsd.org/D1078 Reviewed by: trasz	2015-05-24 16:18:03 +00:00
Dmitry Chagin	523be40fe4	Regen for r283424.	2015-05-24 16:11:21 +00:00
Dmitry Chagin	b2f587918d	Add preliminary support for x86-64 Linux binaries. Differential Revision: https://reviews.freebsd.org/D1076	2015-05-24 16:07:11 +00:00
Dmitry Chagin	bc27367760	Refund the proc emuldata struct for future use. For now move flags from thread emuldata to proc emuldata as it was originally intended. As we can have both 64 & 32 bit Linuxulator running any eventhandler can be called twice for us. To prevent this move eventhandlers code from linux_emul.c to the linux_common.ko module. Differential Revision: https://reviews.freebsd.org/D1073	2015-05-24 15:54:58 +00:00
Dmitry Chagin	67d3974849	Introduce a new module linux_common.ko which is intended for the following primary purposes: 1. Remove the dependency of linsysfs and linprocfs modules from linux.ko, which will be architecture specific on amd64. 2. Incorporate into linux_common.ko general code for platforms on which we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit). 3. Move malloc(9) declaration to linux_common.ko, to enable getting memory usage statistics properly. Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko. Temporarily remove dtrace garbage from linux_mib.c and linux_util.c Differential Revision: https://reviews.freebsd.org/D1072 In collaboration with: Vassilis Laganakos. Reviewed by: trasz	2015-05-24 15:51:18 +00:00
Dmitry Chagin	31eb438886	x86_64 Linux do not use multiplexing on ipc system calls. Move struct ipc_perm definition to the MD path as it differs for 64 and 32 bit platform. Differential Revision: https://reviews.freebsd.org/D1068 Reviewed by: trasz	2015-05-24 15:44:41 +00:00
Dmitry Chagin	26cf41d6ca	Remove stale comment about a signal trampoline which is moved to the shared page at r219609. Differential Revision: https://reviews.freebsd.org/D1063 Reviewed by: trasz	2015-05-24 15:32:52 +00:00
Dmitry Chagin	0020bdf13a	Put linux_platform into the vdso to avoid copying it onto the stack at every exec. Differential Revision: https://reviews.freebsd.org/D1062 Reviewed by: trasz	2015-05-24 15:30:52 +00:00
Dmitry Chagin	32084836c0	Eliminate a now unused global declaration of elf_linux_sysvec. Differential Revision: https://reviews.freebsd.org/D1061 Reviewed by: trasz	2015-05-24 15:29:20 +00:00
Dmitry Chagin	bdc379344a	Implement vdso - virtual dynamic shared object. Through vdso Linux exposes functions from kernel with proper DWARF CFI information so that it becomes easier to unwind through them. Using vdso is mandatory for thread cancellation && cleanup on a modern glibc. Differential Revision: https://reviews.freebsd.org/D1060	2015-05-24 15:28:17 +00:00
Dmitry Chagin	b2e0aad9e5	Regen for r283403.	2015-05-24 15:22:33 +00:00
Dmitry Chagin	ae50b4d7b5	Implement pselect6() system call. Differential Revision: https://reviews.freebsd.org/D1051 Reviewed by: trasz	2015-05-24 15:21:25 +00:00
Dmitry Chagin	e7fa9de6eb	Regen for r283401.	2015-05-24 15:19:44 +00:00
Dmitry Chagin	c3978c7bb1	Implement prlimit64() system call. Differential Revision: https://reviews.freebsd.org/D1050 Reviewed by: emaste, trasz	2015-05-24 15:18:19 +00:00
Dmitry Chagin	737325a46d	Regen for r283399.	2015-05-24 15:15:46 +00:00
Dmitry Chagin	254a937ee5	Implement dup3() system call. Differential Revision: https://reviews.freebsd.org/D1049 Reviewed by: emaste	2015-05-24 15:14:51 +00:00
Dmitry Chagin	f680d990e8	Regen for r283396.	2015-05-24 15:12:38 +00:00
Dmitry Chagin	7ac9766db4	Implement rt_sigqueueinfo() system call. Differential Revision: https://reviews.freebsd.org/D1047 Reviewed by: trasz	2015-05-24 15:11:32 +00:00
Dmitry Chagin	e4454275a5	Regen for r283394.	2015-05-24 15:08:25 +00:00
Dmitry Chagin	e5fe4ccf59	Implement waitid() system call. Differential Revision: https://reviews.freebsd.org/D1046	2015-05-24 15:06:39 +00:00
Dmitry Chagin	001398c4c5	To reduce code duplication introduce linux_copyout_rusage() method. Use it in linux_wait4() system call and move linux_wait4() to the MI path. While here add a prototype for the static bsd_to_linux_rusage(). Differential Revision: https://reviews.freebsd.org/D2138 Reviewed by: trasz	2015-05-24 15:03:09 +00:00
Dmitry Chagin	af682d487b	Some style(9) && whitespaces fixes. No functional changes. Differential Revision: https://reviews.freebsd.org/D1041 Reviewed by: emaste	2015-05-24 14:55:12 +00:00
Dmitry Chagin	81338031c4	Switch linuxulator to use the native 1:1 threads. The reasons: 1. Get rid of the stubs/quirks with process dethreading, process reparent when the process group leader exits and close to this problems on wait(), waitpid(), etc. 2. Reuse our kernel code instead of writing excessive thread management routines in Linuxulator. Implementation details: 1. The thread is created via kern_thr_new() in the clone() call with the CLONE_THREAD parameter. Thus, everything else is a process. 2. The test that the process has threads is done via P_HADTHREADS bit p_flag of struct proc. 3. Per thread emulator state data structure is now located in the struct thread and freed in the thread_dtor() hook. Mandatory holding of the p_mtx required when referencing emuldata from the other threads. 4. PID mangling has changed. Now Linux pid is the native tid and Linux tgid is the native pid, with the exception of the first thread in the process where tid and pid are one and the same. Ugliness: In case when the Linux thread is the initial thread in the thread group thread id is equal to the process id. Glibc depends on this magic (assert in pthread_getattr_np.c). So for system calls that take thread id as a parameter we should use the special method to reference struct thread. Differential Revision: https://reviews.freebsd.org/D1039	2015-05-24 14:53:16 +00:00
Dmitry Chagin	91d1786f65	In preparation for switching linuxulator to the use the native 1:1 threads add a hook for cleaning thread resources before the thread die. Differential Revision: https://reviews.freebsd.org/D1038	2015-05-24 14:51:29 +00:00
Dmitry Chagin	64cfe4dc38	Regen for r283379.	2015-05-24 14:47:00 +00:00
Dmitry Chagin	2003907d45	Implement a Linux version of sched_getparam() && sched_setparam(). Temporarily use the first thread in proc. Differential Revision: https://reviews.freebsd.org/D1036 Reviewed by: trasz	2015-05-24 14:45:57 +00:00
Dmitry Chagin	111c86e3d1	Remove a now unused include. Differential Revision: https://reviews.freebsd.org/D1035 Reviewed by: trasz	2015-05-24 14:44:57 +00:00
Dmitry Chagin	1aa90eca33	In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator). Linuxulator temporarily uses first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz	2015-05-24 14:39:26 +00:00
Dmitry Chagin	8c744294fe	Regen for r283370.	2015-05-24 14:34:46 +00:00
Dmitry Chagin	161acbb670	In preparation for switching linuxulator to the use the native 1:1 threads introduce linux_exit() stub instead of sys_exit() call (which terminates process). In the new linuxulator exit() system call terminates the calling thread (not a whole process). Differential Revision: https://reviews.freebsd.org/D1027 Reviewed by: trasz	2015-05-24 14:33:19 +00:00
Dmitry Chagin	1d80c8a8f0	In preparation for switching linuxulator to the use the native 1:1 threads print the thread id in addition to the pid in debug messages.	2015-05-24 14:29:35 +00:00
Neel Natu	47b9935d9b	Exceptions don't deliver an error code in real mode. MFC after: 1 week	2015-05-23 01:17:50 +00:00
Neel Natu	f149ce540e	Remove the verification of instruction length after instruction decode. The check has been bogus since r273375. MFC after: 1 week	2015-05-22 21:09:11 +00:00
Neel Natu	1c73ea3ef8	Don't rely on the 'VM-exit instruction length' field in the VMCS to always have an accurate length on an EPT violation. This is not needed by the instruction decoding code because it also has to work with AMD/SVM that does not provide a valid instruction length on a Nested Page Fault. In collaboration with: Leon Dang (ldang@nahannisys.com) Discussed with: grehan MFC after: 1 week	2015-05-22 17:34:22 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Neel Natu	b32d1908d5	Emulate the "CMP r/m, reg" instruction (opcode 39H). Reported and tested by: Leon Dang (ldang@nahannisys.com) MFC after: 1 week	2015-05-21 18:23:37 +00:00
Pedro F. Giffuni	cd508278c1	ddb: finish converting boolean values. The replacement started at r283088 was necessarily incomplete without replacing boolean_t with bool. This also involved cleaning some type mismatches and ansifying old C function declarations. Pointed out by: bde Discussed with: bde, ian, jhb	2015-05-21 15:16:18 +00:00
Konstantin Belousov	100ac78be1	On amd64, make proc0 pmap initialization slightly more correct. In particular, switch to the proc0 pmap to have expected %cr3 and PCID for the thread0 during initialization, and the up to date pm_active mask. pmap_pinit0() should be done after proc0->p_vmspace is assigned so that the amd64 pmap_activate() find the correct curproc pmap. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-15 08:30:29 +00:00
Konstantin Belousov	f83e0dcb3a	Implement the support for PCID in UP kernels. Requested by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-15 07:57:47 +00:00
Jim Harris	6e3471bd0b	Add nvme and nvd drivers to GENERIC for amd64 and i386. MFC after: 3 days Sponsored by: Intel	2015-05-14 20:19:22 +00:00
Edward Tomasz Napierala	ba8f0eb8fc	Build GENERIC with RACCT/RCTL support by default. Note that it still needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf. Differential Revision: https://reviews.freebsd.org/D2407 Reviewed by: emaste@, wblock@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation	2015-05-14 14:03:55 +00:00
Konstantin Belousov	f116422f38	Initialize pcids array for the proc0 pmap. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-10 09:09:07 +00:00
Konstantin Belousov	78ac908e9b	Tweak assert to also print the thread address. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-10 09:05:57 +00:00
Konstantin Belousov	7b445033ff	On exec, single-threading must be enforced before arguments space is allocated from exec_map. If many threads try to perform execve(2) in parallel, the exec map is exhausted and some threads sleep uninterruptible waiting for the map space. Then, the thread which won the race for the space allocation, cannot single-thread the process, causing deadlock. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-10 09:00:40 +00:00
Konstantin Belousov	b40a715c11	Correct the assertion. We should compare the pmap' curcpu pcid value against 0, not the pmap. Noted by: Oliver Pinter <oliver.pinter@hardenedbsd.org> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-09 21:36:44 +00:00
Konstantin Belousov	a546448b8d	Rewrite amd64 PCID implementation to follow an algorithm described in Vahalia's "Unix Internals" section 15.12 "Other TLB Consistency Algorithms". The same algorithm is already utilized by the MIPS pmap to handle ASIDs. The PCID for the address space is now allocated per-cpu during context switch to the thread using pmap, when no PCID on the cpu was ever allocated, or the current PCID is invalidated. If the PCID is reused, bit 63 of %cr3 can be set to avoid TLB flush. Each cpu has a PCID algorithm generation count, which is saved in the pmap pcpu block when pcpu PCID is allocated. On invalidation, the pmap generation count is zeroed, which signals the context switch code that already allocated PCID is no longer valid. The implication is the TLB shootdown for the given cpu/address space, due to the allocation of new PCID. The pm_save mask no longer has to be tracked, which (significantly) reduces the targets of the TLB shootdown IPIs. Previously, pm_save was reset only on pmap_invalidate_all(), which made it accumulate the cpuids of all processors on which the thread was scheduled between full TLB shootdowns. Besides reducing the amount of TLB shootdowns and removing atomics to update pm_saves in the context switch code, the algorithm is much simpler than the maintenance of pm_save and selection of the right address space in the shootdown IPI handler. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-09 19:11:01 +00:00
Konstantin Belousov	6841f70168	Remove unused define. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-05-09 18:38:35 +00:00
Konstantin Belousov	b57a73f8e7	If x86 CPU implementation of the MWAIT instruction reasonably interacts with interrupts, query ACPI and use MWAIT for entrance into Cx sleep states. Support C1 "I/O then halt" mode. See Intel's document 302223-007 "Intel® Processor Vendor-Specific ACPI Interface Specification" for description. Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use it instead of inlining "sti; hlt" sequence in several places. In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported} sysctls. Both jkim and avg have some other patches implementing the mwait functionality; this work is unrelated. Linux does not rely on the ACPI to provide correct tables describing Cx modes. Instead, the driver has pre-defined knowledge of the CPU models, it was supplied by Intel. Tested by: pho (previous versions) Sponsored by: The FreeBSD Foundation	2015-05-09 12:28:48 +00:00
Neel Natu	ede0403309	Check 'td_owepreempt' and yield the vcpu thread if it is set. This is done explicitly because a vcpu thread can be in a critical section for the entire time slice allotted to it. This in turn can delay the handling of the 'td_owepreempt'. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D2430	2015-05-06 23:40:24 +00:00
Neel Natu	9c4d547896	Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). Prior to this change both functions returned 0 for success, -1 for failure and +1 to indicate that an exception was injected into the guest. The numerical value of ERESTART also happens to be -1 so when these functions returned -1 it had to be translated to a positive errno value to prevent the VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce bugs when writing emulation code. Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if an exception was delivered to the guest. The return value is 0 or EFAULT so no additional translation is needed. Reviewed by: tychon MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D2428	2015-05-06 16:25:20 +00:00
Neel Natu	ea91ca92ba	Do a proper emulation of guest writes to MSR_EFER. - Must-Be-Zero bits cannot be set. - EFER_LME and EFER_LMA should respect the long mode consistency checks. - EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities. - Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce segment limits in 64-bit mode. MFC after: 2 weeks	2015-05-06 05:40:20 +00:00
Neel Natu	6a273d5ef7	Emulate the 'CMP r/m8, imm8' instruction encountered when booting a Windows Vista guest. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 1 week	2015-05-04 04:27:23 +00:00
Neel Natu	317080849e	Don't advertise the Intel SMX capability to the guest. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 1 week	2015-05-02 19:07:49 +00:00
Neel Natu	1d29bfc149	Emulate machine check related MSRs to allow guest OSes like Windows to boot. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-05-02 04:19:11 +00:00
Neel Natu	44e2f0fea9	r281630 relaxed the limits on the vectors that can be asserted in the IRRs. Do the same when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-05-01 16:00:29 +00:00
Neel Natu	fe22991fb8	Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are enabled. MFC after: 2 weeks	2015-05-01 05:11:14 +00:00
Neel Natu	8325ce5c7e	Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>. Only a subset of source files that include <machine/vmm.h> need to use the APIs that require the inclusion of <sys/cpuset.h>. MFC after: 1 week	2015-04-30 22:23:22 +00:00
Neel Natu	c07a0648ec	When an instruction cannot be decoded just return to userspace so bhyve(8) can dump the instruction bytes. Requested by: grehan MFC after: 1 week	2015-04-30 21:00:47 +00:00
Neel Natu	7d786ee2a9	Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs. This is required for booting Windows guests. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-04-30 19:23:50 +00:00
John Baldwin	ed95805e90	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes	2015-04-30 15:48:48 +00:00
Neel Natu	787fb3d026	Re-implement RTC current time calculation to eliminate the possibility of losing time. The problem with the earlier implementation was that the uptime value used by 'vrtc_curtime()' could be different than the uptime value when 'vrtc_time_update()' actually updated 'base_uptime'. Fix this by calculating and updating the (rtctime, uptime) tuple together. MFC after: 2 weeks	2015-04-29 23:44:28 +00:00
Wei Hu	da2f98a1cf	Microsoft vmbus, storage and other related driver enhancements for HyperV. - Vmbus multi channel support. - Vector interrupt support. - Signal optimization. - Storvsc driver performance improvement. - Scatter and gather support for storvsc driver. - Minor bug fix for KVP driver. Thanks royger, jhb and delphij from FreeBSD community for the reviews and comments. Also thanks Hovy Xu from NetApp for the contributions to the storvsc driver. PR: 195238 Submitted by: whu Reviewed by: royger, jhb, delphij Approved by: royger MFC after: 2 weeks Relnotes: yes Sponsored by: Microsoft OSTC	2015-04-29 10:12:34 +00:00
Neel Natu	b8070ef5b1	Emulate the 'bit test' instruction. Windows 7 uses 'bit test' to check the 'Delivery Status' bit in APIC ICR register. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-04-29 02:01:46 +00:00
Neel Natu	f39630c2d6	Implement the century byte in the RTC. Some guests require this field to be properly set. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-04-28 23:44:47 +00:00
Tycho Nightingale	57f7026c0f	STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation. Reviewed by: neel	2015-04-25 19:02:06 +00:00
Konstantin Belousov	02c26f81a7	Move common code from sys/i386/i386/mp_machdep.c and sys/amd64/amd64/mp_machdep.c, to the new common x86 source sys/x86/x86/mp_x86.c. Proposed and reviewed by: jhb Review: https://reviews.freebsd.org/D2347 Sponsored by: The FreeBSD Foundation	2015-04-24 16:20:56 +00:00
John Baldwin	179fa75e6e	Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC. Approved by: Hudson River Trading LLC (who owns ACT LLC) MFC after: 1 week	2015-04-23 14:22:20 +00:00
Marcelo Araujo	dbec2c5c65	Missing break in switch case. Differential Revision: D2342 Reviewed by: neel	2015-04-23 02:50:06 +00:00
Konstantin Belousov	dfe7b3bfbc	Move some common code from sys/amd64/amd64/machdep.c and sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c. Most of the code is related to the idle handling. Discussed with: pluknet Sponsored by: The FreeBSD Foundation	2015-04-22 12:32:14 +00:00
Konstantin Belousov	19b5b56d7f	Remove duplicate definitions of MWAIT_CX hints. Identical defines in specialreg.h are enough. Discussed with: mav Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-20 08:25:55 +00:00
Konstantin Belousov	1c8e7232b4	Remove lazy pmap switch code from i386. Naive benchmark with md(4) shows no difference with the code removed. On both amd64 and i386, assert that a released pmap is not active. Proposed and reviewed by: alc Discussed with: Svatopluk Kraus <onwahe@gmail.com>, peter Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-04-18 21:23:16 +00:00
Neel Natu	631947366f	Relax the check on which vectors can be delivered through the APIC. According to the Intel SDM vectors 16 through 255 are allowed to be delivered via the local APIC. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-04-16 22:44:51 +00:00
Neel Natu	7c0b0b9ad3	Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly. MFC after: 1 week	2015-04-16 20:15:47 +00:00
Ed Maste	27513bd396	Use explicitly sized types in EFI module metadata This will allow the same metadata struct to be used on all platforms. Differential Revision: https://reviews.freebsd.org/D2275 Reviewed by: jhb	2015-04-10 19:26:45 +00:00
Tycho Nightingale	fb5e95b4f6	Enhance the support for Group 1 Extended opcodes: * Implement the 0x81 and 0x83 CMP instructions. * Implement the 0x83 AND instruction. * Implement the 0x81 OR instruction. Reviewed by: neel	2015-04-06 12:22:41 +00:00
Eitan Adler	f5cd4abcd0	adrian asked me to revert and get more testing	2015-04-05 05:18:14 +00:00
Eitan Adler	7e937fbfc2	head/sys/amd64/amd64/support.S: unroll loop unroll the loop in ENTRY(pagezero) acc' to the submitter this results in a reproducible 1% perf improvement under buildworld like workload I validated correctness and run-testing, but not performance impact Submitted by: lidl@pix.net Reviewed by: adrian PR: 199151 MFC After: 1 month	2015-04-05 05:07:24 +00:00
Ryan Stone	f2c2231e0c	Fix integer truncation bug in malloc(9) A couple of internal functions used by malloc(9) and uma truncated a size_t down to an int. This could cause any number of issues (e.g. indefinite sleeps, memory corruption) if any kernel subsystem tried to allocate 2GB or more through malloc. zfs would attempt such an allocation when run on a system with 2TB or more of RAM. Note to self: When this is MFCed, sparc64 needs the same fix. Differential revision: https://reviews.freebsd.org/D2106 Reviewed by: kib Reported by: Michael Fuckner <michael@fuckner.net> Tested by: Michael Fuckner <michael@fuckner.net> MFC after: 2 weeks	2015-04-01 12:42:26 +00:00
Tycho Nightingale	ef7c2a82ed	Fix "MOVS" instruction memory to MMIO emulation. Currently updates to %rdi, %rsi, etc are inadvertently bypassed along with the check to see if the instruction needs to be repeated per the 'rep' prefix. Add "MOVS" instruction support for the 'MMIO to MMIO' case. Reviewed by: neel	2015-04-01 00:15:31 +00:00
Konstantin Belousov	333d295946	Provide workaround for a performance issue with the popcnt instruction on Intel processors. Clear spurious dependency by explicitly xoring the destination register of popcnt. Use bitcount64() instead of re-implementing SWAR locally, for processors without popcnt instruction. Reviewed by: jhb Discussed with: jilles (previous version) Sponsored by: The FreeBSD Foundation	2015-03-31 01:44:07 +00:00
John Baldwin	2f22c84c31	Wait 100 microseconds for a local APIC to dispatch each startup-related IPI rather than 20. The MP 1.4 specification states in Appendix B.2: "A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions". (Note that this appears to be separate from the 10 millisecond (INIT) and 200 microsecond (STARTUP) waits after the IPIs are dispatched.) The Intel SDM is silent on this issue as far as I can tell. At least some hardware requires 60 microseconds as noted in the PR, so bump this to 100 to be on the safe side. PR: 197756 Reported by: zaphod@berentweb.com MFC after: 1 week	2015-03-30 20:13:22 +00:00
Konstantin Belousov	a9eb27a990	Make it possible for the signal handler to act on #ss. Load the canonical user data segment' selector into %ss when calling the handler. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-28 09:03:54 +00:00
Konstantin Belousov	f024d1a3a3	The #ss fault handler erroneously does not check for the fault originating from the return to usermode. #ss must be handled the same as #np. Reported by: Andrew Lutomirski through secteam Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-03-28 09:02:19 +00:00
Neel Natu	f213ae0be6	Fix the RTC device model to operate correctly in 12-hour mode. The following table documents the values in the RTC 'hour' field in the two modes (hour-of-the-day: 12-hour mode value, 24-hour mode value): 12 AM: 12, 0; [1-11] AM: [1-11], [1-11]; 12 PM: 0x80 | 12, 12; [1-11] PM: 0x80 | [1-11], [13-23]. Reported by: Julian Hsiao (madoka@nyanisore.net) MFC after: 1 week	2015-03-28 02:55:16 +00:00
Tycho Nightingale	e4f605ee81	When fetching an instruction in non-64bit mode, consider the value of the code segment base address. Also if an instruction doesn't support a mod R/M (modRM) byte, don't be concerned if the CPU is in real mode. Reviewed by: neel	2015-03-24 17:12:36 +00:00
Konstantin Belousov	0a110d5b17	Use VT-d interrupt remapping block (IR) to perform FSB messages translation. In particular, despite IO-APICs only take 8bit apic id, IR translation structures accept 32bit APIC Id, which allows x2APIC mode to function properly. Extend msi_cpu of struct msi_intrsrc and io_cpu of ioapic_intsrc to full int from one byte. KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid bringing all dmar headers into interrupt code. The non-PCI(e) devices which generate message interrupts on FSB require special handling. The HPET FSB interrupts are remapped, while DMAR interrupts are not. For each msi and ioapic interrupt source, the iommu cookie is added, which is in fact index of the IRE (interrupt remap entry) in the IR table. Cookie is made at the source allocation time, and then used at the map time to fill both IRE and device registers. The MSI address/data registers and IO-APIC redirection registers are programmed with the special values which are recognized by IR and used to restore the IRE index, to find proper delivery mode and target. Map all MSI interrupts in the block when msi_map() is called. Since an interrupt source setup and dismantle code are done in the non-sleepable context, flushing interrupt entries cache in the IR hardware, which is done async and ideally waits for the interrupt, requires busy-wait for queue to drain. The dmar_qi_wait_for_seq() is modified to take a boolean argument requesting busy-wait for the written sequence number instead of waiting for interrupt. Some interrupts are configured before IR is initialized, e.g. ACPI SCI. Add intr_reprogram() function to reprogram all already configured interrupts, and call it immediately before an IR unit is enabled. There is still a small window after the IO-APIC redirection entry is reprogrammed with cookie but before the unit is enabled, but to fix this properly, IR must be started much earlier. 
Add workarounds for 5500 and X58 northbridges, some revisions of which have severe flaws in handling IR. Use the same identification methods as employed by Linux. Review: https://reviews.freebsd.org/D1892 Reviewed by: neel Discussed with: jhb Tested by: glebius, pho (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-03-19 13:57:47 +00:00
Jack F Vogel	758cc3dcd5	Update to the Intel ixgbe driver: - Split the driver into independent pf and vf loadables. This is in preparation for SRIOV support which will be following shortly. This also allows us to keep a separate revision control over the two parts, making for easier sustaining. - Make the TX/RX code a shared/separated file, in the old code base the ixv code would miss fixes that went into ixgbe, this model will eliminate that problem. - The driver loadables will now match the device names, something that has been requested for some time. - Rather than a modules/ixgbe there is now modules/ix and modules/ixv - It will also be possible to make your static kernel with only one or the other for streamlined installs, or both. Enjoy! Submitted by: jfv and erj	2015-03-17 18:32:28 +00:00
Alexander Motin	c077e6287f	Report ARAT (APIC-Timer-always-running) feature for virtual CPU. This makes FreeBSD guest to not avoid using LAPIC timer, preferring HPET due to worries about non-existing for virtual CPUs deep sleep states. Benchmarks of usleep(1) on guest and host show such extra latencies: - 51us for virtual HPET, - 22us for virtual LAPIC timer, - 22us for host HPET and - 3us for host LAPIC timer. MFC after: 2 weeks	2015-03-16 11:57:03 +00:00
Neel Natu	18a2b08e65	Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when vmm.ko is loaded. Also relocate the 'justreturn' IPI handler to be alongside all other handlers. Requested by: kib	2015-03-14 02:32:08 +00:00
John Baldwin	0915f6f2ac	Only schedule interrupts on a single hyperthread of a modern Intel CPU core by default. Previously we used a single hyperthread on Pentium4-era cores but used both hyperthreads on more recent CPUs. MFC after: 2 weeks	2015-03-06 20:34:28 +00:00
Tycho Nightingale	76b3c718be	When ICW1 is issued the edge sense circuit is reset, which means that following an initialization a low-to-high transition is necessary to generate an interrupt. Reviewed by: neel	2015-03-06 02:05:45 +00:00
Neel Natu	7d69783ae4	Fix warnings/errors when building vmm.ko with gcc: - fix warning about comparison of 'uint8_t v_tpr >= 0' always being true. - fix error triggered by an empty clobber list in the inline assembly for "clgi" and "stgi" - fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The gcc assembler does not like the explicit operand "%rax" while the clang assembler requires specifying the operand "%rax". Fix this by encoding the instructions using the ".byte" directive. Reported by: julian MFC after: 1 week	2015-03-02 20:13:49 +00:00
Ryan Stone	9bfb1e36d9	Implement interface to create SR-IOV Virtual Functions Implement the interface to create SR-IOV Virtual Functions (VFs). When a driver registers that they support SR-IOV by calling pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov for that device. An ioctl can be invoked on that device to create VFs and have the driver initialize them. At this point, allocating memory I/O windows (BARs) is not supported. Differential Revision: https://reviews.freebsd.org/D76 Reviewed by: jhb MFC after: 1 month Sponsored by: Sandvine Inc.	2015-03-01 00:40:09 +00:00
Ryan Stone	a15f820a27	Allow passthrough devices to be hinted. Allow the ppt driver to attach to devices that were hinted to be passthrough devices by the PCI code creating them with a driver name of "ppt". Add a tunable that allows the IOMMU to be forced to be used. With SR-IOV passthrough devices the VFs may be created after vmm.ko is loaded. The current code will not initialize the IOMMU in that case, meaning that the passthrough devices can't actually be used. Differential Revision: https://reviews.freebsd.org/D73 Reviewed by: neel MFC after: 1 month Sponsored by: Sandvine Inc.	2015-03-01 00:39:48 +00:00
Konstantin Belousov	81f94399a9	Supposed fix for some SandyBridge mobile CPUs hang on AP startup when x2APIC mode is detected and enabled. Current theory is that switching the APIC mode while an IPI is in flight might be the issue. Postpone switching to x2APIC mode until we are guaranteed that all starting IPIs are already sent and acknowledged. Use aps_ready signal as an indication that the BSP is done with us. Tested by: adrian Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-28 20:37:38 +00:00
Neel Natu	a318f7ddb2	Always emulate MSR_PAT on Intel processors and don't rely on PAT save/restore capability of VT-x. This lets bhyve run nested in older VMware versions that don't support the PAT save/restore capability. Note that the actual value programmed by the guest in MSR_PAT is irrelevant because bhyve sets the 'Ignore PAT' bit in the nested PTE. Reported by: marcel Tested by: Leon Dang (ldang@nahannisys.com) Sponsored by: Nahanni Systems MFC after: 2 weeks	2015-02-24 05:35:15 +00:00
John Baldwin	8935302fe1	Ensure that the supplied data length is large enough to hold the base FPU state to avoid passing a negative length to fpusetregs() / npxsetregs(). Differential Revision: https://reviews.freebsd.org/D1861 Reviewed by: kib, emaste	2015-02-18 23:34:03 +00:00
Konstantin Belousov	5f674c4cbd	Initialize x2APIC mode on the resume path before accessing LAPIC. Remove unneeded disable of LAPIC in the native_lapic_xapic_mode(). We attempt to send wakeup IPI on the resume path right after BSP wakeup, so disabling is wrong. Reported and tested by: glebius, "Ranjan1018 ." <214748mv@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-16 21:56:19 +00:00
Mark Johnston	7f192d49b7	Add support for decoding multibyte NOPs. Differential Revision: https://reviews.freebsd.org/D1830 Reviewed by: jhb, kib MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division	2015-02-13 01:35:53 +00:00
Konstantin Belousov	4c918926cd	Add x2APIC support. Enable it by default if CPU is capable. The hw.x2apic_enable tunable allows disabling it from the loader prompt. To closely repeat effects of the uncached memory ops when accessing registers in the xAPIC mode, the x2APIC writes to MSRs are preceded by mfence, except for the EOI notifications. This is probably too strict, only ICR writes to send IPI require serialization to ensure that other CPUs see the previous actions when IPI is delivered. This may be changed later. In vmm justreturn IPI handler, call doreti_iret instead of doing iretd inline, to handle corner conditions. Note that the patch only switches LAPICs into x2APIC mode. It does not enable FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT entries and doing interrupts remapping, but is the required step on the way. Reviewed by: neel Tested by: pho (real hardware), neel (on bhyve) Discussed with: jhb, grehan Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-09 21:00:56 +00:00
John Baldwin	f418f79ce2	Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that the INIT IPIs here are invalid, but other systems follow the MP spec instead. While here, fix the IPI wait routine to accept a timeout in microseconds instead of a raw spin count, and don't spin forever during AP startup. Instead, panic if a STARTUP IPI is not delivered after 20 us. PR: 196542 Differential Revision: https://reviews.freebsd.org/D1719 MFC after: 2 weeks	2015-02-06 18:19:59 +00:00
Bryan Venteicher	d3ccddf3ce	Generalized parts of the XEN timer code into a generic pvclock KVM clock shares the same data structures between the guest and the host as Xen so it makes sense to just have a single copy of this code. Differential Revision: https://reviews.freebsd.org/D1429 Reviewed by: royger (earlier version) MFC after: 1 month	2015-02-04 08:26:43 +00:00
Konstantin Belousov	206f09eb46	Do not qualify the mcontext_t mcp argument for set_mcontext(9) as const. On x86, even after the machine context is supposedly read into the struct ucontext, lazy FPU state save code might only mark the FPU data as hardware-owned. Later, set_fpcontext() needs to fetch the state from hardware, modifying the mcp. The set_mcontext(9) is called from sigreturn(2) and setcontext(2) implementations and old create_thread(2) interface, which throw the *mcp out after the set_mcontext() call. Reported by: dim Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-31 21:43:46 +00:00
Roger Pau Monné	f336605b72	amd64: allow base memory segment to start at address different than 0 Current code requires that the first physical memory segment starts at 0, but this is not really needed. We only need to make sure the bootstrap code and page tables for APs are allocated below 4GB. This patch removes this requirement and allows booting a Dell R710 from UEFI, where the first physical memory segment starts at 0x10000. Sponsored by: Citrix Systems R&D Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D1417	2015-01-26 08:42:47 +00:00
John Baldwin	8d44440ad9	If the boot-time memory test is enabled, output a dot ('.') for each GB of RAM tested so people watching the console can see that the machine is making progress and not hung. PR: 196650 Submitted by: Ravi Pokala <rpokala@panasas.com> Suggestions from: Eric van Gyzen <eric@vangyzen.net> MFC after: 2 weeks	2015-01-25 20:16:45 +00:00
Dag-Erling Smørgrav	d4ff1726f9	Remove ISA NICs. Anyone still using these on amd64 can build their own kernel.	2015-01-25 12:02:38 +00:00
Neel Natu	e09ff17129	Add macro to identify AVIC capability (advanced virtual interrupt controller) in AMD processors. Submitted by: Dmitry Luhtionov (dmitryluhtionov@gmail.com)	2015-01-24 00:35:49 +00:00
Neel Natu	7534635359	MOVS instruction emulation. These instructions are emitted by 'bus_space_read_region()' when accessing MMIO regions. Since MOVS can be used with a repeat prefix start decoding the REPZ and REPNZ prefixes. Also start decoding the segment override prefix since MOVS allows overriding the source operand segment register. Tested by: tychon MFC after: 1 week	2015-01-19 06:53:31 +00:00
Neel Natu	d087a39935	Simplify instruction restart logic in bhyve. Keep track of the next instruction to be executed by the vcpu as 'nextrip'. As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should start execution. Also, instruction restart happens implicitly via 'vm_inject_exception()' or explicitly via 'vm_restart_instruction()'. The APIs behave identically in both kernel and userspace contexts. The main beneficiary is the instruction emulation code that executes in both contexts. bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length' as readonly: - Restarting an instruction is now done by calling 'vm_restart_instruction()' as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout()) - Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch()) Differential Revision: https://reviews.freebsd.org/D1526 Reviewed by: grehan MFC after: 2 weeks	2015-01-18 03:08:30 +00:00
Navdeep Parhar	ca7fe84a61	Plug cxgbe(4) back into !powerpc && !arm builds, instead of building it on amd64 only.	2015-01-16 01:39:24 +00:00
Roger Pau Monné	ca49b3342d	loader: implement multiboot support for Xen Dom0 Implement a subset of the multiboot specification in order to boot Xen and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot implementation is tailored to boot Xen and FreeBSD Dom0, and it will most surely fail to boot any other multiboot compliant kernel. In order to detect and boot the Xen microkernel, two new file formats are added to the bootloader, multiboot and multiboot_obj. Multiboot support must be tested before regular ELF support, since Xen is a multiboot kernel that also uses ELF. After a multiboot kernel is detected, all the other loaded kernels/modules are parsed by the multiboot_obj format. The layout of the loaded objects in memory is the following; first the Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to long mode by itself), after that the FreeBSD kernel is loaded as a RAW file (Xen will parse and load it using its internal ELF loader), and finally the metadata and the modules are loaded using the native FreeBSD way. After everything is loaded we jump into Xen's entry point using a small trampoline. The order of the multiboot modules passed to Xen is the following, the first module is the RAW FreeBSD kernel, and the second module is the metadata and the FreeBSD modules. Since Xen will relocate the memory position of the second multiboot module (the one that contains the metadata and native FreeBSD modules), we need to stash the original modulep address inside of the metadata itself in order to recalculate its position once booted. This also means the metadata must come before the loaded modules, so after loading the FreeBSD kernel a portion of memory is reserved in order to place the metadata before booting.
In order to tell the loader to boot Xen and then the FreeBSD kernel the following has to be added to the /boot/loader.conf file: xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga" xen_kernel="/boot/xen" The first argument contains the command line that will be passed to the Xen kernel, while the second argument is the path to the Xen kernel itself. This can also be done manually from the loader command line, by for example typing the following set of commands: OK unload OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga OK load kernel OK load zfs OK load if_tap OK load ... OK boot Sponsored by: Citrix Systems R&D Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D517 For the Forth bits: Submitted by: Julien Grall <julien.grall AT citrix.com>	2015-01-15 16:27:20 +00:00
Warner Losh	af8cf71035	New MINIMAL kernel config. The goal with this configuration is to only compile in those options in GENERIC that cannot be loaded as modules. ufs is still included because many of its options aren't present in the kernel module. There's some other exceptions documented in the file. This is part of some work to get more things automatically loading in the hopes of obsoleting GENERIC one day.	2015-01-15 00:42:06 +00:00
Neel Natu	07820b4b4c	Fix typo (missing comma). MFC after: 3 days	2015-01-14 07:18:51 +00:00
Neel Natu	c9c75df48c	'struct vm_exception' was intended to be used only as the collateral for the VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping track of pending exceptions for a vcpu. This in turn causes confusion because some fields in 'struct vm_exception' like 'vcpuid' make sense only in the ioctl context. It also makes it harder to add or remove structure fields. Fix this by using 'struct vm_exception' only to communicate information from userspace to vmm.ko when injecting an exception. Also, add a field 'restart_instruction' to 'struct vm_exception'. This field is set to '1' for exceptions where the faulting instruction is restarted after the exception is handled. MFC after: 1 week	2015-01-13 22:00:47 +00:00
Konstantin Belousov	18cc2ff047	Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem does not access kernel mappings directly. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 08:58:07 +00:00
Konstantin Belousov	b5243bd4f7	Revert r276600: PHYS_TO_DMAP_RAW() and DMAP_TO_PHYS_RAW() are no longer used, remove them. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 07:50:55 +00:00
Konstantin Belousov	22bb3201ac	Fix several issues with /dev/mem and /dev/kmem devices on amd64. For /dev/mem, when requested physical address is not accessible by the direct map, do temporary remapping with the caching attribute 'uncached'. Limit the accessible addresses by MAXPHYADDR, since the architecture disallows writing non-zero into reserved bits of ptes (or setting garbage into NX). For /dev/kmem, only access existing kernel mappings for direct map region. For all other addresses, obtain a physical address of the mapping and fall back to the /dev/mem mechanism. This ensures that /dev/kmem i/o does not fault even if the accessed region is changed in parallel, by using either direct map or temporary mapping. For both devices, operate on one page by iteration. Do not return error if any bytes were moved around, return the (partial) bytes count to userspace. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 07:48:22 +00:00
Konstantin Belousov	b1752aa0ea	For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Paging Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 07:36:25 +00:00
Mark Johnston	bdb9ab0dd9	Factor out duplicated code from dumpsys() on each architecture into generic code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly identical and simply redefine a number of constants and helper subroutines; a generic implementation will make it easier to implement features around kernel core dumps. This change does not alter any minidump code and should have no functional impact. PR: 193873 Differential Revision: https://reviews.freebsd.org/D904 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Reviewed by: jhibbits (earlier version) Sponsored by: EMC / Isilon Storage Division	2015-01-07 01:01:39 +00:00
Neel Natu	2ce1242309	Clear blocking due to STI or MOV SS in the hypervisor when an instruction is emulated or when the vcpu incurs an exception. This matches the CPU behavior. Remove special case code in HLT processing that was clearing the interrupt shadow. This is now redundant because the interrupt shadow is always cleared when the vcpu is resumed after an instruction is emulated. Reported by: David Reed (david.reed@tidalscale.com) MFC after: 2 weeks	2015-01-06 19:04:02 +00:00
John Baldwin	3e32dff52c	Remove "New" label from NFSCL/NFSD now that they are the only NFS client/server. While here, remove duplicate NFSCL from sys/conf/NOTES. Approved by: rmacklem	2015-01-06 16:15:57 +00:00
John Baldwin	92597e064b	On some Intel CPUs with a P-state but not C-state invariant TSC the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs. PR: 192316 Differential Revision: https://reviews.freebsd.org/D1441 No objection from: jkim MFC after: 1 week	2015-01-05 20:44:44 +00:00
Konstantin Belousov	a40e51e355	For /dev/mem and /dev/kmem accesses, avoid asserting that addresses are within direct map. We want to return error instead of panicking. PR: 194995 Sponsored by: The FreeBSD Foundation	2015-01-03 01:28:58 +00:00
Scott Long	a614ff4d01	Fix a missed comment from r276526.	2015-01-02 15:46:54 +00:00
Konstantin Belousov	91a82f9585	Callers of pmap_kextract() cannot distinguish between failure and physical address zero. Assume that the lowest page is always mapped by direct map. This restores access to the page at zero through /dev/mem after r263475. Reported and tested by: neel Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-02 01:05:08 +00:00
Konstantin Belousov	ae7c85e9b0	Actually remove GIANT_REQUIRED, declared but not done in r263475. Style. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-02 01:00:38 +00:00
Dmitry Chagin	7b15ee61fc	Regen after r276508, r276509.	2015-01-01 18:43:31 +00:00
Dmitry Chagin	0161038329	Correct an argument status of wait4 syscall for Linuxulator. MFC after: 1 week	2015-01-01 18:37:03 +00:00
Navdeep Parhar	183dc9860a	Temporarily unplug cxgbe(4) from !amd64 builds.	2014-12-31 20:34:12 +00:00
Alan Cox	d866a563d4	The physical memory allocator supports the use of distinct free lists for managing pages from different address ranges. Generally speaking, this feature is used to increase the likelihood that physical pages are available that can meet special DMA requirements or can be accessed through a limited-coverage direct mapping (e.g., MIPS). However, prior to this change, the configuration of the free lists was static, i.e., it was determined at compile time. Consequently, free lists could be created for address ranges that held no actual pages, for example, on 32-bit MIPS-based systems with 512 MB or less of physical memory. This change makes the creation of the free lists dynamic, i.e., it is based on the available physical memory at boot time. On 64-bit x86-based systems with 64 GB or more of physical memory, create free lists for managing pages with physical addresses below 4 GB. This change is to address reported problems with initializing devices that require the allocation of physical pages below 4 GB on some systems with 128 GB or more of physical memory. PR: 185727 Differential Revision: https://reviews.freebsd.org/D1274 Reviewed by: jhb, kib MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division	2014-12-31 00:54:38 +00:00
Neel Natu	cd86d3634b	Initialize all fields of 'struct vm_exception exception' before passing it to vm_inject_exception(). This fixes the issue that 'exception.cpuid' is uninitialized when calling 'vm_inject_exception()'. However, in practice this change is a no-op because vm_inject_exception() does not use 'exception.cpuid' for anything. Reported by: Coverity Scan CID: 1261297 MFC after: 3 days	2014-12-30 23:38:31 +00:00
Neel Natu	0dafa5cd4b	Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko. The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest. Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it. The RTC device state can be inspected via bhyvectl as follows: bhyvectl --vm=vm --get-rtc-time bhyvectl --vm=vm --set-rtc-time=<unix_time_secs> bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value> Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D1385 MFC after: 2 weeks	2014-12-30 22:19:34 +00:00
Neel Natu	95474bc26a	Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on an AMD/SVM host. MFC after: 1 week	2014-12-30 02:44:33 +00:00
Neel Natu	1a5934ef8e	Implement "special mask mode" in vatpic. OpenBSD guests always enable "special mask mode" during boot. As a result of r275952 this is flagged as an error and the guest cannot boot. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D1384 MFC after: 1 week	2014-12-28 00:53:52 +00:00
Konstantin Belousov	4cc6942f37	Change the way the lcall $7,$0 is reflected to usermode. Instead of setting call gate, which must be 64 bit, put a code segment descriptor into ldt slot 0. This way, syscall shim does not switch temporarily to 64bit trampoline, and does not create a window where signal delivery interrupts 64 bit mode (signal handler cannot return). The cost is shim running with non-zero based segment in %cs, which requires vfork() handling to make more assumptions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-27 23:19:08 +00:00
Poul-Henning Kamp	91b050b27b	Use compiled in default keymaps which are available both in syscons and vt.	2014-12-25 17:50:04 +00:00
Mark Johnston	cafe874475	Restore the trap type argument to the DTrace trap hook, removed in r268600. It's redundant at the moment since it can be obtained from the trapframe on the architectures where DTrace is supported, but this won't be the case with ARM.	2014-12-23 15:38:19 +00:00
Neel Natu	b053814333	Allow ktr(4) tracing of all guest exceptions via the tunable "hw.vmm.trace_guest_exceptions". To enable this feature set the tunable to "1" before loading vmm.ko. Tracing the guest exceptions can be useful when debugging guest triple faults. Note that there is a performance impact when exception tracing is enabled since every exception will now trigger a VM-exit. Also, handle machine check exceptions that happen during guest execution by vectoring to the host's machine check handler via "int $18". Discussed with: grehan MFC after: 2 weeks	2014-12-23 02:14:49 +00:00
Neel Natu	d66bcddce4	Emulate writes to the IA32_MISC_ENABLE MSR. PR: 196093 Reported by: db Tested by: db Discussed with: grehan MFC after: 1 week	2014-12-20 19:47:51 +00:00
Neel Natu	ac721e53ec	Various 8259 device model improvements: - implement 8259 "polled" mode. - set 'atpic->sfn' if bit 4 in ICW4 is set during master initialization. - report error if guest tries to enable the "special mask" mode. Differential Revision: https://reviews.freebsd.org/D1328 Reviewed by: tychon Reported by: grehan Tested by: grehan MFC after: 1 week	2014-12-20 04:57:45 +00:00
Neel Natu	e64c5af3f8	Fix 8259 IRQ priority resolver. Initialize the 8259 such that IRQ7 is the lowest priority. Reviewed by: tychon Differential Revision: https://reviews.freebsd.org/D1322 MFC after: 1 week	2014-12-17 03:04:43 +00:00
Konstantin Belousov	2d45c2d52d	The iret instruction may generate #np and #ss faults, besides #gp. When returning to usermode, the handler for those exceptions is also executed with wrong gs base. Handle all three possible faults in the same way, checking for iret fault, and performing full iret. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2014-12-16 18:28:33 +00:00
Neel Natu	09eced2549	For level triggered interrupts clear the PIC IRR bit when the interrupt pin is deasserted. Prior to this change each assertion on a level triggered irq pin resulted in two interrupts being delivered to the CPU. Differential Revision: https://reviews.freebsd.org/D1310 Reviewed by: tychon MFC after: 1 week	2014-12-16 06:33:57 +00:00
George V. Neville-Neil	bd19924f6b	This configuration file removes several debugging options, including WITNESS and INVARIANTS checking, which are known to have significant performance impact on running systems. When benchmarking new features this kernel should be used instead of the standard GENERIC. This kernel configuration should never appear outside of the HEAD of the FreeBSD tree.	2014-12-02 19:55:43 +00:00
Ed Maste	294246bb7d	Revert r274772: it is not valid on MIPS Reported by: sbruno	2014-11-25 03:50:31 +00:00
Peter Grehan	526c8885fd	Change the lower bound for guest vmspace allocation to 0 instead of using the VM_MIN_ADDRESS constant. HardenedBSD redefines VM_MIN_ADDRESS to be 64K, which results in bhyve VM startup failing. Guest memory is always assumed to start at 0 so use the absolute value instead. Reported by: Shawn Webb, lattera at gmail com Reviewed by: neel, grehan Obtained from: Oliver Pinter via HardenedBSD `23bd719ce1` MFC after: 1 week	2014-11-23 23:07:21 +00:00
John Baldwin	180e57e5c7	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks	2014-11-21 20:53:17 +00:00
Ed Maste	688fd61ae8	Use canonical __PIC__ flag It is automatically set when -fPIC is passed to the compiler. Reviewed by: dim, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D1179	2014-11-21 02:05:48 +00:00
Alan Cox	271f0f1219	Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the default on i386 PAE. Previously, VM_PHYSSEG_SPARSE could not be used on amd64 and i386 because vm_page_startup() would not create vm_page structures for the kernel page table pages allocated during pmap_bootstrap() but those vm_page structures are needed when the kernel attempts to promote the corresponding kernel virtual addresses to superpage mappings. To address this problem, a new public function, vm_phys_add_seg(), is introduced and vm_phys_init() is updated to reflect the creation of vm_phys_seg structures by calls to vm_phys_add_seg(). Discussed with: Svatopluk Kraus MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division	2014-11-15 23:40:44 +00:00
Konstantin Belousov	eb81bf559c	Fix END()s for fueword and fueword64, match the name in END() with entry. Submitted by: Jeroen Hofstee <jeroen@myspectrum.nl> MFC after: 1 week	2014-11-15 21:25:17 +00:00
Scott Long	18e3d9f521	Extend earlier addition of stack frames to most of support.S. This makes stack traces in KDB, HWPMC, and DTrace much more reliable and useful. Reviewed by: kan, kib Obtained from: Netflix, Inc. MFC after: 5 days	2014-11-13 22:11:44 +00:00
Ed Maste	96699e86a3	Add workaround for vt efifb's early use of PHYS_TO_DMAP In vt_efifb_init the framebuffer's physaddr is passed to PHYS_TO_DMAP before the DMAP is setup. The result is not actually accessed until after the mapping is setup, though. Loosen the assertion in PHYS_TO_DMAP for now, to allow use when dmaplimit == 0. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D1142	2014-11-11 14:59:46 +00:00
Alexander V. Chernikov	603eaf792b	Remove faith(4) and faithd(8) from base. It looks like industry has chosen different (and more traditional) stateless/stateful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. No objections from: net@	2014-11-09 21:33:01 +00:00
Gleb Smirnoff	2c59cd89c8	Remove unused includes. Reviewed by: kib	2014-11-09 19:58:30 +00:00
Konstantin Belousov	2818ac81d4	MFi386 r253328: Create a proper stack frame for amd64 version of bcopy(). Note that this also makes the stack properly aligned in the function, despite it is not strictly needed. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-08 11:56:26 +00:00
George V. Neville-Neil	169760a49b	Add support for netmap in GENERIC by default.	2014-11-05 06:22:37 +00:00
Bryan Venteicher	217eb1256d	Add VirtIO console to the x86 NOTES and files Requested by: jhb	2014-11-03 22:37:10 +00:00
John Baldwin	824fc46089	MFamd64: Add support for extended FPU states on i386. This includes support for AVX on i386. - Similar to amd64, move the FPU save area out of the PCB and instead store saved FPU state in a variable-sized buffer after the PCB on the stack. - To support the variable PCB location, alter the locore code to only use the bottom-most page of proc0stack for init386(). init386() returns the correct stack pointer to locore which adjusts the stack for thread0 before calling mi_startup(). - Don't bother setting cr3 in thread0's pcb in locore before calling init386(). It wasn't used (init386() overwrote it at the end) and it doesn't work with the variable-sized FPU save area. - Remove the new-bus attachment from npx. This was only ever useful for external co-processors using IRQ13, but those have not been supported for several years. npxinit() is now called much earlier during boot (init386()) similar to amd64. - Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE. - npxsave() is now only called from context switch contexts so it can use XSAVEOPT. Differential Revision: https://reviews.freebsd.org/D1058 Reviewed by: kib Tested on: FreeBSD/i386 VM under bhyve on Intel i5-2520	2014-11-02 22:58:30 +00:00
John Baldwin	01e1933dcc	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
Konstantin Belousov	0a2c94b86e	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
Konstantin Belousov	4f3dc90023	Add fueword(9) and casueword(9) functions. They are like fuword(9) and casuword(9), but do not mix value read and indication of fault. I know (or remember) enough assembly to handle x86 and powerpc. For arm, mips and sparc64, implement fueword() and casueword() as wrappers around fuword() and casuword(), which means that the functions cannot distinguish between -1 and fault. On architectures where fueword() and casueword() are native, implement fuword() and casuword() using fueword() and casueword(), to reduce assembly code duplication. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks (ia64 needs treating)	2014-10-28 15:22:13 +00:00
Marcelo Araujo	d018cd6f5e	Reported by: Coverity CID: 1249760 Reviewed by: neel Approved by: neel Sponsored by: QNAP Systems Inc.	2014-10-28 07:19:02 +00:00
Peter Grehan	f1be09bd95	Remove bhyve SVM feature printf's now that they are available in the general CPU feature detection code. Reviewed by: neel	2014-10-27 22:20:51 +00:00
Neel Natu	f0c8263e55	Change the type of the first argument to the I/O emulation handlers to 'struct vm *'. Previously it used to be a 'void *' but there is no reason to hide the actual type from the handler. Discussed with: tychon MFC after: 1 week	2014-10-26 19:03:06 +00:00
Alan Cox	d6e53ebe5e	By the time that pmap_init() runs, vm_phys_segs[] has been initialized. Obtaining the end of memory address from vm_phys_segs[] is a little easier than obtaining it from phys_avail[]. Discussed with: Svatopluk Kraus	2014-10-26 17:56:47 +00:00
Neel Natu	160ef77abf	Move the ACPI PM timer emulation into vmm.ko. This reduces variability during timer calibration by keeping the emulation "close" to the guest. Additionally having all timer emulations in the kernel will ease the transition to a per-VM clock source (as opposed to using the host's uptime to keep track of time). Discussed with: grehan	2014-10-26 04:44:28 +00:00
Neel Natu	31b117bec9	Don't pass the 'error' return from an I/O port handler directly to vm_run(). Most I/O port handlers return -1 to signal an error. If this value is returned without modification to vm_run() then it leads to incorrect behavior because '-1' is interpreted as ERESTART at the system call level. Fix this by always returning EIO to signal an error from an I/O port handler. MFC after: 1 week	2014-10-26 03:03:41 +00:00
John Baldwin	7d313e7bdb	Add COMPAT_FREEBSD9 and COMPAT_FREEBSD10 options to wrap code that provides compatibility for FreeBSD 9.x and 10.x binaries. Enable these options in kernel configs that enable other COMPAT_FREEBSD<n> options.	2014-10-24 19:58:24 +00:00
Roger Pau Monné	927dc0e02a	amd64: make uiomove_fromphys functional for pages not mapped by the DMAP Place the code introduced in r268660 into a separate function that can be called from uiomove_fromphys. Instead of pre-allocating two KVA pages use vmem_alloc to allocate them on demand when needed. This prevents blocking if a page fault is taken while physical addresses from outside the DMAP are used, since the lock is now removed. Also introduce a safety catch in PHYS_TO_DMAP and DMAP_TO_PHYS. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D947 amd64/amd64/pmap.c: - Factor out the code to deal with non DMAP addresses from pmap_copy_pages and place it in pmap_map_io_transient. - Change the code to use vmem_alloc instead of a set of pre-allocated pages. - Use pmap_qenter and don't pin the thread if there can be page faults. amd64/amd64/uio_machdep.c: - Use pmap_map_io_transient in order to correctly deal with physical addresses not covered by the DMAP. amd64/include/pmap.h: - Add the prototypes for the new functions. amd64/include/vmparam.h: - Add safety catches to make sure PHYS_TO_DMAP and DMAP_TO_PHYS are only used with addresses covered by the DMAP.	2014-10-24 09:48:58 +00:00
Roger Pau Monné	bf7313e3b7	xen: implement the privcmd user-space device This device is only attached to privileged domains, and allows the toolstack to interact with Xen. The two functions of the privcmd interface are to allow the execution of hypercalls from user-space, and the mapping of foreign domain memory. Sponsored by: Citrix Systems R&D i386/include/xen/hypercall.h: amd64/include/xen/hypercall.h: - Introduce a function to make generic hypercalls into Xen. xen/interface/xen.h: xen/interface/memory.h: - Import the new hypercall XENMEM_add_to_physmap_range used by auto-translated guests to map memory from foreign domains. dev/xen/privcmd/privcmd.c: - This device has the following functions: - Allow user-space applications to make hypercalls into Xen. - Allow user-space applications to map memory from foreign domains, this is accomplished using the newly introduced hypercall (XENMEM_add_to_physmap_range). xen/privcmd.h: - Public ioctl interface for the privcmd device. x86/xen/hvm.c: - Remove declaration of hypercall_page, now it's declared in hypercall.h. conf/files: - Add the privcmd device to the build process.	2014-10-22 17:07:20 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Neel Natu	a78dc03254	Merge projects/bhyve_svm into HEAD. After this change bhyve supports AMD processors with the SVM/AMD-V hardware extensions. More details available here: https://lists.freebsd.org/pipermail/freebsd-virtualization/2014-October/002905.html Submitted by: Anish Gupta (akgupt3@gmail.com) Tested by: Benjamin Perrault (ben.perrault@gmail.com) Tested by: Willem Jan Withagen (wjw@digiware.nl)	2014-10-21 07:10:43 +00:00
Neel Natu	a5045426db	Fix a race in pmap_emulate_accessed_dirty() that could trigger an EPT misconfiguration VM-exit. An EPT misconfiguration is triggered when the processor encounters a PTE that is writable but not readable (WR=10). On processors that require A/D bit emulation PG_M and PG_A map to EPT_PG_WRITE and EPT_PG_READ respectively. If the PTE is updated as in the following code snippet: pte |= PG_M; pte |= PG_A; then it is possible for another processor to observe the PTE after the PG_M (aka EPT_PG_WRITE) bit is set but before PG_A (aka EPT_PG_READ) bit is set. This will trigger an EPT misconfiguration VM-exit on the other processor. Reported by: rodrigc Reviewed by: grehan MFC after: 3 days	2014-10-21 01:06:58 +00:00
Neel Natu	e011dc962c	Merge from projects/bhyve_svm all the changes outside vmm.ko or bhyve utilities: Add support for AMD's nested page tables in pmap.c: - Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit) for a pmap of type PT_RVI. - Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of type PT_EPT or PT_RVI. Add CPU_SET_ATOMIC_ACQ(num, cpuset): This is used when activating a vcpu in the nested pmap. Using the 'acquire' variant guarantees that the load of the 'pm_eptgen' will happen only after the vcpu is activated in 'pm_active'. Add defines for various AMD-specific MSRs. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-10-20 18:09:33 +00:00
Neel Natu	e1a172e1c2	IFC @r273214	2014-10-20 02:57:30 +00:00
Neel Natu	867b59607c	IFC @r273206	2014-10-19 23:05:18 +00:00
Neel Natu	592cd7d3be	Don't advertise the "OS visible workarounds" feature in cpuid.80000001H:ECX. bhyve doesn't emulate the MSRs needed to support this feature at this time. Don't expose any model-specific RAS and performance monitoring features in cpuid leaf 80000007H. Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and BIOS signature and P-state related MSRs. This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels 2.6.32, 3.10.0 and 3.17.0.	2014-10-19 21:38:58 +00:00
Neel Natu	65d5111ac1	Don't advertise support for the NodeID MSR since bhyve doesn't emulate it.	2014-10-18 05:39:32 +00:00
Warner Losh	b82e2e94e2	Fix build to not bogusly always rebuild vmm.ko. Rename vmx_assym.s to vmx_assym.h to reflect that file's actual use and update vmx_support.S's include to match. Add vmx_assym.h to the SRCS so that it gets properly added to the dependency list. Add vmx_support.S to SRCS as well, so it gets built and needs less special-case goo. Remove now-redundant special-case goo. Finally, vmx_genassym.o doesn't need to depend on a hand expanded ${_ILINKS} explicitly, that's all taken care of by beforedepend. With these items fixed, we no longer build vmm.ko every single time through the modules on a KERNFAST build. Sponsored by: Netflix	2014-10-17 13:20:49 +00:00
Neel Natu	2688a818a3	Don't advertise the Instruction Based Sampling feature because it requires emulating a large number of MSRs. Ignore writes to a couple more AMD-specific MSRs and return 0 on read. This further reduces the unimplemented MSRs accessed by a Linux guest on boot.	2014-10-17 06:23:04 +00:00
Neel Natu	02904c45ab	Hide extended PerfCtr MSRs on AMD processors by clearing bits 23, 24 and 28 in CPUID.80000001H:ECX. Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and returning 0 on reads. This further reduces the number of unimplemented MSRs hit by a Linux guest during boot.	2014-10-17 03:04:38 +00:00
Neel Natu	b1cf7bb5e4	Use the correct fault type (VM_PROT_EXECUTE) for an instruction fetch.	2014-10-16 18:16:31 +00:00
Neel Natu	5a1f0b36b1	Fix topology enumeration issues exposed by AMD Bulldozer Family 15h processor. Initialize CPUID.80000008H:ECX[7:0] with the number of logical processors in the package. This fixes a panic during early boot in NetBSD 7.0 BETA. Clear the Topology Extension feature bit from CPUID.80000001H:ECX since we don't emulate leaves 0x8000001D and 0x8000001E. This fixes a divide by zero panic in early boot in Centos 6.4. Tested on an "AMD Opteron 6320" courtesy of Ben Perrault. Reviewed by: grehan	2014-10-16 18:13:10 +00:00
Davide Italiano	2be111bf7d	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_getenv()/kern_setenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
Neel Natu	06053618cb	Actually hide the SVM capability by clearing CPUID.80000001H:ECX[bit 3] after it has been initialized by cpuid_count(). Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-10-15 04:29:03 +00:00
Neel Natu	d63e02ea96	Emulate "POP r/m". This is needed to boot OpenBSD/i386 MP kernel in bhyve. Reported by: grehan MFC after: 1 week	2014-10-14 21:02:33 +00:00
Neel Natu	f37dbf579d	Remove extraneous comments.	2014-10-11 04:57:17 +00:00
Neel Natu	8fe9436d4c	Get rid of unused headers. Restrict scope of malloc types M_SVM and M_SVM_VLAPIC by making them static. Replace ERR() with KASSERT(). style(9) cleanup.	2014-10-11 04:41:21 +00:00
Neel Natu	3d492b65bc	Get rid of unused forward declaration of 'struct svm_softc'.	2014-10-11 03:21:33 +00:00
Neel Natu	92337d968c	style(9) fixes. Get rid of unused headers.	2014-10-11 03:19:26 +00:00
Neel Natu	882a1f1942	Use a consistent style for messages emitted when the module is loaded.	2014-10-11 03:09:34 +00:00
Neel Natu	ed6aacb51f	IFC @r272887	2014-10-10 23:52:56 +00:00
Neel Natu	faba66190e	Fix bhyvectl so it works correctly on AMD/SVM hosts. Also, add command line options to display some key VMCB fields. The set of valid options that can be passed to bhyvectl now depends on the processor type. AMD-specific options are identified by a "--vmcb" or "--avic" in the option name. Intel-specific options are identified by a "--vmcs" in the option name. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-10-10 21:48:59 +00:00
Neel Natu	5295c3e61d	Support Intel-specific MSRs that are accessed when booting up a linux in bhyve: - MSR_PLATFORM_INFO - MSR_TURBO_RATIO_LIMITx - MSR_RAPL_POWER_UNIT Reviewed by: grehan MFC after: 1 week	2014-10-09 19:13:33 +00:00
Mark Johnston	5eaae1411f	Pass up the error status of minidumpsys() to its callers. PR: 193761 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Sponsored by: EMC / Isilon Storage Division	2014-10-08 20:25:21 +00:00
Konstantin Belousov	07a92f34d6	Add an argument to the x86 pmap_invalidate_cache_range() to request forced invalidation of the cache range regardless of the presence of self-snoop feature. Some recent Intel GPUs in some modes are not coherent, and dirty lines in CPU cache must be flushed before the pages are transferred to GPU domain. Reviewed by: alc (previous version) Tested by: pho (amd64) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-08 16:48:03 +00:00
Neel Natu	65145c7f50	Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'. The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting CPUID.01H:ECX[3] to 0 so the guest should not expect these instructions to be present anyways. Discussed with: grehan	2014-10-06 20:48:01 +00:00
Neel Natu	107af8f2ed	IFC @r272481	2014-10-05 01:28:21 +00:00
Neel Natu	d72978ecd7	Get rid of code that dealt with the hardware not being able to save/restore the PAT MSR on guest exit/entry. This workaround was done for a beta release of VMware Fusion 5 but is no longer needed in later versions. All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT in the VM exit and entry controls. Discussed with: grehan	2014-10-02 05:32:29 +00:00
Roger Pau Monné	44e06d158a	msi: add Xen MSI implementation This patch adds support for MSI interrupts when running on Xen. Apart from adding the Xen related code needed in order to register MSI interrupts this patch also makes the msi_init function a hook in init_ops, so different MSI implementations can have different initialization functions. Sponsored by: Citrix Systems R&D xen/interface/physdev.h: - Add the MAP_PIRQ_TYPE_MULTI_MSI to map multi-vector MSI to the Xen public interface. x86/include/init.h: - Add a hook for setting custom msi_init methods. amd64/amd64/machdep.c: i386/i386/machdep.c: - Set the default msi_init hook to point to the native MSI initialization method. x86/xen/pv.c: - Set the Xen MSI init hook when running as a Xen guest. x86/x86/local_apic.c: - Call the msi_init hook instead of directly calling msi_init. xen/xen_intr.h: x86/xen/xen_intr.c: - Introduce support for registering/releasing MSI interrupts with Xen. - The MSI interrupts will use the same PIC as the IO APIC interrupts. xen/xen_msi.h: x86/xen/xen_msi.c: - Introduce a Xen MSI implementation. x86/xen/xen_nexus.c: - Overwrite the default MSI hooks in the Xen Nexus to use the Xen MSI implementation. x86/xen/xen_pci.c: - Introduce a Xen specific PCI bus that inherits from the ACPI PCI bus and overwrites the native MSI methods. - This is needed because when running under Xen the MSI messages used to configure MSI interrupts on PCI devices are written by Xen itself. dev/acpica/acpi_pci.c: - Lower the quality of the ACPI PCI bus so the newly introduced Xen PCI bus can take over when needed. conf/files.i386: conf/files.amd64: - Add the newly created files to the build process.	2014-09-30 16:46:45 +00:00
Neel Natu	970388bf8d	IFC @r272185	2014-09-27 22:15:50 +00:00
Neel Natu	30571674ce	Simplify register state save and restore across a VMRUN: - Host registers are now stored on the stack instead of a per-cpu host context. - Host %FS and %GS selectors are not saved and restored across VMRUN. - Restoring the %FS/%GS selectors was futile anyways since that only updates the low 32 bits of base address in the hidden descriptor state. - GS.base is properly updated via the MSR_GSBASE on return from svm_launch(). - FS.base is not used while inside the kernel so it can be safely ignored. - Add function prologue/epilogue so svm_launch() can be traced with Dtrace's FBT entry/exit probes. They also serve to save/restore the host %rbp across VMRUN. Reviewed by: grehan Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-27 02:04:58 +00:00
Peter Grehan	a48c333805	Allow the PIC's IMR register to be read before ICW initialisation. As of git submit e179f6914152eca9, the Linux kernel does a simple probe of the PIC by writing a pattern to the IMR and then reading it back, prior to the init sequence of ICW words. The bhyve PIC emulation wasn't allowing the IMR to be read until the ICW sequence was complete. This limitation isn't required so relax the test. With this change, Linux kernels 3.15-rc2 and later won't hang on boot when calibrating the local APIC. Reviewed by: tychon MFC after: 3 days	2014-09-27 01:15:24 +00:00
Roger Pau Monné	c98a2727cc	ddb: allow specifying the exact address of the symtab and strtab When the FreeBSD kernel is loaded from Xen the symtab and strtab are not loaded the same way as the native boot loader. This patch adds three new global variables to ddb that can be used to specify the exact position and size of those tables, so they can be directly used as parameters to db_add_symbol_table. A new helper is introduced, so callers that used to set ksym_start and ksym_end can use this helper to set the new variables. It also adds support for loading them from the Xen PVH port, that was previously missing those tables. Sponsored by: Citrix Systems R&D Reviewed by: kib ddb/db_main.c: - Add three new global variables: ksymtab, kstrtab, ksymtab_size that can be used to specify the position and size of the symtab and strtab. - Use those new variables in db_init in order to call db_add_symbol_table. - Move the logic in db_init to db_fetch_ksymtab in order to set ksymtab, kstrtab, ksymtab_size from ksym_start and ksym_end. ddb/ddb.h: - Add prototype for db_fetch_ksymtab. - Declare the extern variables ksymtab, kstrtab and ksymtab_size. x86/xen/pv.c: - Add support for finding the symtab and strtab when booted as a Xen PVH guest. Since Xen loads the symtab and strtab as NetBSD expects to find them we have to adapt and use the same method. amd64/amd64/machdep.c: arm/arm/machdep.c: i386/i386/machdep.c: mips/mips/machdep.c: pc98/pc98/machdep.c: powerpc/aim/machdep.c: powerpc/booke/machdep.c: sparc64/sparc64/machdep.c: - Use the newly introduced db_fetch_ksymtab in order to set ksymtab, kstrtab and ksymtab_size.	2014-09-25 08:28:10 +00:00
Bjoern A. Zeeb	14f2533c56	As per [1] Intel only supports this driver on 64bit platforms. For now restrict it to amd64. Other architectures might be re-added later once tested. Remove the drivers from the global NOTES and files files and move them to the amd64 specifics. Remove the drivers from the i386 modules build and only leave the amd64 version. Rather than depending on "inet" depend on "pci" and make sure that ixl(4) and ixlv(4) can be compiled independently [2]. This also allows the drivers to build properly on IPv4-only or IPv6-only kernels. PR: 193824 [2] Reviewed by: eric.joyner intel.com MFC after: 3 days References: [1] http://lists.freebsd.org/pipermail/svn-src-all/2014-August/090470.html	2014-09-23 08:33:03 +00:00
Neel Natu	af198d882a	Allow more VMCB fields to be cached: - CR2 - CR0, CR3, CR4 and EFER - GDT/IDT base/limit fields - CS/DS/ES/SS selector/base/limit/attrib fields The caching can be further restricted via the tunable 'hw.vmm.svm.vmcb_clean'. Restructure the code such that the fields above are only modified in a single place. This makes it easy to invalidate the VMCB cache when any of these fields is modified.	2014-09-21 23:42:54 +00:00
Neel Natu	4eea1566cb	Get rid of unused stat VMM_HLT_IGNORED.	2014-09-21 18:52:56 +00:00
Konstantin Belousov	060cd4d500	Update and clarify comments. Remove the useless counter for impossible, but seen in wild situation (on buggy hypervisors). In collaboration with: bde MFC after: 1 week	2014-09-21 09:06:50 +00:00
Neel Natu	ba28c094bb	The memory type bits (PAT, PCD, PWT) associated with a nested PTE or PDE are identical to the traditional x86 page tables.	2014-09-21 06:36:17 +00:00
Neel Natu	8f02c5e456	IFC r271888. Restructure MSR emulation so it is all done in processor-specific code.	2014-09-20 21:46:31 +00:00
Neel Natu	b6cf6c8ca6	IFC @r271887	2014-09-20 06:27:37 +00:00
Neel Natu	9d8d8e3ee7	Add some more KTR events to help debugging.	2014-09-20 05:13:03 +00:00
Neel Natu	cb44ea41cb	MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This behavior was changed in r271888 so update the comment block to reflect this. MSR_KGSBASE is accessible from the guest without triggering a VM-exit. The permission bitmap for MSR_KGSBASE is modified by vmx_msr_guest_init() so get rid of redundant code in vmx_vminit().	2014-09-20 05:12:34 +00:00
Neel Natu	c3498942a5	Restructure the MSR handling so it is entirely handled by processor-specific code. There are only a handful of MSRs common between the two so there isn't too much duplicate functionality. The VT-x code has the following types of MSRs: - MSRs that are unconditionally saved/restored on every guest/host context switch (e.g., MSR_GSBASE). - MSRs that are restored to guest values on entry to vmx_run() and saved before returning. This is an optimization for MSRs that are not used in host kernel context (e.g., MSR_KGSBASE). - MSRs that are emulated and every access by the guest causes a trap into the hypervisor (e.g., MSR_IA32_MISC_ENABLE). Reviewed by: grehan	2014-09-20 02:35:21 +00:00
Konstantin Belousov	6dfc9e44fa	- Use NULL instead of 0 for fpcurthread. - Note the quirk with the interrupt enabled state of the dna handler. - Use just panic() instead of printf() and panic(). Print tid instead of pid, the fpu state is per-thread. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-09-18 09:13:20 +00:00
Bjoern A. Zeeb	581980cff8	Re-gen after r271743 implementing most of timer_{create,settime,gettime,getoverrun,delete}. MFC after: 3 days Sponsored by: DARPA, AFRL	2014-09-18 08:40:00 +00:00
Bjoern A. Zeeb	0a041f3b47	Implement most of timer_{create,settime,gettime,getoverrun,delete} for amd64/linux32. Fix the entirely bogus (untested) version from r161310 for i386/linux using the same shared code in compat/linux. It is unclear to me if we could support more clock mappings but the current set allows me to successfully run commercial 32bit linux software under linuxolator on amd64. Reviewed by: jhb Differential Revision: D784 MFC after: 3 days Sponsored by: DARPA, AFRL	2014-09-18 08:36:45 +00:00
Konstantin Belousov	490356e5b7	Presence of any VM_PROT bits in the permission argument on x86 implies that the entry is readable and valid. Reported by: markj Submitted by: alc Tested by: pho (previous version), markj MFC after: 3 days	2014-09-17 18:49:57 +00:00
Neel Natu	4e27d36d38	IFC @r271694	2014-09-17 18:46:51 +00:00
Neel Natu	6b844b87e2	Rework vNMI injection. Keep track of NMI blocking by enabling the IRET intercept on a successful vNMI injection. The NMI blocking condition is cleared when the handler executes an IRET and traps back into the hypervisor. Don't inject NMI if the processor is in an interrupt shadow to preserve the atomic nature of "STI;HLT". Take advantage of this and artificially set the interrupt shadow to prevent NMI injection when restarting the "iret". Reviewed by: Anish Gupta (akgupt3@gmail.com), grehan	2014-09-17 00:30:25 +00:00
Neel Natu	5fb3bc71f8	Minor cleanup. Get rid of unused 'svm_feature' from the softc. Get rid of the redundant 'vcpu_cnt' checks in svm.c. There is a similar check in vmm.c against 'vm->active_cpus' before the AMD-specific code is called. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-09-16 04:01:55 +00:00
Neel Natu	79ad53fba3	Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes care of injecting this vector when the guest is able to receive it. Legacy PIC interrupts are still delivered via the event injection mechanism. This is because the vector injected by the PIC must reflect the state of its pins at the time the CPU is ready to accept the interrupt. Accesses to the TPR via %CR8 are handled entirely in hardware. This requires that the emulated TPR must be synced to V_TPR after a #VMEXIT. The guest can also modify the TPR via the memory mapped APIC. This requires that the V_TPR must be synced with the emulated TPR before a VMRUN. Reviewed by: Anish Gupta (akgupt3@gmail.com)	2014-09-16 03:31:40 +00:00
Neel Natu	bbadcde418	Set the 'vmexit->inst_length' field properly depending on the type of the VM-exit and ultimately on whether nRIP is valid. This allows us to update the %rip after the emulation is finished so any exceptions triggered during the emulation will point to the right instruction. Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability is available. The effective segment field in EXITINFO1 is not valid without this capability. Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging. Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2 fields in bhyve(8). Reviewed by: Anish Gupta (akgupt3@gmail.com) Reviewed by: grehan	2014-09-14 04:39:04 +00:00
Neel Natu	74accc3170	Bug fixes. - Don't enable the HLT intercept by default. It will be enabled by bhyve(8) if required. Prior to this change HLT exiting was always enabled making the "-H" option to bhyve(8) meaningless. - Recognize a VM exit triggered by a non-maskable interrupt. Prior to this change the exit would be punted to userspace and the virtual machine would terminate.	2014-09-13 23:48:43 +00:00
Neel Natu	fa7caa91cb	style(9): insert an empty line if the function has no local variables Pointed out by: grehan	2014-09-13 22:45:04 +00:00
Neel Natu	c2a875f970	AMD processors that have the SVM decode assist capability will store the instruction bytes in the VMCB on a nested page fault. This is useful because it saves having to walk the guest page tables to fetch the instruction. vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len' that map directly to 'vie->inst[]' and 'vie->num_valid'. The instruction emulation handler skips calling 'vmm_fetch_instruction()' if 'vie->num_valid' is non-zero. The use of this capability can be turned off by setting the sysctl/tunable 'hw.vmm.svm.disable_npf_assist' to '1'. Reviewed by: Anish Gupta (akgupt3@gmail.com) Discussed with: grehan	2014-09-13 22:16:40 +00:00
John Baldwin	7d8312cc92	Add a sysctl to export the EFI memory map along with a handler in the sysctl(8) binary to format it. Reviewed by: emaste MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D771	2014-09-13 03:10:02 +00:00
Neel Natu	d181963296	Optimize the common case of injecting an interrupt into a vcpu after a HLT by explicitly moving it out of the interrupt shadow. The hypervisor is done "executing" the HLT and by definition this moves the vcpu out of the 1-instruction interrupt shadow. Prior to this change the interrupt would be held pending because the VMCS guest-interruptibility-state would indicate that "blocking by STI" was in effect. This resulted in an unnecessary round trip into the guest before the pending interrupt could be injected. Reviewed by: grehan	2014-09-12 06:15:20 +00:00
Neel Natu	442a04ca83	style(9): indent the switch, don't indent the case, indent case body one tab.	2014-09-11 06:17:56 +00:00
Neel Natu	e441104d63	Repurpose the V_IRQ interrupt injection to implement VMX-style interrupt window exiting. This simply involves setting V_IRQ and enabling the VINTR intercept. This instructs the CPU to trap back into the hypervisor as soon as an interrupt can be injected into the guest. The pending interrupt is then injected via the traditional event injection mechanism. Rework vcpu interrupt injection so that Linux guests now idle with host cpu utilization close to 0%. Reviewed by: Anish Gupta (earlier version) Discussed with: grehan	2014-09-11 02:37:02 +00:00
John Baldwin	de2b02fc74	MFamd64: Use initializecpu() to set various model-specific registers on AP startup and AP resume (it was already used for BSP startup and BSP resume). - Split code to do one-time probing of cache properties out of initializecpu() and into initializecpucache(). This is called once on the BSP during boot. - Move enable_sse() into initializecpu(). - Call initializecpu() for AP startup instead of enable_sse() and manually frobbing MSR_EFER to enable PG_NX. - Call initializecpu() when an AP resumes. In theory this will now properly re-enable PG_NX in MSR_EFER when resuming a PAE kernel on APs.	2014-09-10 21:37:47 +00:00
Neel Natu	238b6cb761	Allow intercepts and irq fields to be cached by the VMCB. Provide APIs svm_enable_intercept()/svm_disable_intercept() to add/delete VMCB intercepts. These APIs ensure that the VMCB state cache is invalidated when intercepts are modified. Each intercept is identified as a (index,bitmask) tuple. For example, the VINTR intercept is identified as (VMCB_CTRL1_INTCPT,VMCB_INTCPT_VINTR). The first 20 bytes in control area that are used to enable intercepts are represented as 'uint32_t intercept[5]' in 'struct vmcb_ctrl'. Modify svm_setcap() and svm_getcap() to use the new APIs. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 03:13:40 +00:00
Neel Natu	e5397c9fdd	Move the VMCB initialization into svm.c in preparation for changes to the interrupt injection logic. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 02:35:19 +00:00
Neel Natu	840b1a2760	Move the event injection function into svm.c and add KTR logging for every event injection. This is in preparation for changes to SVM guest interrupt injection. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 02:20:32 +00:00
Neel Natu	2591ee3e80	Remove a bogus check that flagged an error if the guest %rip was zero. An AP begins execution with %rip set to 0 after a startup IPI. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 01:46:22 +00:00
Neel Natu	5e467bd098	Make the KTR tracepoints uniform and ensure that every VM-exit is logged. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-10 01:37:32 +00:00
Neel Natu	a2901ce7ad	Allow guest read access to MSR_EFER without hypervisor intervention. Dirty the VMCB_CACHE_CR state cache when MSR_EFER is modified.	2014-09-10 01:10:53 +00:00
Neel Natu	501f03eba2	Remove gratuitous forward declarations. Remove tabs on empty lines.	2014-09-09 23:39:43 +00:00
Neel Natu	a268481428	Do proper ASID management for guest vcpus. Prior to this change an ASID was hard allocated to a guest and shared by all its vcpus. This meant that the number of VMs that could be created was limited to the number of ASIDs supported by the CPU. It was also inefficient because it forced a TLB flush on every VMRUN. With this change the number of guests that can be created is independent of the number of available ASIDs. Also, the TLB is flushed only when a new ASID is allocated. Discussed with: grehan Reviewed by: Anish Gupta (akgupt3@gmail.com)	2014-09-06 19:02:52 +00:00
John Baldwin	b1d735ba4c	Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. Move the FPU state out of the pcb and into this new structure. As part of this, move the FPU resume code on amd64 into a C function. This allows resumectx() to still operate only on a pcb and more closely mirrors the i386 code. Reviewed by: kib (earlier version)	2014-09-06 15:23:28 +00:00
Neel Natu	3865879753	Merge svm_set_vmcb() and svm_init_vmcb() into a single function that is called just once when a vcpu is initialized. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-05 03:33:16 +00:00
Pedro F. Giffuni	11db54f172	Apply known workarounds for modern MacBooks. The legacy USB circuit tends to give trouble on MacBook. While the original report covered MacBook, extend the fix preemptively for the newer MacBookPro too. PR: 191693 Reviewed by: emaste MFC after: 5 days	2014-09-05 01:06:45 +00:00
Mark Johnston	a58b4afa9f	Add mrsas(4) to GENERIC for i386 and amd64. Approved by: ambrisko, kadesai MFC after: 3 days	2014-09-04 21:06:33 +00:00
John Baldwin	33a50f1b0f	Merge the amd64 and i386 identcpu.c into a single x86 implementation. This brings the structured extended features mask and VT-x reporting to i386 and Intel cache and TLB info (under bootverbose) to amd64.	2014-09-04 14:26:25 +00:00
Neel Natu	bf4993ba2a	Remove unused header file. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-04 06:07:32 +00:00
Neel Natu	fea6bd5cd3	Consolidate the code to restore the host TSS after a #VMEXIT into a single function restore_host_tss(). Don't bother to restore MSR_KGSBASE after a #VMEXIT since it is not used in the kernel. It will be restored on return to userspace. Discussed with: Anish Gupta (akgupt3@gmail.com)	2014-09-04 06:00:18 +00:00
John Baldwin	2b793beefd	Remove trailing whitespace.	2014-09-04 01:56:15 +00:00
John Baldwin	7fb40488d6	- Move prototypes for various functions out of C files and into <machine/md_var.h>. - Move some CPU-related variables out of i386/i386/identcpu.c to initcpu.c to match amd64. - Move the declaration of has_f00f_hack out of identcpu.c to machdep.c. - Remove a misleading comment from i386/i386/initcpu.c (locore zeros the BSS before it calls identify_cpu()) and remove explicit zero assignments to reduce the diff with amd64.	2014-09-04 01:46:06 +00:00
Neel Natu	246e7a2b64	IFC @r269962 Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-09-02 04:22:42 +00:00
Alan Cox	ad2e88a14f	Update a comment to reflect the changes in r213408. MFC after: 5 days	2014-09-02 04:11:20 +00:00
Neel Natu	4c98655ece	The "SUB" instruction used in getcc() actually does 'x -= y' so use the proper constraint for 'x'. The "+r" constraint indicates that 'x' is an input and output register operand. While here generate code for different variants of getcc() using a macro GETCC(sz) where 'sz' indicates the operand size. Update the status bits in %rflags when emulating AND and OR opcodes. Reviewed by: grehan	2014-08-30 19:59:42 +00:00
Pedro F. Giffuni	ec4a0b4408	Minor space/tab cleanups. Most of them were ripped from the GSoC 2014 SMAP + kpatch project. This is only a cosmetic change. Taken from: Oliver Pinter (op@) MFC after: 5 days	2014-08-30 15:41:07 +00:00
John Baldwin	89871cdeb6	- Add a new structure type for the ACPI 3.0 SMAP entry that includes the optional attributes field. - Add a 'machdep.smap' sysctl that exports the SMAP table of the running system as an array of the ACPI 3.0 structure. (On older systems, the attributes are given a value of zero.) Note that the sysctl only exports the SMAP table if it is available via the metadata passed from the loader to the kernel. If an SMAP is not available, an empty array is returned. - Add a format handler for the ACPI 3.0 SMAP structure to the sysctl(8) binary to format the SMAP structures in a readable format similar to the format found in boot messages. MFC after: 2 weeks	2014-08-29 21:25:47 +00:00
Peter Grehan	fc3dde9099	Implement the 0x2B SUB instruction, and the OR variant of 0x81. Found with local APIC accesses from bitrig/amd64 bsd.rd, 07/15-snap. Reviewed by: neel MFC after: 3 days	2014-08-27 00:53:56 +00:00
Neel Natu	48e8c2137a	An exception is allowed to be injected even if the vcpu is in an interrupt shadow, so move the check for pending exception before bailing out due to an interrupt shadow. Change return type of 'vmcb_eventinject()' to a void and convert all error returns into KASSERTs. Fix VMCB_EXITINTINFO_EC(x) and VMCB_EXITINTINFO_TYPE(x) to do the shift before masking the result. Reviewed by: Anish Gupta (akgupt3@gmail.com)	2014-08-25 00:58:20 +00:00
Peter Grehan	7f21538b6e	Change __inline style to be consistent with FreeBSD usage, and also fix gcc build (on STABLE, when MFCd). PR: 192880 Reviewed by: neel Reported by: ngie MFC after: 1 day	2014-08-24 02:07:34 +00:00
Neel Natu	8bd3845d3c	Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package" tunables to modify the default cpu topology advertised by bhyve. Also add a tunable "hw.vmm.topology.cpuid_leaf_b" to disable the CPUID leaf 0xb. This is intended for testing guest behavior when it falls back on using CPUID leaf 0x4 to deduce CPU topology. The default behavior is to advertise each vcpu as a core in a separate socket.	2014-08-24 01:10:06 +00:00
Neel Natu	534dc967d7	Fix a bug in the emulation of CPUID leaf 0x4 where bhyve was claiming that the vcpu had no caches at all. This causes problems when executing applications in the guest compiled with the Intel compiler. Submitted by: Mark Hill (mark.hill@tidalscale.com)	2014-08-23 22:44:31 +00:00
Neel Natu	7a244722d1	Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find any unmasked pin with an interrupt asserted. Reviewed by: tychon CR: https://reviews.freebsd.org/D669 MFC after: 1 week	2014-08-23 21:16:26 +00:00
John Baldwin	669eac89c5	Fix build of si(4) and enable it in LINT on amd64 and i386.	2014-08-20 16:07:17 +00:00
John Baldwin	64d6de263b	Bump MAXCPU on amd64 from 64 to 256. In practice APIC only permits 255 CPUs (IDs 0 through 254). Getting above that limit requires x2APIC. MFC after: 1 month	2014-08-20 16:06:24 +00:00
Konstantin Belousov	3165194c6b	Increase max number of physical segments on amd64 to 63. Eventually, the vmd_segs of the struct vm_domain should become bitset instead of long, to allow arbitrary compile-time selected maximum. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-20 08:07:08 +00:00
Alan Cox	ada1ae623e	There exists a possible sequence of page table page allocation failures starting with a superpage demotion by pmap_enter() that could result in a PV list lock being held when pmap_enter() is just about to return KERN_RESOURCE_SHORTAGE. Consequently, the KASSERT that no PV list locks are held needs to be replaced with a conditional unlock. Discussed with: kib X-MFC with: r269728 Sponsored by: EMC / Isilon Storage Division	2014-08-18 20:28:08 +00:00
Gavin Atkinson	67e3b91b31	Update i386/NOTES and amd64/NOTES files to contain the complete list of firmwares for iwn(4) and sort them. MFC after: 1 week	2014-08-14 18:29:55 +00:00
Neel Natu	4eec602102	Reword comment to match the interrupt mode names from the MPtable spec. Reviewed by: tychon	2014-08-14 18:03:38 +00:00
Neel Natu	477867a0e5	Use the max guest memory address when creating its iommu domain. Also, assert that the GPA being mapped in the domain is less than its maxaddr. Reviewed by: grehan Pointed out by: Anish Gupta (akgupt3@gmail.com)	2014-08-14 05:00:45 +00:00
Alan Cox	4d33fe39e4	Update the text of a KASSERT() to reflect the changes in r269728.	2014-08-09 17:13:02 +00:00
Konstantin Belousov	39ffa8c138	Change pmap_enter(9) interface to take flags parameter and superpage mapping size (currently unused). The flags include the fault access bits, wired flag as PMAP_ENTER_WIRED, and a new flag PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep. For powerpc aim both 32 and 64 bit, fix implementation to ensure that the requested mapping is created when PMAP_ENTER_NOSLEEP is not specified, in particular, wait for the available memory required to proceed. In collaboration with: alc Tested by: nwhitehorn (ppc aim32 and booke) Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division MFC after: 2 weeks	2014-08-08 17:12:03 +00:00
Neel Natu	12a6eb99a1	Support PCI extended config space in bhyve. Add the ACPI MCFG table to advertise the extended config memory window. Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted or moved in the guest's address space. The PCI extended config space is an example of an immutable memory range. Add emulation for the "movzw" instruction. This instruction is used by FreeBSD to read a 16-bit extended config space register. CR: https://phabric.freebsd.org/D505 Reviewed by: jhb, grehan Requested by: tychon	2014-08-08 03:49:01 +00:00
Gleb Smirnoff	c8d2ffd6a7	Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c The MD allocators were very common, however there were some minor differences. These differences were all consolidated in the MI allocator, under ifdefs. The defines from machine/vmparam.h turn on features required for a particular machine. For details look in the comment in sys/sf_buf.h. As a result, no MD code is left in sys///vm_machdep.c. Some arches still have machine/sf_buf.h, which is usually quite small. Tested by: glebius (i386), tuexen (arm32), kevlo (arm32) Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-05 09:44:10 +00:00
Alan Cox	a695d9b25b	Retire pmap_change_wiring(). We have never used it to wire virtual pages. We continue to use pmap_enter() for that. For unwiring virtual pages, we now use pmap_unwire(), which unwires a range of virtual addresses instead of a single virtual page. Sponsored by: EMC / Isilon Storage Division	2014-08-03 20:40:51 +00:00
John Baldwin	06fc6db948	- Output a summary of optional VT-x features in dmesg similar to CPU features. If bootverbose is enabled, a detailed list is provided; otherwise, a single-line summary is displayed. - Add read-only sysctls for optional VT-x capabilities used by bhyve under a new hw.vmm.vmx.cap node. Move a few existing sysctls that indicate the presence of optional capabilities under this node. CR: https://phabric.freebsd.org/D498 Reviewed by: grehan, neel MFC after: 1 week	2014-07-30 00:00:12 +00:00
Neel Natu	f008d1571d	If a vcpu has issued a HLT instruction with interrupts disabled then it sleeps forever in vm_handle_hlt(). This is usually not an issue as long as one of the other vcpus properly resets or powers off the virtual machine. However, if the bhyve(8) process is killed with a signal the halted vcpu cannot be woken up because its sleep cannot be interrupted. Fix this by waking up periodically and returning from vm_handle_hlt() if TDF_ASTPENDING is set. Reported by: Leon Dang Sponsored by: Nahanni Systems	2014-07-26 02:53:51 +00:00
Neel Natu	1edccd0f30	Don't return -1 from the push emulation handler. Negative return values are interpreted specially on return from sys_ioctl() and may cause undesirable side-effects like restarting the system call.	2014-07-26 02:51:46 +00:00
Neel Natu	830be8acb4	Fix a couple of issues in the PUSH emulation: It is not possible to PUSH a 32-bit operand on the stack in 64-bit mode. The default operand size for PUSH is 64-bits and the operand size override prefix changes that to 16-bits. vm_copy_setup() can return '1' if it encounters a fault when walking the guest page tables. This is a guest issue and is now handled properly by resuming the guest to handle the fault.	2014-07-24 23:01:53 +00:00
Marius Strobl	c615e6a8bf	Copying pages via temporary mappings in the !DMAP case of pmap_copy_pages() involves updating the corresponding page tables followed by accesses to the pages in question. This sequence is subject to the situation exactly described in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore, issuing the INVLPG right after modifying the PTE bits is crucial (see also r269050). For the amd64 PMAP code, the order of instructions was already correct. The above fact still is worth documenting, though. 1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf Reviewed by: alc Sponsored by: Bally Wulff Games & Entertainment GmbH	2014-07-24 10:12:22 +00:00
Neel Natu	d37f2adb38	Fix fault injection in bhyve. The faulting instruction needs to be restarted when the exception handler is done handling the fault. bhyve now does this correctly by setting 'vmexit[vcpu].inst_length' to zero so the %rip is not advanced. A minor complication is that the fault injection APIs are used by instruction emulation code that is shared by vmm.ko and bhyve. Thus the argument that refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to be loosely typed as a 'void *'.	2014-07-24 01:38:11 +00:00
Roger Pau Monné	f5417a03e3	don't set CR4 PSE bit on amd64 Setting PSE together with PAE or in long mode just makes the PSE bit completely ignored, so don't set it. Sponsored by: Citrix Systems R&D Reviewed by: kib	2014-07-23 15:53:29 +00:00
Neel Natu	d665d229ce	Emulate instructions emitted by OpenBSD/i386 version 5.5: - CMP REG, r/m - MOV AX/EAX/RAX, moffset - MOV moffset, AX/EAX/RAX - PUSH r/m	2014-07-23 04:28:51 +00:00
Ed Maste	b47228854f	Don't pass null kmdp to preload_search_info On Xen PVH guests kmdp == NULL. Submitted by: royger MFC after: 3 days Sponsored by: The FreeBSD Foundation	2014-07-22 13:58:33 +00:00
Mark Johnston	26cf239814	Fix the build when DTrace isn't enabled. Reported by: stefanf X-MFC-With: r268600	2014-07-20 18:44:56 +00:00
Neel Natu	019008ebf5	Fix build without INVARIANTS defined by getting rid of unused variable 'exc'. Reported by: adrian, stefanf	2014-07-20 16:34:35 +00:00
Neel Natu	091d453222	Handle nested exceptions in bhyve. A nested exception condition arises when a second exception is triggered while delivering the first exception. Most nested exceptions can be handled serially but some are converted into a double fault. If an exception is generated during delivery of a double fault then the virtual machine shuts down as a result of a triple fault. vm_exit_intinfo() is used to record that a VM-exit happened while an event was being delivered through the IDT. If an exception is triggered while handling the VM-exit it will be treated like a nested exception. vm_entry_intinfo() is used by processor-specific code to get the event to be injected into the guest on the next VM-entry. This function is responsible for deciding the disposition of nested exceptions.	2014-07-19 20:59:08 +00:00
Mark Johnston	5a5f9d21dd	Use a C wrapper for trap() instead of checking and calling the DTrace trap hook in assembly. Suggested by: kib Reviewed by: kib (original version) X-MFC-With: r268600	2014-07-19 02:27:31 +00:00
Neel Natu	3d5444c864	Add emulation for legacy x86 task switching mechanism. FreeBSD/i386 uses task switching to handle double fault exceptions and this change enables that to work. Reported by: glebius	2014-07-16 21:26:26 +00:00
Neel Natu	f7a9f1784f	Add support for operand size and address size override prefixes in bhyve's instruction emulation [1]. Fix bug in emulation of opcode 0x8A where the destination is a legacy high byte register and the guest vcpu is in 32-bit mode. Prior to this change instead of modifying %ah, %bh, %ch or %dh the emulation would end up modifying %spl, %bpl, %sil or %dil instead. Add support for moffsets by treating it as a 2, 4 or 8 byte immediate value during instruction decoding. Fix bug in verify_gla() where the linear address computed after decoding the instruction was not being truncated to the effective address size [2]. Tested by: Leon Dang [1] Reported by: Peter Grehan [2] Sponsored by: Nahanni Systems	2014-07-15 17:37:17 +00:00
Konstantin Belousov	5e351014e0	Make amd64 pmap_copy_pages() functional for pages not mapped by DMAP. Requested and reviewed by: royger Tested by: pho, royger Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-15 09:30:43 +00:00
Mark Johnston	291624fdf6	Invoke the DTrace trap handler before calling trap() on amd64. This matches the upstream implementation and helps ensure that a trap induced by tracing fbt::trap:entry is handled without recursively generating another trap. This makes it possible to run most (but not all) of the DTrace tests under common/safety/ without triggering a kernel panic. Submitted by: Anton Rang <anton.rang@isilon.com> (original version) Phabric: D95	2014-07-14 04:38:17 +00:00
Neel Natu	3ada6e07ac	Use the correct offset when converting a logical address (segment:offset) to a linear address.	2014-07-11 01:23:38 +00:00
Konstantin Belousov	fd815c0b8d	For safety, ensure that any consumer of set_regs() and ptrace_set_pc() uses the correct return to userspace using iret. The signal return and PT_CONTINUE (which in fact uses the signal return path) set the pcb flag already. setcontext(2) enforces iret return when %rip is incorrect. Due to this, the change is redundant, but is made to ensure that no path which modifies context forgets to set PCB_FULL_IRET. Inspired by: CVE-2014-4699 Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-09 21:39:40 +00:00
Neel Natu	b301b9e28f	Accurately identify the vcpu's operating mode as 64-bit, compatibility, protected or real.	2014-07-08 21:48:57 +00:00
Neel Natu	3527963b26	Invalidate guest TLB mappings as a side-effect of its CR3 being updated. This is a pre-requisite for task switch emulation since the CR3 is loaded from the new TSS.	2014-07-08 20:51:03 +00:00

7307 Commits