freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	3c48106aaa	bhyve: limit max GPA to VM_MAXUSER_ADDRESS_LA48. We use 4-level EPT pages, correct the upper bound. Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27402	2020-11-29 10:32:38 +00:00
Peter Grehan	15add60d37	Convert vmm_ops calls to IFUNC There is no need for these to be function pointers since they are never modified post-module load. Rename AMD/Intel ops to be more consistent. Submitted by: adam_fenn.io Reviewed by: markj, grehan Approved by: grehan (bhyve) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D27375	2020-11-28 01:16:59 +00:00
Peter Grehan	ddfc488c36	Remove manual instruction encodings for VMLOAD, VMRUN, and VMSAVE. This is a relic from when these instructions weren't supported by the toolchain. No functional change. Submitted by: adam_fenn.io Reviewed by: grehan Approved by: grehan (bhyve) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D27130	2020-11-26 05:58:55 +00:00
John Baldwin	908dca3ef4	Pull the check for VM ownership into ppt_find(). This reduces some code duplication. One behavior change is that ppt_assign_device() will now only succeed if the device is unowned. Previously, a device could be assigned to the same VM multiple times, but each time it was assigned, the device's state was reset. Reviewed by: markj, grehan MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27301	2020-11-24 23:56:33 +00:00
John Baldwin	1925586e03	Honor the disabled setting for MSI-X interrupts for passthrough devices. Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough device and invoke it if a write to the MSI-X capability registers disables MSI-X. This avoids leaving MSI-X interrupts enabled on the host if a guest device driver has disabled them (e.g. as part of detaching a guest device driver). This was found by Chelsio QA when testing that a Linux guest could switch from MSI-X to MSI interrupts when using the cxgb4vf driver. While here, explicitly fail requests to enable MSI on a passthrough device if MSI-X is enabled and vice versa. Reported by: Sony Arpita Das @ Chelsio Reviewed by: grehan, markj MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27212	2020-11-24 23:18:52 +00:00
Mark Johnston	6f5a960678	vmm: Make pmap_invalidate_ept() wait synchronously for guest exits Currently EPT TLB invalidation is done by incrementing a generation counter and issuing an IPI to all CPUs currently running vCPU threads. The VMM inner loop caches the most recently observed generation on each host CPU and invalidates TLB entries before executing the VM if the cached generation number is not the most recent value. pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing guest instructions and reload the generation number. However, it does not actually wait for vCPUs to exit, potentially creating a window where guests may continue to reference stale TLB entries. Fix the problem by bracketing guest execution with an SMR read section which is entered before loading the invalidation generation. Then, pmap_invalidate_ept() increments the current write sequence before loading pm_active and sending IPIs, and polls readers to ensure that all vCPUs potentially operating with stale TLB entries have exited before pmap_invalidate_ept() returns. Also ensure that unsynchronized loads of the generation counter are wrapped with atomic(9), and stop (inconsistently) updating the invalidation counter and pm_active bitmask with acquire semantics. Reviewed by: grehan, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26910	2020-11-11 15:01:17 +00:00
Mark Johnston	8e2cbc5660	vmx: Implement pmap (de)activation in C Rewrite the code that maintains pm_active and invalidates EPTP-tagged TLB entries in C. Previously this work was done in vmx_enter_guest(), in assembly, but there is no good reason for that and it makes the TLB invalidation algorithm for nested page tables harder to review. No functional change intended. Now, an error from the invept instruction results in a kernel panic rather than a vmexit. Such errors should occur only as a result of VMM bugs. Reviewed by: grehan, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26830	2020-10-19 15:24:35 +00:00
Mark Johnston	494955366a	Remove svn:executable from a couple of vmm(4) source files. MFC after: 3 days	2020-10-01 22:20:29 +00:00
John Baldwin	a3f2a9c57e	Clear the upper 32-bits of registers in x86_emulate_cpuid(). Per the Intel manuals, CPUID is supposed to unconditionally zero the upper 32 bits of the involved (rax/rbx/rcx/rdx) registers. Previously, the emulation would cast pointers to the 64-bit register values down to `uint32_t`, which while properly manipulating the lower bits, would leave any garbage in the upper bits uncleared. While no existing guest OSes seem to stumble over this in practice, the bhyve emulation should match x86 expectations. This was discovered through alignment warnings emitted by gcc9, while testing it against SmartOS/bhyve. SmartOS bug: https://smartos.org/bugview/OS-8168 Submitted by: Patrick Mooney Reviewed by: rgrimes Differential Revision: https://reviews.freebsd.org/D24727	2020-10-01 16:45:11 +00:00
Ed Maste	09860d44e4	bhyve: do not permit write access to VMCB / VMCS Reported by: Patrick Mooney Submitted by: jhb Security: CVE-2020-24718	2020-09-15 21:04:27 +00:00
Konstantin Belousov	101d5b527a	bhyve: intercept AMD SVM instructions. Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an SVM instruction. Otherwise, SVM allows execution of them, and instructions operate on host physical addresses despite being executed in guest mode. Reported by: Maxime Villard <max@m00nbsd.net> admbug: 972 CVE: CVE-2020-7467 Reviewed by: grehan, markj Differential revision: https://reviews.freebsd.org/D26313	2020-09-15 20:22:50 +00:00
John Baldwin	385f4a5ac8	Use vmcb_read/write for the vmcb snapshot functions. This avoids some unnecessary layers of indirection.	2020-09-10 22:22:23 +00:00
Mateusz Guzik	543769bf83	amd64: clean up empty lines in .c and .h files	2020-09-01 21:16:54 +00:00
Konstantin Belousov	f3eb12e4a6	Add bhyve support for LA57 guest mode. Noted and reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273	2020-08-23 20:37:21 +00:00
Konstantin Belousov	9ce875d9b5	amd64 pmap: LA57 AKA 5-level paging Since LA57 was moved to the main SDM document with revision 072, it seems that we should have a support for it, and silicons are coming. This patch makes pmap support both LA48 and LA57 hardware. The selection of page table level is done at startup, kernel always receives control from loader with 4-level paging. It is not clear how UEFI spec would adapt LA57, for instance it could hand out control in LA57 mode sometimes. To switch from LA48 to LA57 requires turning off long mode, requesting LA57 in CR4, then re-entering long mode. This is somewhat delicate and done in pmap_bootstrap_la57(). AP startup in LA57 mode is much easier, we only need to toggle a bit in CR4 and load right value in CR3. I decided to not change kernel map for now. Single PML5 entry is created that points to the existing kernel_pml4 (KML4Phys) page, and a pml5 entry to create our recursive mapping for vtopte()/vtopde(). This decision is motivated by the fact that we cannot overcommit for KVA, so large space there is unusable until machines start providing wider physical memory addressing. Another reason is that I do not want to break our fragile autotuning, so the KVA expansion is not included into this first step. Nice side effect is that minidumps are compatible. On the other hand, (very) large address space is definitely immediately useful for some userspace applications. For userspace, numbering of pte entries (or page table pages) is always done for 5-level structures even if we operate in 4-level mode. The pmap_is_la57() function is added to report the mode of the specified pmap, this is done not to allow simultaneous 4-/5-levels (which is not allowed by hw), but to accomodate for EPT which has separate level control and in principle might not allow 5-leve EPT despite x86 paging supports it. Anyway, it does not seems critical to have 5-level EPT support now. Tested by: pho (LA48 hardware) Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273	2020-08-23 20:19:04 +00:00
Peter Grehan	3a3f1e9dfa	Export a routine to provide the TSC_AUX MSR value and use this in vmm. Also, drop an unnecessary set of braces. Requested by: kib Reviewed by: kib MFC after: 3 weeks	2020-08-18 11:36:38 +00:00
Peter Grehan	f5f5f1e7d6	Support guest rdtscp and rdpid instructions on Intel VT-x Enable any of rdtscp and/or rdpid for bhyve guests on Intel-based hosts that support the "enable RDTSCP" VM-execution control. Submitted by: adam_fenn.io Reported by: chuck Reviewed by: chuck, grehan, jhb Approved by: jhb (bhyve), grehan MFC after: 3 weeks Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D26003	2020-08-18 07:23:47 +00:00
Peter Grehan	46567b4f5e	Allow guest device MMIO access from bootmem memory segments. Recent versions of UEFI have moved local APIC timer initialization into the early SEC phase which runs out of ROM, prior to self-relocating into RAM. This results in a hypervisor exit. Currently bhyve prevents instruction emulation from segments that aren't marked as "sysmem" aka guest RAM, with the vm_gpa_hold() routine failing. However, there is no reason for this restriction: the hypervisor already controls whether EPT mappings are marked as executable. Fix by dropping the redundant check of sysmem. MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D25955	2020-08-18 07:08:17 +00:00
Conrad Meyer	4daa95f85d	bhyve(8): For prototyping, reattempt decode in userspace If userspace has a newer bhyve than the kernel, it may be able to decode and emulate some instructions vmm.ko is unaware of. In this scenario, reset decoder state and try again. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24464	2020-06-25 00:18:42 +00:00
Conrad Meyer	f4ce062964	vmm(4): Add 12 user ABI compat after r349948 Reported by: kp Reviewed by: jhb, kp Tested by: kp Differential Revision: https://reviews.freebsd.org/D24929	2020-05-20 17:27:54 +00:00
Conrad Meyer	8a68ae80f6	vmm(4), bhyve(8): Expose kernel-emulated special devices to userspace Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace for use in, e.g., fallback instruction emulation (when userspace has a newer instruction decode/emulation layer than the kernel vmm(4)). Plumb the ioctl through libvmmapi and register the memory ranges in bhyve(8). Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24525	2020-05-15 15:54:22 +00:00
Peter Grehan	ec048c7550	Hide host CPUID 0x15 TSC/Crystal ratio/freq info from guest In recent Linux (5.3+) and OpenBSD (6.6+) kernels, and with hosts that support CPUID 0x15, the local APIC frequency is determined directly from the reported crystal clock to avoid calibration against the 8254 timer. However, the local APIC frequency implemented by bhyve is 128MHz, where most h/w systems report frequencies around 25MHz. This shows up on OpenBSD guests as repeated keystrokes on the emulated PS2 keyboard when using VNC, since the kernel's timers are now much shorter. Fix by reporting all-zeroes for CPUID 0x15. This allows guests to fall back to using the 8254 to calibrate the local APIC frequency. Future work could be to compute values returned for 0x15 that would match the host TSC and bhyve local APIC frequency, though all dependencies on this would need to be examined (for example, Linux will start using 0x16 for some hosts). PR: 246321 Reported by: Jason Tubnor (and tested) Reviewed by: jhb Approved by: jhb, bz (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D24837	2020-05-14 22:18:12 +00:00
John Baldwin	483d953a86	Initial support for bhyve save and restore. Save and restore (also known as suspend and resume) permits a snapshot to be taken of a guest's state that can later be resumed. In the current implementation, bhyve(8) creates a UNIX domain socket that is used by bhyvectl(8) to send a request to save a snapshot (and optionally exit after the snapshot has been taken). A snapshot currently consists of two files: the first holds a copy of guest RAM, and the second file holds other guest state such as vCPU register values and device model state. To resume a guest, bhyve(8) must be started with a matching pair of command line arguments to instantiate the same set of device models as well as a pointer to the saved snapshot. While the current implementation is useful for several uses cases, it has a few limitations. The file format for saving the guest state is tied to the ABI of internal bhyve structures and is not self-describing (in that it does not communicate the set of device models present in the system). In addition, the state saved for some device models closely matches the internal data structures which might prove a challenge for compatibility of snapshot files across a range of bhyve versions. The file format also does not currently support versioning of individual chunks of state. As a result, the current file format is not a fixed binary format and future revisions to save and restore will break binary compatiblity of snapshot files. The goal is to move to a more flexible format that adds versioning, etc. and at that point to commit to providing a reasonable level of compatibility. As a result, the current implementation is not enabled by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option for userland builds, and the kernel option BHYVE_SHAPSHOT. Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz Relnotes: yes Sponsored by: University Politehnica of Bucharest Sponsored by: Matthew Grooms (student scholarships) Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D19495	2020-05-05 00:02:04 +00:00
Conrad Meyer	47332982bc	vmm(4): Decode and emulate BEXTR Clang 10 -march=native kernels on znver1 emit BEXTR for APIC reads, apparently. Decode and emulate the instruction. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24463	2020-04-21 21:34:24 +00:00
Conrad Meyer	cfdea69d24	vmm(4): Decode 3-byte VEX-prefixed instructions Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24462	2020-04-21 21:33:06 +00:00
Conrad Meyer	00d3723fb4	vmm(4): Bump VM_MAX_MEMMAPS for vmgenid As a short term solution for the problem reported by Shawn Webb re: r359950, bump the maximum number of memmaps per VM. This structure is 40 bytes, and the additional four (fixed array embedded in the struct vm) members increase the size of struct vm by 3%. (The vast majority of struct vm is the embedded struct vcpu array, which accounts for 84% of the size -- over 4 kB.) Reported by: Shawn Webb <shawn.webb AT hardenedbsd.org> Reviewed by: grehan X-MFC-With: r359950 Differential Revision: https://reviews.freebsd.org/D24507	2020-04-19 23:53:47 +00:00
Conrad Meyer	b645fd4531	vmm(4): Expose instruction decode to userspace build Permit instruction decoding logic to be compiled outside of the kernel for rapid iteration and validation. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24439	2020-04-16 16:50:33 +00:00
Jung-uk Kim	3ee58df503	Merge ACPICA 20200326.	2020-03-27 00:29:33 +00:00
Michael Reifenberger	1bc51bad2b	Untangle TPR shadowing and APIC virtualization. This speeds up Windows guests tremendously. The patch does: Add a new tuneable 'hw.vmm.vmx.use_tpr_shadowing' to disable TLP shadowing. Also add 'hw.vmm.vmx.cap.tpr_shadowing' to be able to query if TPR shadowing is used. Detach the initialization of TPR shadowing from the initialization of APIC virtualization. APIC virtualization still needs TPR shadowing, but not vice versa. Any CPU that supports APIC virtualization should also support TPR shadowing. When TPR shadowing is used, the APIC page of each vCPU is written to the VMCS_VIRTUAL_APIC field of the VMCS so that the CPU can write directly to the page without intercept. On vm exit, vlapic_update_ppr() is called to update the PPR. Submitted by: Yamagi Burmeister MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22942	2020-03-10 16:53:49 +00:00
Pawel Biernacki	b40598c539	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (4 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Reviewed by: kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D23625 X-Generally looks fine: jhb	2020-02-15 18:57:49 +00:00
Konstantin Belousov	caab504277	vmm: Add Hygon Dhyana support. Submitted by: Pu Wen <puwen@hygon.cn> Discussed with: grehan Reviewed by: jhb (previous version) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23553	2020-02-13 19:03:12 +00:00
Konstantin Belousov	b837dadd87	bhyve: terminate waiting loops if thread suspension is requested. PR: 242724 Reviewed by: markj Reported and tested by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22881	2020-01-02 22:37:04 +00:00
John Baldwin	cbd03a9df2	Support software breakpoints in the debug server on Intel CPUs. - Allow the userland hypervisor to intercept breakpoint exceptions (BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to enable this feature. These exceptions are reported to userland via a new VM_EXITCODE_BPT that includes the length of the original breakpoint instruction. If userland wishes to pass the exception through to the guest, it must be explicitly re-injected via vm_inject_exception(). - Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH pseudo-register. Injecting a BP# on Intel requires setting this to the length of the breakpoint instruction. AMD SVM currently ignores writes to this register (but reports success) and fails to read it. - Rework the per-vCPU state tracked by the debug server. Rather than a single 'stepping_vcpu' global, add a structure for each vCPU that tracks state about that vCPU ('stepping', 'stepped', and 'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is currently reporting an event. Event handlers for MTRAP and breakpoint exits loop until the associated event is reported to the debugger. Breakpoint events are discarded if the breakpoint is not present when a vCPU resumes in the breakpoint handler to retry submitting the breakpoint event. - Maintain a linked-list of active breakpoints in response to the GDB 'Z0' and 'z0' packets. Reviewed by: markj (earlier version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D20309	2019-12-13 19:21:58 +00:00
Anish Gupta	84474332d3	bhyve amd: amdvi_dump_cmds() log the command for which the command completion failed. Completion is checked in poll mode although it can be done using interrupts. No need to log all the commands in command ring but only the last one for which completion failed. Reported by: np@freebsd.org Reviewed by: np, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22566	2019-12-01 04:00:08 +00:00
Konstantin Belousov	a7af4a3e7d	amd64: move GDT into PCPU area. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22302	2019-11-12 15:51:47 +00:00
Andriy Gapon	869dbab7ba	vmm: remove a wmb() call After removing wmb(), vm_set_rendezvous_func() became super trivial, so there was no point in keeping it. The wmb (sfence on amd64, lock nop on i386) was not needed. This can be explained from several points of view. First, wmb() is used for store-store ordering (although, the primitive is undocumented). There was no obvious subsequent store that needed the barrier. Second, x86 has a memory model with strong ordering including total store order. An explicit store barrier may be needed only when working with special memory (device, special caching mode) or using special instructions (non-temporal stores). That was not the case for this code. Third, I believe that there is a misconception that sfence "flushes" the store buffer in a sense that it speeds up the propagation of stores from the store buffer to the global visibility. I think that such propagation always happens as fast as possible. sfence only makes subsequent stores wait for that propagation to complete. So, sfence is only useful for ordering of stores and only in the situations described above. Reviewed by: jhb MFC after: 23 days Differential Revision: https://reviews.freebsd.org/D21978	2019-10-19 07:10:15 +00:00
Mark Johnston	d3588766e1	Correct the scope of several global variables. They are accessed from multiple compilation units. No functional change intended. MFC after: 1 week Sponsored by: Netflix	2019-09-27 21:04:33 +00:00
Konstantin Belousov	df08823d07	Improve MD page fault handlers. Centralize calculation of signal and ucode delivered on unhandled page fault in new function vm_fault_trap(). MD trap_pfault() now almost always uses the signal numbers and error codes calculated in consistent MI way. This introduces the protection fault compatibility sysctls to all non-x86 architectures which did not have that bug, but apparently they were already much more wrong in selecting delivered signals on protection violations. Change the delivered signal for accesses to mapped area after the backing object was truncated. According to POSIX description for mmap(2): The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal. An implementation may generate SIGBUS signals when a reference would cause an error in the mapped object, such as out-of-space condition. Adjust according to the description, keeping the existing compatibility code for SIGSEGV/SIGBUS on protection failures. For situations where kernel cannot handle page fault due to resource limit enforcement, SIGBUS with a new error code BUS_OBJERR is delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on amd64 due to protection key access violation. vm_fault_hold() is renamed to vm_fault(). Fixed some nits in trap_pfault()s like mis-interpreting Mach errors as errnos. Removed unneeded truncations of the fault addresses reported by hardware. PR: 211924 Reviewed by: alc Discussed with: jilles, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21566	2019-09-27 18:43:36 +00:00
Mark Johnston	fee2a2fa39	Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
John Baldwin	6a1e1c2c48	Simplify bhyve vlapic ESR logic. The bhyve virtual local APIC uses an instance-global flag to indicate when an error LVT is being delivered to prevent infinite recursion. Use a function argument instead to reduce the amount of instance-global state. This was inspired by reviewing the bhyve save/restore work, which saves a copy of the instance-global state for each vlapic. Smart OS bug: https://smartos.org/bugview/OS-7777 Submitted by: Patrick Mooney Reviewed by: markj, rgrimes Obtained from: SmartOS / Joyent Differential Revision: https://reviews.freebsd.org/D20365	2019-08-29 18:23:38 +00:00
John Baldwin	e08087ee43	Use get_pcpu() to fetch the current CPU's pcpu pointer. This avoids encoding knowledge about how pcpu objects are allocated and is also a few instructions shorter. MFC after: 2 weeks	2019-08-28 23:40:57 +00:00
Ed Maste	ba084c18de	sys/{x86,amd64}: remove one of doubled ;s MFC after: 1 week	2019-08-13 19:39:36 +00:00
Mark Johnston	13a7c4d478	Use designated initializers for vmm_ops. MFC after: 3 days	2019-08-07 19:45:44 +00:00
Konstantin Belousov	e550631697	bhyve: Ignore MSI/MSI-X interrupts sent to non-active vCPUs in physical destination mode. This is mostly a nop, because the vmm initializes all vCPUs up to vm_maxcpus, so even if the target CPU is not active, lapic/vlapic code still has the valid data to use. As John notes, dropping such interrupts more closely matches the real harware, which ignores all interrupts for not started APs. Reviewed by: jhb admbugs: 837 MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-08-03 16:57:14 +00:00
Ed Maste	490d56c527	vmx: use C99 bool, not boolean_t Bhyve's vmm is a self-contained modern component and thus a good candidate for use of C99 types. Reviewed by: jhb, kib, markj, Patrick Mooney MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21036	2019-08-01 02:16:48 +00:00
John Baldwin	87c39157c6	Improve the precision of bhyve's vPIT. Use 'struct bintime' instead of 'sbintime_t' to manage times in vPIT to postpone rounding to final results rather than intermediate results. In tests performed by Joyent, this reduced the error measured by Linux guests by 59 ppm. Smart OS bug: https://smartos.org/bugview/OS-6923 Submitted by: Patrick Mooney Reviewed by: rgrimes Obtained from: SmartOS / Joyent MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20335	2019-07-20 15:59:49 +00:00
Konstantin Belousov	026e450262	Fix syntax. Nod from: jhb Sponsored by: The FreeBSD Foundation	2019-07-12 19:14:52 +00:00
Scott Long	422a8a4d3a	Tie the name limit of a VM to SPECNAMELEN from devfs instead of a hard-coded value. Don't allocate space for it from the kernel stack. Account for prefix, suffix, and separator space in the name. This takes the effective length up to 229 bytes on 13-current, and 37 bytes on 12-stable. 37 bytes is enough to hold a full GUID string. PR: 234134 MFC after: 1 week Differential Revision: http://reviews.freebsd.org/D20924	2019-07-12 18:37:56 +00:00
Mark Johnston	eeacb3b02f	Merge the vm_page hold and wire mechanisms. The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247	2019-07-08 19:46:20 +00:00
Rodney W. Grimes	e4da41f932	Emulate the "TEST r/m{16,32,64}, imm{16,32,32}" instructions (opcode F7H). This adds emulation for: test r/m16, imm16 test r/m32, imm32 test r/m64, imm32 sign-extended to 64 OpenBSD guests compiled with clang 8.0.0 use TEST directly against a Local APIC register instead of separate read via MOV followed by a TEST against the register. PR: 238794 Submitted by: jhb Reported by: Jason Tubnor jason@tubnor.net Tested by: Jason Tubnor jason@tubnor.net Reviewed by: markj, Patrick Mooney patrick.mooney@joyent.com MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D20755	2019-06-26 21:19:43 +00:00

1 2 3 4 5 ...

526 Commits