freebsd-skq

Author	SHA1	Message	Date
Alexander Motin	f7c5bc2cfe	Use ahci_write_fis_d2h() for commands completion. MFC after: 2 weeks	2015-03-13 18:04:07 +00:00
Alexander Motin	0b9d25c935	Add DSM TRIM command support for virtual AHCI disks. It works only for virtual disks backed by ZVOLs and raw devices supporting BIO_DELETE. Virtual disks backed by files won't report this capability. MFC after: 2 weeks Relnotes: yes	2015-03-13 16:43:52 +00:00
Alexander Motin	f5f4836d62	Add variable initialization missed by me and clang. Reported by: grehan MFC after: 2 weeks	2015-03-05 20:29:18 +00:00
Alexander Motin	371f1d88b6	Fix error translation broken in r279658. Reported by: grehan MFC after: 2 weeks	2015-03-05 20:24:34 +00:00
Alexander Motin	2d678f1f4f	Implement cache flush for ahci-hd and for virtio-blk over device. MFC after: 2 weeks	2015-03-05 15:29:18 +00:00
Alexander Motin	d951589ddb	Add check for absent stripe size to r279652. MFC after: 2 weeks	2015-03-05 13:52:30 +00:00
Alexander Motin	94682383d9	Report logical/physical sector sizes for virtual SATA disk. MFC after: 2 weeks	2015-03-05 12:21:12 +00:00
Alexander Motin	297c4868dd	Add support for TOPOLOGY feature of virtio block device. Passing through physical block size/offset from underlying storage allows guest to manage proper data and I/O alignment to improve performance. MFC after: 2 weeks	2015-03-05 10:40:45 +00:00
Baptiste Daroussin	c2e2d02cbe	Make FreeBSD-bhyve an individual package	2015-03-05 07:30:48 +00:00
Neel Natu	12f91c70a3	Emulate MSR 0xC0011024 when running on AMD processors. OpenBSD guests test bit 0 of this MSR to detect whether the workaround for erratum 721 has been applied. Reported by: Jason Tubnor (jason@tubnor.net) MFC after: 1 week	2015-02-24 05:15:40 +00:00
Neel Natu	c974767896	Add "-u" option to bhyve(8) to indicate that the RTC should maintain UTC time. The default remains localtime for compatibility with the original device model in bhyve(8). This is required for OpenBSD guests which assume that the RTC keeps UTC time. Reviewed by: grehan Pointed out by: Jason Tubnor (jason@tubnor.net) MFC after: 2 weeks	2015-02-24 02:04:16 +00:00
Peter Grehan	65392c66a5	Don't close a block context if it couldn't be opened, for example if the backing file doesn't exist, avoiding a null deref. Reviewed by: neel MFC after: 1 week.	2015-02-23 22:31:39 +00:00
Neel Natu	d087a39935	Simplify instruction restart logic in bhyve. Keep track of the next instruction to be executed by the vcpu as 'nextrip'. As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should start execution. Also, instruction restart happens implicitly via 'vm_inject_exception()' or explicitly via 'vm_restart_instruction()'. The APIs behave identically in both kernel and userspace contexts. The main beneficiary is the instruction emulation code that executes in both contexts. bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length' as readonly: - Restarting an instruction is now done by calling 'vm_restart_instruction()' as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout()) - Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch()) Differential Revision: https://reviews.freebsd.org/D1526 Reviewed by: grehan MFC after: 2 weeks	2015-01-18 03:08:30 +00:00
Neel Natu	0dafa5cd4b	Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko. The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest. Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it. The RTC device state can be inspected via bhyvectl as follows: bhyvectl --vm=vm --get-rtc-time bhyvectl --vm=vm --set-rtc-time=<unix_time_secs> bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value> Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D1385 MFC after: 2 weeks	2014-12-30 22:19:34 +00:00
Baptiste Daroussin	c6db8143ed	Convert usr.sbin to LIBADD Reduce overlinking	2014-11-25 16:57:27 +00:00
Edward Tomasz Napierala	aca4343c62	Fix improper .Fx macro usage. Differential Revision: https://reviews.freebsd.org/D1158 Reviewed by: wblock@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2014-11-19 18:19:21 +00:00
Tycho Nightingale	48a9d8f214	To allow a request to be submitted from within the callback routine of a completing one increase the total by 1 but don't advertise it. Reviewed by: grehan	2014-11-09 21:08:52 +00:00
Tycho Nightingale	ae45750d6c	Improve the ability to cancel an in-flight request by using an interrupt, via SIGCONT, to force the read or write system call to return prematurely. Reviewed by: grehan	2014-11-04 01:06:33 +00:00
Tycho Nightingale	26bf96112b	If the start bit, PxCMD.ST, is cleared and nothing is in-flight then PxCI, PxSACT, PxCMD.CCS and PxCMD.CR should be 0. Reviewed by: grehan	2014-11-03 12:55:31 +00:00
Neel Natu	c17d4a83b8	Add a comment explaining the intent behind the I/O reservation [0x72-0x77].	2014-10-26 21:17:44 +00:00
Neel Natu	160ef77abf	Move the ACPI PM timer emulation into vmm.ko. This reduces variability during timer calibration by keeping the emulation "close" to the guest. Additionally having all timer emulations in the kernel will ease the transition to a per-VM clock source (as opposed to using the host's uptime to keep track of time). Discussed with: grehan	2014-10-26 04:44:28 +00:00
Neel Natu	e1a172e1c2	IFC @r273214	2014-10-20 02:57:30 +00:00
Neel Natu	592cd7d3be	Don't advertise the "OS visible workarounds" feature in cpuid.80000001H:ECX. bhyve doesn't emulate the MSRs needed to support this feature at this time. Don't expose any model-specific RAS and performance monitoring features in cpuid leaf 80000007H. Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and BIOS signature and P-state related MSRs. This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels 2.6.32, 3.10.0 and 3.17.0.	2014-10-19 21:38:58 +00:00
Tycho Nightingale	3ef05c4677	Support stopping and restarting the AHCI command list via toggling PxCMD.ST from '1' to '0' and back. This allows the driver a chance to recover if for instance a timeout occurred due to activity on the host. Reviewed by: grehan	2014-10-17 11:37:50 +00:00
Neel Natu	2688a818a3	Don't advertise the Instruction Based Sampling feature because it requires emulating a large number of MSRs. Ignore writes to a couple more AMD-specific MSRs and return 0 on read. This further reduces the unimplemented MSRs accessed by a Linux guest on boot.	2014-10-17 06:23:04 +00:00
Neel Natu	02904c45ab	Hide extended PerfCtr MSRs on AMD processors by clearing bits 23, 24 and 28 in CPUID.80000001H:ECX. Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and returning 0 on reads. This further reduces the number of unimplemented MSRs hit by a Linux guest during boot.	2014-10-17 03:04:38 +00:00
Neel Natu	913d54b96e	Emulate the "Hardware Configuration" MSR when running on an AMD host. This gets rid of the "TSC doesn't count with P0 frequency!" message when booting a Linux guest. Tested on an "AMD Opteron 6320" courtesy of Ben Perrault.	2014-10-16 19:27:26 +00:00
Neel Natu	ed6aacb51f	IFC @r272887	2014-10-10 23:52:56 +00:00
Neel Natu	5295c3e61d	Support Intel-specific MSRs that are accessed when booting a Linux guest in bhyve: - MSR_PLATFORM_INFO - MSR_TURBO_RATIO_LIMITx - MSR_RAPL_POWER_UNIT Reviewed by: grehan MFC after: 1 week	2014-10-09 19:13:33 +00:00
Neel Natu	02c282e862	iasl(8) expects integer fields in data tables to be specified as hexadecimal values. Therefore the bit width of the "PM Timer Block" was actually being interpreted as 50-bits instead of the expected 32-bit. This eliminates an error message emitted by a Linux 3.17 guest during boot: "Invalid length for FADT/PmTimerBlock: 50, using default 32" Reviewed by: grehan MFC after: 1 week	2014-10-09 19:02:32 +00:00
Neel Natu	8ccb28efcd	Implement the FLUSH operation in the virtio-block emulation. This gets rid of the following error message during FreeBSD guest bootup: "vtbd0: hard error cmd=flush fsbn 0" Reported by: rodrigc Reviewed by: grehan	2014-10-07 17:08:53 +00:00
Neel Natu	107af8f2ed	IFC @r272481	2014-10-05 01:28:21 +00:00
Peter Grehan	8b58e6af3c	Add new fields in the FADT, required by IASL 20140926-64. The new IASL from the recent acpi-ca import will error out if it doesn't see these new fields, which were previously reserved. Reported by: lme Reviewed by: neel	2014-10-03 17:27:30 +00:00
Neel Natu	970388bf8d	IFC @r272185	2014-09-27 22:15:50 +00:00
Peter Grehan	5ed6ab5baa	Correct display of bhyve SMBIOS UUIDs with dmidecode by bumping the version. The mixed little/big-endianness of SMBIOS UUIDs was clarified in v2.6 of the SMBIOS spec. dmidecode uses the reported version of SMBIOS to determine the layout and what to byte-swap. bhyve's SMBIOS reported as 2.4 though it implemented the 2.6-style of memory layout. This resulted in dmidecode reporting a different UUID than one passed in via the -U option. Fix by exporting a version of 2.6. Reviewed by: tychon Reported by: julian MFC after: 1 day	2014-09-23 01:17:22 +00:00
Neel Natu	8f02c5e456	IFC r271888. Restructure MSR emulation so it is all done in processor-specific code.	2014-09-20 21:46:31 +00:00
Neel Natu	b6cf6c8ca6	IFC @r271887	2014-09-20 06:27:37 +00:00
Neel Natu	c3498942a5	Restructure the MSR handling so it is entirely handled by processor-specific code. There are only a handful of MSRs common between the two so there isn't too much duplicate functionality. The VT-x code has the following types of MSRs: - MSRs that are unconditionally saved/restored on every guest/host context switch (e.g., MSR_GSBASE). - MSRs that are restored to guest values on entry to vmx_run() and saved before returning. This is an optimization for MSRs that are not used in host kernel context (e.g., MSR_KGSBASE). - MSRs that are emulated and every access by the guest causes a trap into the hypervisor (e.g., MSR_IA32_MISC_ENABLE). Reviewed by: grehan	2014-09-20 02:35:21 +00:00
Neel Natu	4e27d36d38	IFC @r271694	2014-09-17 18:46:51 +00:00
Glen Barber	7fca1ad503	Update the bhyve(8) manual to reflect that it is no longer considered 'experimental.' Reviewed by: grehan MFC after: 3 days Sponsored by: The FreeBSD Foundation	2014-09-17 16:45:20 +00:00
Neel Natu	bbadcde418	Set the 'vmexit->inst_length' field properly depending on the type of the VM-exit and ultimately on whether nRIP is valid. This allows us to update the %rip after the emulation is finished so any exceptions triggered during the emulation will point to the right instruction. Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability is available. The effective segment field in EXITINFO1 is not valid without this capability. Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging. Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2 fields in bhyve(8). Reviewed by: Anish Gupta (akgupt3@gmail.com) Reviewed by: grehan	2014-09-14 04:39:04 +00:00
Neel Natu	1aba8e7ff8	Initialize 'bc_rdonly' to the right value. Note that independent of this change a readonly disk file would still be opened O_RDONLY and protected from writes by the guest. Reviewed by: grehan	2014-09-11 21:15:20 +00:00
Peter Grehan	82560f19d0	Allow vtnet operation without merged rx buffers. NetBSD's virtio-net implementation doesn't negotiate the merged rx-buffers feature. To support this, check to see if the feature was negotiated, and then adjust the operation of the receive path accordingly by using a larger iovec, and a smaller rx header. In addition, ignore writes to the (read-only) status byte. Tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta. Reviewed by: neel, tychon Phabric: D745 MFC after: 3 days	2014-09-09 22:35:02 +00:00
Peter Grehan	e18f344b9b	Add a callback to be notified about negotiated features. Submitted by: luigi Obtained from: Vincenzo Maffione, Universita` di Pisa MFC after: 3 days	2014-09-09 04:11:54 +00:00
Neel Natu	04da7226c4	Set the 'inst_length' to '0' early on before any error conditions are detected in the emulation of the task switch. If any exceptions are triggered then the guest %rip should point to instruction that caused the task switch as opposed to the one after it.	2014-08-30 18:35:16 +00:00
Tycho Nightingale	b297e71ede	Fix a recursive lock acquisition in vi_reset_dev(). Reviewed by: grehan	2014-08-22 13:01:22 +00:00
Neel Natu	33424543f2	Minor cleanup: - Set 'pirq_cold' to '0' on the first PIRQ allocation. - Make assertions stronger. Reviewed by: jhb CR: https://phabric.freebsd.org/D592	2014-08-13 00:14:26 +00:00
Neel Natu	12a6eb99a1	Support PCI extended config space in bhyve. Add the ACPI MCFG table to advertise the extended config memory window. Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted or moved in the guest's address space. The PCI extended config space is an example of an immutable memory range. Add emulation for the "movzw" instruction. This instruction is used by FreeBSD to read a 16-bit extended config space register. CR: https://phabric.freebsd.org/D505 Reviewed by: jhb, grehan Requested by: tychon	2014-08-08 03:49:01 +00:00
Tycho Nightingale	42404fae46	Commands which encounter a fatal error shouldn't be marked as completed. Furthermore, provide an indication of the current command so it can be determined which one actually failed. Reviewed by: grehan	2014-07-30 18:47:31 +00:00
Neel Natu	afd5e8ba88	Simplify the meaning of return values from the inout handlers. After this change 0 means success and non-zero means failure. This also helps to eliminate VMEXIT_POWEROFF and VMEXIT_RESET as return values from VM-exit handlers. CR: D480 Reviewed by: grehan, jhb	2014-07-25 20:18:35 +00:00
Neel Natu	e84d8ebfcc	Reduce the proliferation of VMEXIT_RESTART in task_switch.c. This is in preparation for further simplification of the return values from VM exit handlers in bhyve(8).	2014-07-24 05:31:57 +00:00
Neel Natu	d37f2adb38	Fix fault injection in bhyve. The faulting instruction needs to be restarted when the exception handler is done handling the fault. bhyve now does this correctly by setting 'vmexit[vcpu].inst_length' to zero so the %rip is not advanced. A minor complication is that the fault injection APIs are used by instruction emulation code that is shared by vmm.ko and bhyve. Thus the argument that refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to be loosely typed as a 'void *'.	2014-07-24 01:38:11 +00:00
Neel Natu	d665d229ce	Emulate instructions emitted by OpenBSD/i386 version 5.5: - CMP REG, r/m - MOV AX/EAX/RAX, moffset - MOV moffset, AX/EAX/RAX - PUSH r/m	2014-07-23 04:28:51 +00:00
Neel Natu	091d453222	Handle nested exceptions in bhyve. A nested exception condition arises when a second exception is triggered while delivering the first exception. Most nested exceptions can be handled serially but some are converted into a double fault. If an exception is generated during delivery of a double fault then the virtual machine shuts down as a result of a triple fault. vm_exit_intinfo() is used to record that a VM-exit happened while an event was being delivered through the IDT. If an exception is triggered while handling the VM-exit it will be treated like a nested exception. vm_entry_intinfo() is used by processor-specific code to get the event to be injected into the guest on the next VM-entry. This function is responsible for deciding the disposition of nested exceptions.	2014-07-19 20:59:08 +00:00
Neel Natu	3d5444c864	Add emulation for legacy x86 task switching mechanism. FreeBSD/i386 uses task switching to handle double fault exceptions and this change enables that to work. Reported by: glebius	2014-07-16 21:26:26 +00:00
Peter Grehan	ad15140ee7	Use the blockif CHS routine to create fake CHS values, and then populate them in the identity page. This fixes a divide-by-zero error at probe time with NetBSD. MFC after: 1 week.	2014-07-15 00:27:08 +00:00
Peter Grehan	c4813fadf1	Add a call to synthesize a C/H/S value for block emulations that require it (ahci). The algorithm used is from the VHD specification.	2014-07-15 00:25:54 +00:00
Peter Grehan	18e32ebc89	Extend capabilities to 64-bits in preparation for some API changes. The v1.0 virtio spec supports an extended size for guest/host caps, but in practice 64-bits should last for a long time.	2014-07-05 02:38:53 +00:00
Peter Grehan	f23a8ac1b9	Use correct flag for event index. Submitted by: luigi Obtained from: Vincenzo Maffione, Universita` di Pisa MFC after: 1 week	2014-07-03 00:23:14 +00:00
Neel Natu	64fe72354c	Add post-mortem debugging for "EPT Misconfiguration" VM-exit. This error is hard to reproduce so try to collect all the breadcrumbs when it happens. Reviewed by: grehan	2014-06-27 18:00:38 +00:00
John Baldwin	cde1f5b8a0	Sort command flags in usage output and the manpages.	2014-06-27 15:20:34 +00:00
Peter Grehan	62f17e92fe	Set the version and date to fixed fields rather than using preprocessor macros that don't allow reproducible builds. As a side-effect, the date string is now spec-compliant. root@bhyve:~ # dmidecode # dmidecode 2.12 SMBIOS 2.4 present. 12 structures occupying 514 bytes. Table at 0x000F101F. Handle 0x0001, DMI type 0, 24 bytes BIOS Information Vendor: BHYVE Version: 1.0 Release Date: 03/14/2014 Submitted by: des (original version) Reviewed by: tychon MFC after: 1 week	2014-06-27 05:27:37 +00:00
John Baldwin	5749449d9b	- Document -b to enable the bvmcons console (but mark it as deprecated similar to -g.) - Document -U to set the SMBIOS UUID. - Add missing options to the usage output and to the manpage Synopsis. - Don't claim that bvmdebug is amd64-only (it is also a device, not an option).	2014-06-26 20:12:38 +00:00
Neel Natu	be679db4cd	Provide APIs to get the 'lowmem' and 'highmem' sizes directly. Previously the sizes were inferred indirectly based on the size of the mappings at 0 and 4GB respectively. This works fine as long as size of the allocation is identical to the size of the mapping in the guest's address space. However, if the mapping is disjoint then this assumption falls apart (e.g., due to the legacy BIOS hole between 640KB and 1MB).	2014-06-24 02:02:51 +00:00
Baptiste Daroussin	01c2b8ac0d	use .Mt to mark up email addresses consistently (part2) PR: 191174 Submitted by: Franco Fichtner <franco@lastsummer.de>	2014-06-20 09:57:27 +00:00
Neel Natu	79aad80d3c	Fix typo and rename macro KDB_SYS_FLAG to KBD_SYS_FLAG. Reviewed by: tychon	2014-06-18 17:20:02 +00:00
Tycho Nightingale	67b6ffaad6	r267169 should apply to 64-bit BARs as well. Reviewed by: neel	2014-06-09 19:55:50 +00:00
Joel Dahl	087129d22c	Remove blank lines.	2014-06-09 19:29:10 +00:00
Tycho Nightingale	b6ae8b050b	Some devices (e.g. Intel AHCI and NICs) support quad-word access to register pairs where two 32-bit registers make up a larger logical size. Support those accesses by splitting the quad-word into two double-words. Reviewed by: grehan	2014-06-06 16:18:37 +00:00
Neel Natu	26cdcdbebb	Use MIN(a,b) from <sys/param.h> instead of rolling our own version. Pointed out by: grehan	2014-06-01 02:47:09 +00:00
Neel Natu	0be3798af5	Limit the maximum number of back-to-back iterations of a "rep; ins/outs" to 16. This is arbitrary and is used to ensure that a vcpu goes back into the vm_run() loop to process interrupts or rendezvous events in a timely fashion. Found with: Coverity Scan CID: 1216436	2014-06-01 02:13:07 +00:00
Neel Natu	95ebc360ef	Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing it implicitly in vmm.ko. Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and "--get-suspended-cpus" options. This is in preparation for being able to reset virtual machine state without having to destroy and recreate it.	2014-05-31 23:37:34 +00:00
Neel Natu	65ffa035a7	Add segment protection and limits violation checks in vie_calculate_gla() for 32-bit x86 guests. Tested using ins/outs executed in a FreeBSD/i386 guest.	2014-05-27 04:26:22 +00:00
Neel Natu	6303b65d35	Fix issue with restarting an "insb/insw/insl" instruction because of a page fault on the destination buffer. Prior to this change a page fault would be detected in vm_copyout(). This was done after the I/O port access was done. If the I/O port access had side-effects (e.g. reading the uart FIFO) then restarting the instruction would result in incorrect behavior. Fix this by validating the guest linear address before doing the I/O port emulation. If the validation results in a page fault exception being injected into the guest then the instruction can now be restarted without any side-effects.	2014-05-26 18:21:08 +00:00
Neel Natu	5382c19d81	Do the linear address calculation for the ins/outs emulation using a new API function 'vie_calculate_gla()'. While the current implementation is simplistic it forms the basis of doing segmentation checks if the guest is in 32-bit protected mode.	2014-05-25 00:57:24 +00:00
Neel Natu	da11f4aa1d	Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out of the guest linear address space. These APIs in turn use a new ioctl 'VM_GLA2GPA' to convert the guest linear address to guest physical. Use the new copyin/copyout APIs when emulating ins/outs instruction in bhyve(8).	2014-05-24 23:12:30 +00:00
Neel Natu	e813a87350	Consolidate all the information needed by the guest page table walker into 'struct vm_guest_paging'. Check for canonical addressing in vmm_gla2gpa() and inject a protection fault into the guest if a violation is detected. If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to point to the root of the page tables.	2014-05-24 20:26:57 +00:00
Neel Natu	a7424861fb	Check for alignment check violation when processing in/out string instructions.	2014-05-23 19:59:14 +00:00
Neel Natu	d17b5104a9	Add emulation of the "outsb" instruction. NetBSD guests use this to write to the UART FIFO. The emulation is constrained in a number of ways: 64-bit only, doesn't check for all exception conditions, limited to i/o ports emulated in userspace. Some of these constraints will be relaxed in followup commits. Requested by: grehan Reviewed by: tychon (partially and a much earlier version)	2014-05-23 05:15:17 +00:00
John Baldwin	b3e9732a76	Implement a PCI interrupt router to route PCI legacy INTx interrupts to the legacy 8259A PICs. - Implement an ICH-compatible PCI interrupt router on the lpc device with 8 steerable pins configured via config space access to byte-wide registers at 0x60-63 and 0x68-6b. - For each configured PCI INTx interrupt, route it to both an I/O APIC pin and a PCI interrupt router pin. When a PCI INTx interrupt is asserted, ensure that both pins are asserted. - Provide an initial routing of PCI interrupt router (PIRQ) pins to 8259A pins (ISA IRQs) and initialize the interrupt line config register for the corresponding PCI function with the ISA IRQ as this matches existing hardware. - Add a global _PIC method for OSPM to select the desired interrupt routing configuration. - Update the _PRT methods for PCI bridges to provide both APIC and legacy PRT tables and return the appropriate table based on the configured routing configuration. Note that if the lpc device is not configured, no routing information is provided. - When the lpc device is enabled, provide ACPI PCI link devices corresponding to each PIRQ pin. - Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A pins via the ELCR. - Mark the power management SCI as level triggered. - Don't hardcode the number of elements in Packages in the source for the DSDT. iasl(8) will fill in the actual number of elements, and this makes it simpler to generate a Package with a variable number of elements. Reviewed by: tycho	2014-05-15 14:16:55 +00:00
Neel Natu	0dd10c0047	Don't include the guest memory segments in the bhyve(8) process core dump. This has not added a lot of value when debugging bhyve issues while greatly increasing the time and space required to store the core file. Passing the "-C" option to bhyve(8) will change the default and dump guest memory in the core dump. Requested by: grehan Reviewed by: grehan	2014-05-13 16:40:27 +00:00
Neel Natu	ee2dbd023c	abort(3) the process in response to a VMEXIT_ABORT. This usually happens in response to an unhandled VM exit or an unexpected error so a core is useful. Remove unused macro VMEXIT_SWITCH. Reviewed by: grehan	2014-05-12 23:35:10 +00:00
Neel Natu	2bd073e13b	Disable the 'uart_drain()' callback when the emulated receive FIFO is full. Failing to do this will cause the kevent(2) notification to trigger continuously and the bhyve(8) mevent thread will hog the cpu until the characters on the backend tty device are drained. Also, make the uart backend file descriptor non-blocking to avoid a select(2) before every byte read from that backend. Reviewed by: grehan	2014-05-05 23:54:13 +00:00
Neel Natu	9b6155a20c	Modify the "-p" option to be more flexible when associating a 'vcpu' with a 'hostcpu'. The new format of the argument string is "vcpu:hostcpu". This allows pinning a subset of the vcpus if desired. It also allows pinning a vcpu to more than a single 'hostcpu'. Submitted by: novel (initial version)	2014-05-05 18:06:35 +00:00
Neel Natu	067824256f	Remove misleading "addcpu" in an error message emitted by fbsdrun_deletecpu(). Pointed out by: novel	2014-05-05 16:35:37 +00:00
Neel Natu	09fd42cb88	Re-adding an event to a kqueue modifies the parameters of the original event. However, if the original knote had been disabled then it is not automatically re-enabled. Fix this by using EV_ADD to create an mevent and EV_ENABLE to enable it. Adding a kevent for the first time implicitly enables it so existing callers of mevent_add() don't need to change. Reviewed by: grehan	2014-05-05 16:30:03 +00:00
Neel Natu	b100acf254	Don't allow MPtable generation if there are multiple PCI hierarchies. This is because there isn't a standard way to relay this information to the guest OS. Add a command line option "-Y" to bhyve(8) to inhibit MPtable generation. If the virtual machine is using PCI devices on buses other than 0 then it can still use ACPI tables to convey this information to the guest. Discussed with: grehan@	2014-05-02 04:51:31 +00:00
Neel Natu	e50ce2aa06	Add logic in the HLT exit handler to detect if the guest has put all vcpus to sleep permanently by executing a HLT with interrupts disabled. When this condition is detected the guest will be suspended with a reason of VM_SUSPEND_HALT and the bhyve(8) process will exit. Tested by executing "halt" inside a RHEL7-beta guest. Discussed with: grehan@ Reviewed by: jhb@, tychon@	2014-05-02 00:33:56 +00:00
Neel Natu	2cb97c9dd6	Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guest. Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.	2014-04-30 02:08:27 +00:00
Neel Natu	c6a0cc2e21	Some Linux guests will implement a 'halt' by disabling the APIC and executing the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET). This functionality was broken in r263780 in a way that made it impossible to kill the bhyve(8) process because it would loop forever in vm_handle_suspend(). Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from a Linux guest will appear to be hung but this is consistent with the behavior on bare metal. The guest can be rebooted by using the bhyvectl options '--force-reset' or '--force-poweroff'. Reviewed by: grehan@	2014-04-29 18:42:56 +00:00
Neel Natu	f0fdcfe247	Allow a virtual machine to be forcibly reset or powered off. This is done by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF. The disposition of VM_SUSPEND is also made available to the exit handler via the 'u.suspended' member of 'struct vm_exit'. This capability is exposed via the '--force-reset' and '--force-poweroff' arguments to /usr/sbin/bhyvectl. Discussed with: grehan@	2014-04-28 22:06:40 +00:00
Peter Grehan	67e1705297	Implement legacy interrupts for the AHCI device emulation according to the method outlined in the AHCI spec. Tested with FreeBSD 9/10/11 with MSI disabled, and also NetBSD/amd64 (lightly). Reviewed by: neel, tychon MFC after: 3 weeks	2014-04-28 18:41:25 +00:00
Peter Grehan	fcbec69157	Respect and track the enable bit in the PCI configuration address word. Ignore writes, and return 0xff's, on config accesses when not set. Behaviour now matches that seen on h/w. Found with a NetBSD/amd64 guest. Reviewed by: tychon MFC after: 3 weeks	2014-04-25 17:35:34 +00:00
Tycho Nightingale	d42ea5731e	Provide a very basic stub for the 8042 PS/2 keyboard controller. Reviewed by: jhb Approved by: neel (co-mentor)	2014-04-25 13:38:18 +00:00
Xin LI	994f858a8b	Use calloc() in favor of malloc + memset. Reviewed by: neel	2014-04-22 18:55:21 +00:00
Tycho Nightingale	82c2c89084	Factor out common ioport handler code for better hygiene -- pointed out by neel@. Approved by: neel (co-mentor)	2014-04-22 16:13:56 +00:00
Tycho Nightingale	1d6be92ac6	Fix ACPI DSDT indentation cosmetic breakage introduced in r264631 -- pointed out by jhb@. Approved by: grehan (co-mentor)	2014-04-18 16:01:19 +00:00
Tycho Nightingale	d6aa08c3ef	Respect the destination operand size of the 'Input from Port' instruction. Approved by: grehan (co-mentor)	2014-04-18 15:22:56 +00:00
Tycho Nightingale	79d6ca331e	Add support for reading the PIT Counter 2 output signal via the NMI Status and Control register at port 0x61. Be more conservative about "catching up" callouts that were supposed to fire in the past by skipping an interrupt if it was scheduled too far in the past. Restore the PIT ACPI DSDT entries and add an entry for NMISC too. Approved by: neel (co-mentor)	2014-04-18 00:02:06 +00:00
Tycho Nightingale	b96be57a2d	Add support for emulating the slave PIC. Reviewed by: grehan, jhb Approved by: grehan (co-mentor)	2014-04-14 19:00:20 +00:00
Tycho Nightingale	8b4a7f857b	Constrain the amount of data returned to what is actually available not the size of the buffer. Approved by: grehan (co-mentor)	2014-04-09 14:50:55 +00:00
John Baldwin	2cf4f7ef79	Handle single-byte reads from the bvmcons port (0x220) by returning 0xff. Some guests may attempt to read from this port to identify pseudo-PNP ISA devices. (The ie(4) driver in FreeBSD/i386 is one example.) Reviewed by: grehan	2014-04-08 21:02:03 +00:00
Peter Grehan	9d0c4e17d9	Add support for the virtio RNG entropy-source device. Call through to /dev/random synchronously to fill virtio buffers with RNG data. Tested with FreeBSD-CURRENT and Ubuntu guests. Submitted by: Leon Dang Discussed with: markm MFC after: 3 weeks Sponsored by: Nahanni Systems	2014-04-02 20:18:17 +00:00
Neel Natu	b15a09c05e	Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called from any context i.e., it is not required to be called from a vcpu thread. The ioctl simply sets a state variable 'vm->suspend' to '1' and returns. The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The suspend handler waits until all 'vm->active_cpus' have transitioned to 'vm->suspended_cpus' before returning to userspace. Discussed with: grehan	2014-03-26 23:34:27 +00:00
Tycho Nightingale	e883c9bb40	Move the atpit device model from userspace into vmm.ko for better precision and lower latency. Approved by: grehan (co-mentor)	2014-03-25 19:20:34 +00:00
Neel Natu	0826d045cc	Use 'cpuset_t' to represent the vcpus active in a virtual machine.	2014-03-20 18:15:37 +00:00
Tycho Nightingale	4feac03f2c	Don't reissue in-flight commands. Approved by: neel (co-mentor)	2014-03-18 23:25:35 +00:00
Tycho Nightingale	7292923b49	Though there currently isn't a way to insert new media into an ATAPI drive, at least pretend to support Asynchronous Notification (AN) to avoid a guest needlessly polling for it. Approved by: grehan (co-mentor)	2014-03-16 12:33:40 +00:00
Tycho Nightingale	113d84c11d	Support the bootloader's single 16-bit 'outw' access to the Divisor Latch MSB and LSB registers. Approved by: neel (co-mentor)	2014-03-16 12:31:28 +00:00
Tycho Nightingale	762fd20804	Replace the userspace atpic stub with a more functional vmm.ko model. New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state. Reviewed by: jhb, neel Approved by: neel (co-mentor)	2014-03-11 16:56:00 +00:00
Peter Grehan	f4959d3537	Open the uart emulation's backing tty in non-blocking mode. This fixes the issue of bhyve appearing to halt when using nmdm ports for the console, until a connection is made to the other end. bhyveload already does this. Reported by: Many. MFC after: 3 weeks.	2014-03-07 06:23:37 +00:00
Tycho Nightingale	af5bfc53b8	Add SMBIOS support. A new option, -U, can be used to set the UUID in the System Information (Type 1) structure. Manpage fix to follow. Approved by: grehan (co-mentor)	2014-03-04 17:12:06 +00:00
Neel Natu	9777ca203c	Document the "-a" and "-x" options to match the changes in r262236. Reviewed by: grehan	2014-02-26 19:14:54 +00:00
Neel Natu	dc50650607	Queue pending exceptions in the 'struct vcpu' instead of directly updating the processor-specific VMCS or VMCB. The pending exception will be delivered right before entering the guest. The order of event injection into the guest is: - hardware exception - NMI - maskable interrupt In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt window-exiting and inject it as soon as possible after the hardware exception is injected. Also since interrupts are inherently asynchronous, injecting them after the hardware exception should not affect correctness from the guest perspective. Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict it to only deliver x86 hardware exceptions. This new ioctl is now used to inject a protection fault when the guest accesses an unimplemented MSR. Discussed with: grehan, jhb Reviewed by: jhb	2014-02-26 00:52:05 +00:00
Peter Grehan	4258c52e29	Fix virtio spec URL. Submitted by: lwhsu MFC after: 1 week	2014-02-21 22:45:35 +00:00
Tycho Nightingale	182d7debb9	Avoid clobbering the counter mode when issuing a latch command. Approved by: grehan (co-mentor)	2014-02-21 01:15:26 +00:00
Neel Natu	52e5c8a2ec	Simplify APIC mode switching from MMIO to x2APIC. In part this is done to simplify the implementation of the x2APIC virtualization assist in VT-x. Prior to this change the vlapic allowed the guest to change its mode from xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked when the virtual machine is created. This is not very constraining because operating systems already have to deal with BIOS setting up the APIC in x2APIC mode at boot. Fix a bug in the CPUID emulation where the x2APIC capability was leaking from the host to the guest. Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore MSR accesses to the vlapic when it is in xAPIC mode. The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8) can be used to change the mode to x2APIC instead. Discussed with: grehan@	2014-02-20 01:48:25 +00:00
Neel Natu	7a902ec0ec	Add a check to validate that memory BARs of passthru devices are 4KB aligned. Also, the MSI-x table offset is not required to be 4KB aligned so take this into account when computing the pages occupied by the MSI-x tables.	2014-02-18 19:00:15 +00:00
John Baldwin	a96b8b801a	Tweak the handling of PCI capabilities in emulated devices to remove the non-standard zero capability list terminator. Instead, track the start and end of the most recently added capability and use that to adjust the previous capability's next pointer when a capability is added and to determine the range of config registers belonging to PCI capability registers. Reviewed by: neel	2014-02-18 03:00:20 +00:00
Neel Natu	06db1b4a59	Update bhyve(8) man page to describe the usage of the "-s" option to assign bus numbers to emulated devices. Also add the restriction that the LPC bridge emulation can only be configured on bus 0. Reviewed by: grehan@	2014-02-14 21:46:04 +00:00
Neel Natu	d84882ca8f	Allow PCI devices to be configured on all valid bus numbers from 0 to 255. This is done by representing each bus as a root PCI device in ACPI. The device implements the _BBN method to return the PCI bus number to the guest OS. Each PCI bus keeps track of the resources that it decodes for devices configured on the bus: i/o, mmio (32-bit) and mmio (64-bit). These windows are advertised to the guest via the _CRS object of the root device. Bus 0 is treated specially since it consumes the I/O ports to access the PCI config space [0xcf8-0xcff]. It also decodes the legacy I/O ports that are consumed by devices on the LPC bus. For this reason the LPC bridge can be configured only on bus 0. The bus number can be specified using the following command line option to bhyve(8): "-s <bus>:<slot>:<func>,<emul>[,<config>]" Discussed with: grehan@ Reviewed by: jhb@	2014-02-14 21:34:08 +00:00
Tycho Nightingale	2a261121af	Provide an indication a "PIO Setup Device to Host FIS" occurred while executing the IDENTIFY DEVICE and IDENTIFY PACKET DEVICE commands. Also, provide an indication a "D2H Register FIS" occurred during a SET FEATURES command. Approved by: grehan (co-mentor)	2014-02-12 00:32:14 +00:00
John Baldwin	1f82944f35	Mark the I/O ports used by the bhyve console and debug devices as system resources. MFC after: 1 week	2014-02-07 20:53:41 +00:00
John Baldwin	3cbf3585cb	Enhance the support for PCI legacy INTx interrupts and enable them in the virtio backends. - Add a new ioctl to export the count of pins on the I/O APIC from vmm to the hypervisor. - Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for ISA interrupts. - Populate the MP Table with I/O interrupt entries for any PCI INTx interrupts. - Create a _PRT table under the PCI root bridge in ACPI to route any PCI INTx interrupts appropriately. - Track which INTx interrupts are in use per-slot so that functions that share a slot attempt to distribute their INTx interrupts across the four available pins. - Implicitly mask INTx interrupts if either MSI or MSI-X is enabled and when the INTx DIS bit is set in a function's PCI command register. Either assert or deassert the associated I/O APIC pin when the state of one of those conditions changes. - Add INTx support to the virtio backends. - Always advertise the MSI capability in the virtio backends. Submitted by: neel (7) Reviewed by: neel MFC after: 2 weeks	2014-01-29 14:56:48 +00:00
John Baldwin	d2bc4816c5	Remove support for legacy PCI devices. These haven't been needed since support for LPC uart devices was added and it conflicts with upcoming patches to add PCI INTx support. Reviewed by: neel	2014-01-27 22:26:15 +00:00
Tycho Nightingale	4e5f86e009	Fix issue with stale fields from a recycled request pulled off the freelist. Approved by: grehan (co-mentor)	2014-01-22 01:57:52 +00:00
Tycho Nightingale	40eb53f232	Increase the block-layer backend maximum number of requests to match the AHCI command queue depth. This allows a slew of commands issued by a Linux guest to be absorbed without error. Approved by: grehan (co-mentor)	2014-01-22 01:56:49 +00:00
Peter Grehan	d68f0bd618	Fix issue with the virtio descriptor region being truncated if it was above 4GB. This was seen with CentOS 6.5 guests with large RAM, since the block drivers are loaded late in the boot sequence and end up allocating descriptor memory from high addresses. Reported by: Michael Dexter MFC after: 3 days	2014-01-09 07:17:21 +00:00
Remko Lodder	a8be8e5ee3	virtio-block does not exist, the correct name is virtio-blk. PR: 185573 Submitted by: Allan Jude Facilitated by: Snow B.V. MFC after: 3 days	2014-01-08 08:37:30 +00:00
Peter Grehan	b1843e712e	Cosmetic change - switch over to vertical SRCS to make it easier to keep files in alpha order. Reviewed by: neel	2014-01-03 19:31:40 +00:00
John Baldwin	e6c8bc291a	Rework the DSDT generation code a bit to generate more accurate info about LPC devices. Among other things, the LPC serial ports now appear as ACPI devices. - Move the info for the top-level PCI bus into the PCI emulation code and add ResourceProducer entries for the memory ranges decoded by the bus for memory BARs. - Add a framework to allow each PCI emulation driver to optionally write an entry into the DSDT under the \_SB_.PCI0 namespace. The LPC driver uses this to write a node for the LPC bus (\_SB_.PCI0.ISA). - Add a linker set to allow any LPC devices to write entries into the DSDT below the LPC node. - Move the existing DSDT block for the RTC to the RTC driver. - Add DSDT nodes for the AT PIC, the 8254 ISA timer, and the LPC UART devices. - Add a "SuperIO" device under the LPC node to claim "system resources" along with a linker set to allow various drivers to add IO or memory ranges that should be claimed as a system resource. - Add system resource entries for the extended RTC IO range, the registers used for ACPI power management, the ELCR, PCI interrupt routing register, and post data register. - Add various helper routines for generating DSDT entries. Reviewed by: neel (earlier version)	2014-01-02 21:26:59 +00:00
Neel Natu	0492757c70	Restructure the VMX code to enter and exit the guest. In large part this change hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used to enter guest context and vmx_exit_guest() is used to transition back into host context. Fix a longstanding race where a vcpu interrupt notification might be ignored if it happens after vmx_inject_interrupts() but before host interrupts are disabled in vmx_resume/vmx_launch. We now call vmx_inject_interrupts() with host interrupts disabled to prevent this. Suggested by: grehan@	2014-01-01 21:17:08 +00:00
John Baldwin	058e24d34b	Extend the ACPI power management support to wire a virtual power button up to SIGTERM when ACPI is enabled. Sending SIGTERM to the hypervisor when an ACPI-aware OS is running will now trigger a soft-off allowing for a graceful shutdown of the guest. - Move constants for ACPI-related registers to acpi.h. - Implement an SMI_CMD register with commands to enable and disable ACPI. Currently the only change when ACPI is enabled is to enable the virtual power button via SIGTERM. - Implement a fixed-feature power button when ACPI is enabled by asserting PWRBTN_STS in PM1_EVT when SIGTERM is received. - Add support for EVFILT_SIGNAL events to mevent. - Implement support for the ACPI system command interrupt (SCI) and assert it when needed based on the values in PM1_EVT. Mark the SCI as active-low and level triggered in the MADT and MP Table. - Mark PCI interrupts in the MP Table as active-low in addition to level triggered. Reviewed by: neel	2013-12-28 04:01:05 +00:00
John Baldwin	cf952fe841	Use pthread_once() to replace a static integer initted flag. Reviewed by: neel	2013-12-28 03:21:15 +00:00
John Baldwin	6450da0774	Support soft power-off via the ACPI S5 state for bhyve guests. - Implement the PM1_EVT and PM1_CNT registers required by ACPI. The PM1_EVT register is mostly a dummy as bhyve doesn't support any of the hardware-initiated events. The only bits of PM1_CNT that are implemented are the sleep request bits (SLP_EN and SLP_TYP) which request a graceful power off for S5. In particular, for S5, bhyve exits with a non-zero value which terminates the loop in vmrun.sh. - Emulate the Reset Control register at I/O port 0xcf9 and advertise it as the reset register via ACPI. - Advertise an _S5 package. - Extend the in/out interface to allow an in/out handler to request that the hypervisor trigger a reset or power-off. - While here, note that all vCPUs in a guest support C1 ("hlt"). Reviewed by: neel (earlier version)	2013-12-24 16:14:19 +00:00
John Baldwin	330baf58c6	Extend the support for local interrupts on the local APIC: - Add a generic routine to trigger an LVT interrupt that supports both fixed and NMI delivery modes. - Add an ioctl and bhyvectl command to trigger local interrupts inside a guest. In particular, a global NMI similar to that raised by SERR# or PERR# can be simulated by asserting LINT1 on all vCPUs. - Extend the LVT table in the vCPU local APIC to support CMCI. - Flesh out the local APIC error reporting a bit to cache errors and report them via ESR when ESR is written to. Add support for asserting the error LVT when an error occurs. Raise illegal vector errors when attempting to signal an invalid vector for an interrupt or when sending an IPI. - Ignore writes to reserved bits in LVT entries. - Export table entries in the MADT and MP Table advertising the stock x86 config of LINT0 set to ExtInt and LINT1 wired to NMI. Reviewed by: neel (earlier version)	2013-12-23 19:29:07 +00:00
Joel Dahl	6081b93c89	mdoc: nuke whitespace.	2013-12-23 15:00:15 +00:00
Neel Natu	f80330a820	Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the IDLE state before the requested state transition. This guarantees that there is exactly one ioctl() operating on a vcpu at any point in time and prevents unintended state transitions. More details available here: http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-December/001825.html Reviewed by: grehan Reported by: Markiyan Kushnir (markiyan.kushnir at gmail.com) MFC after: 3 days	2013-12-22 20:29:59 +00:00
Neel Natu	851d84f1b5	Add an option to ignore accesses by the guest to unimplemented MSRs. Also, ignore a couple of SandyBridge uncore PMC MSRs that Centos 6.4 writes to during boot. Reviewed by: grehan	2013-12-19 22:27:28 +00:00
Neel Natu	55888cfaa2	Rename the ambiguously named 'vm_setup_msi()' and 'vm_setup_msix()' to 'vm_setup_pptdev_msi()' and 'vm_setup_pptdev_msix()' respectively. It should now be clear that these functions operate on passthru devices.	2013-12-18 03:58:51 +00:00
Neel Natu	4f8be175d5	Add an API to deliver message signalled interrupts to vcpus. This allows callers to treat the MSI 'addr' and 'data' fields as opaque and also lets bhyve implement multiple destination modes: physical, flat and clustered. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com) Reviewed by: grehan@	2013-12-16 19:59:31 +00:00
Joel Dahl	05f7cd8bce	mdoc: sort SEE ALSO.	2013-12-15 08:52:16 +00:00
Peter Grehan	b13e60da56	bhyve(8) man page. mdoc formatting and much input and review from Warren Block (wblock@). Reviewed by: many MFC after: 3 days	2013-12-13 08:31:13 +00:00
Neel Natu	1c05219285	If a vcpu disables its local apic and then executes a 'HLT' then spin down the vcpu and destroy its thread context. Also modify the 'HLT' processing to ignore pending interrupts in the IRR if interrupts have been disabled by the guest. The interrupt cannot be injected into the guest in any case so resuming it is futile. With this change "halt" from a Linux guest works correctly. Reviewed by: grehan@ Tested by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-12-07 22:18:36 +00:00
John Baldwin	c71f0d951a	Fix the processor table entry structure to use a fixed-width type for 32-bit fields so it is the correct size on amd64. Remove a workaround for the broken structure from bhyve(8). MFC after: 1 week	2013-12-05 21:51:54 +00:00
Neel Natu	b5b28fc9dc	Add support for level triggered interrupt pins on the vioapic. Prior to this commit level triggered interrupts would work as long as the pin was not shared among multiple interrupt sources. The vlapic now keeps track of level triggered interrupts in the trigger mode register and will forward the EOI for a level triggered interrupt to the vioapic. The vioapic in turn uses the EOI to sample the level on the pin and re-inject the vector if the pin is still asserted. The vhpet is the first consumer of level triggered interrupts and advertises that it can generate interrupts on pins 20 through 23 of the vioapic. Discussed with: grehan@	2013-11-27 22:18:08 +00:00
Peter Grehan	6380102c7f	Allow bhyve and bhyveload to attach to tty devices. bhyveload: introduce the -c <device> parameter to select a tty for output (or "stdio") bhyve: allow the puc and lpc-com backends to accept a tty in addition to "stdio" When used in conjunction with the null-modem device, nmdm(4), this allows attach/detach to the guest console and multiple concurrent serial ports. kgdb on a serial port is now functional. Reviewed by: neel Requested by: Almost everyone that has used bhyve MFC after: 10.0	2013-11-27 00:21:37 +00:00
Peter Grehan	4b48ea6ab2	The Data Byte Count (DBC) field of a Physical Region Descriptor Table is 22 bits, with the bit 31 being the interrupt-on-completion bit. OpenBSD and UEFI set this bit, resulting in large block i/o lengths being sent to bhyve and coredumping the process. Fix by masking off the relevant 22 bits when using the DBC field as a length. Reviewed by: Zhixiang Yu Discussed with: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com) MFC after: 10.0	2013-11-26 03:00:54 +00:00
Neel Natu	d6fe268fdd	Fix discrepancy between the IOAPIC ID advertised by firmware tables and the actual value read by the guest from the device. The IOAPIC ID is now set to zero in both MPtable/ACPI tables as well as in the ioapic device emulation. Pointed out by: grehan@	2013-11-25 23:31:00 +00:00
Neel Natu	08e3ff329a	Add HPET device emulation to bhyve. bhyve supports a single timer block with 8 timers. The timers are all 32-bit and capable of being operated in periodic mode. All timers support interrupt delivery using MSI. Timers 0 and 1 also support legacy interrupt routing. At the moment the timers are not connected to any ioapic pins but that will be addressed in a subsequent commit. This change is based on a patch from Tycho Nightingale (tycho.nightingale@pluribusnetworks.com).	2013-11-25 19:04:51 +00:00
Neel Natu	ac7304a758	Add an ioctl to assert and deassert an ioapic pin atomically. This will be used to inject edge triggered legacy interrupts into the guest. Start using the new API in device models that use edge triggered interrupts: viz. the 8254 timer and the LPC/uart device emulation. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-23 03:56:03 +00:00
Neel Natu	565bbb8698	Move the ioapic device model from userspace into vmm.ko. This is needed for upcoming in-kernel device emulations like the HPET. The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to manipulate the ioapic pin state. Discussed with: grehan@ Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-12 22:51:03 +00:00
Neel Natu	ec096ed5dd	x86 platforms that use an IOAPIC route the legacy timer interrupt (IRQ0) to pin 2 of the IOAPIC. Add an 'Interrupt Source Override' entry to the MADT to describe this and start asserting interrupts on pin 2 in the 8254 device model. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-11 00:45:17 +00:00
Neel Natu	c8afb9bc3f	Fix an off-by-one error when iterating over the emulated PCI BARs. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-11-06 22:35:52 +00:00
Peter Grehan	7f5487aca1	Add the VM name to the process name with setproctitle(). Remove the VM name from some of the thread-naming calls since it is now in the proc title. Slightly modify the thread-naming for the net and block threads. This improves readability when using top/ps with the -a and -H options on a system with a large number of bhyve VMs. Requested by: Michael Dexter Reviewed by: neel MFC after: 4 weeks	2013-11-06 00:25:17 +00:00
Neel Natu	a1a4cbea58	Make the virtual ioapic available unconditionally in a bhyve virtual machine. This is in preparation for moving the ioapic device model from userspace to vmm.ko. Reviewed by: grehan	2013-10-31 05:44:45 +00:00
Neel Natu	3ee2d14f66	Update copyright to include the author of the LPC bridge emulation code.	2013-10-29 17:31:16 +00:00
Neel Natu	ea7f1c8cd2	Add support for PCI-to-ISA LPC bridge emulation. If the LPC bus is attached to a virtual machine then we implicitly create COM1 and COM2 ISA devices. Prior to this change the only way of attaching a COM port to the virtual machine was by presenting it as a PCI device that is mapped at the legacy I/O address 0x3F8 or 0x2F8. There were some issues with the original approach: - It did not work at all with UEFI because UEFI will reprogram the PCI device BARs and remap the COM1/COM2 ports at non-legacy addresses. - OpenBSD GENERIC kernel does not create a /dev/console because it expects the uart device at the legacy 0x3F8/0x2F8 address to be an ISA device. - It was functional with a FreeBSD guest but caused the console to appear on /dev/ttyu2 which was not intuitive. The uart emulation is now independent of the bus on which it resides. Thus it is possible to have uart devices on the PCI bus in addition to the legacy COM1/COM2 devices behind the LPC bus. The command line option to attach ISA COM1/COM2 ports to a virtual machine is "-s <bus>,lpc -l com1,stdio". The command line option to create a PCI-attached uart device is: "-s <bus>,uart[,stdio]" The command line option to create PCI-attached COM1/COM2 device is: "-S <bus>,uart[,stdio]". This style of creating COM ports is deprecated. Discussed with: grehan Reviewed by: grehan Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com) M share/examples/bhyve/vmrun.sh AM usr.sbin/bhyve/legacy_irq.c AM usr.sbin/bhyve/legacy_irq.h M usr.sbin/bhyve/Makefile AM usr.sbin/bhyve/uart_emul.c M usr.sbin/bhyve/bhyverun.c AM usr.sbin/bhyve/uart_emul.h M usr.sbin/bhyve/pci_uart.c M usr.sbin/bhyve/pci_emul.c M usr.sbin/bhyve/inout.c M usr.sbin/bhyve/pci_emul.h M usr.sbin/bhyve/inout.h AM usr.sbin/bhyve/pci_lpc.c AM usr.sbin/bhyve/pci_lpc.h	2013-10-29 00:18:11 +00:00
Peter Grehan	8f1db961f9	Fix bug in the ioapic emulation for level-triggered interrupts, where a pin assertion while a source was masked would result in the interrupt being lost, with the symptom being a console hang. The condition is now recorded, and the interrupt generated when the source is unmasked. Discovered by: OpenBSD 5.4 MP Reviewed by: neel MFC after: 3 days	2013-10-25 03:18:56 +00:00
Neel Natu	b5331f4d88	Tidy usage messages for bhyve and bhyveload. Submitted by: jhb	2013-10-23 21:42:53 +00:00
Peter Grehan	fce0413b0a	Export the block size capability to guests. - Use #defines for capability bits - Export the VTBLK_F_BLK_SIZE capability - Fix bug in calculating capacity: it is in 512-byte units, not the underlying sector size This allows virtio-blk to have backing devices with non 512-byte sector sizes e.g. /dev/cd0, and 4K-block harddrives. Reviewed by: neel MFC after: 3 days	2013-10-23 18:54:58 +00:00
Peter Grehan	10016ed51c	Fix AHCI ATAPI emulation when backed with /dev/cd0 - remove assumption that the backing file/device had 512-byte sectors - fix incorrect iovec size variable that would result in a buffer overrun when an o/s issued an i/o request with more s/g elements than the blockif api Reviewed by: Zhixiang Yu (zxyu.core@gmail.com) MFC after: 3 days	2013-10-22 19:55:04 +00:00
Peter Grehan	062b878f58	Changes required for OpenBSD/amd64: - Allow a hostbridge to be created with AMD as a vendor. This passes the OpenBSD check to allow the use of MSI on a PCI bus. - Enable the i/o interrupt section of the mptable, and populate it with unity ISA mappings. This allows the 'legacy' IRQ mappings of the PCI serial port to be set up. Delete unused print routine that was obscuring code. - Use the '-W' option to enable virtio single-vector MSI rather than an environment variable. Update the virtio net/block drivers to query this flag when setting up interrupts.: bhyverun.c - Fix the arithmetic used to derive the century byte in RTC CMOS, as well as encoding it in BCD. Reviewed by: neel MFC after: 3 days	2013-10-17 22:01:17 +00:00
Peter Grehan	7b8d7047af	Eliminate unconditional debug printfs. Linux writes to these nominally read-only registers, so avoid having bhyve write warning messages to stdout when the reg writes can be safely ignored. Change the WPRINTF to DPRINTF which is conditional. Reviewed by: mav Discussed with: mav, Zhixiang Yu MFC after: 3 days	2013-10-17 21:56:39 +00:00
Neel Natu	49cc03da31	Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose 'invpcid' instruction to the guest. Currently bhyve will try to enable this capability unconditionally if it is available. Consolidate code in bhyve to set the capabilities so it is no longer duplicated in BSP and AP bringup. Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid' instruction is available. Reviewed by: grehan MFC after: 3 days	2013-10-16 18:20:27 +00:00
Peter Grehan	64945a9e27	Implement the virtio block 'get-ident' operation. This eliminates the annoying verbose boot error of the form g_handleattr: vtbd0 bio_length 24 len 28 -> EFAULT The ident returned by bhyve is a text string 'BHYVE-XXXX-XXXX', where the X's are the first bytes of the md5 hash of the backing filename. Reviewed by: neel Approved by: re (gjb)	2013-10-12 19:31:19 +00:00
Peter Grehan	2a8d400a2e	Allow a 4-byte write to PCI config space to overlap the 2 read-only bytes at the start of a PCI capability. This is the sequence that OpenBSD uses when enabling MSI interrupts, and works fine on real h/w. In bhyve, convert the 4 byte write to a 2-byte write to the r/w area past the first 2 r/o bytes of a capability. Reviewed by: neel Approved by: re@ (blanket)	2013-10-09 23:53:21 +00:00
Neel Natu	200758f114	Parse the memory size parameter using expand_number() to allow specifying the memory size more intuitively (e.g. 512M, 4G etc). Submitted by: rodrigc Reviewed by: grehan Approved by: re (blanket)	2013-10-09 03:56:07 +00:00
Dimitry Andric	cdb9cd7ad2	In usr.sbin/bhyve/pci_ahci.c, fix several gcc warnings of the form "assignment makes pointer from integer without a cast", by changing the cmd_lst and rbis members of struct ahci_port from integers to pointers. Also surround a pow-of-2 test expression with parentheses to clarify it, and avoid another gcc warning. Approved by: re (glebius) Reviewed by: grehan, mav	2013-10-08 19:39:21 +00:00
Dimitry Andric	e70cb911b6	After r256062, the static function fbsdrun_get_next_cpu() in usr.sbin/bhyve/bhyverun.c is no longer used, so remove it to silence a gcc warning. Approved by: re (glebius)	2013-10-08 18:09:00 +00:00
Neel Natu	4a06a0fe79	Change the behavior of bhyve such that the gdb listening port is opt-in rather than opt-out. Prior to this change if the "-g" option was not specified then a listening socket for tunneling gdb packets would be opened at port 6466. If a second virtual machine is fired up, also without the "-g" option, then that would fail because there is already a listener on port 6466. After this change if a gdb tunnel port needs to be created it needs to be explicitly specified with a "-g <portnum>" command line option. Reviewed by: grehan@ Approved by: re@ (blanket)	2013-10-08 16:36:17 +00:00
Neel Natu	318224bbe6	Merge projects/bhyve_npt_pmap into head. Make the amd64/pmap code aware of nested page table mappings used by bhyve guests. This allows bhyve to associate each guest with its own vmspace and deal with nested page faults in the context of that vmspace. This also enables features like accessed/dirty bit tracking, swapping to disk and transparent superpage promotions of guest memory. Guest vmspace: Each bhyve guest has a unique vmspace to represent the physical memory allocated to the guest. Each memory segment allocated by the guest is mapped into the guest's address space via the 'vmspace->vm_map' and is backed by an object of type OBJT_DEFAULT. pmap types: The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT. The PT_X86 pmap type is used by the vmspace associated with the host kernel as well as user processes executing on the host. The PT_EPT pmap is used by the vmspace associated with a bhyve guest. Page Table Entries: The EPT page table entries are mostly similar in functionality to regular page table entries although there are some differences in terms of what bits are used to express that functionality. For example, the dirty bit is represented by bit 9 in the nested PTE as opposed to bit 6 in the regular x86 PTE. Therefore the bitmask representing the dirty bit is now computed at runtime based on the type of the pmap. Thus PG_M that was previously a macro now becomes a local variable that is initialized at runtime using 'pmap_modified_bit(pmap)'. An additional wrinkle associated with EPT mappings is that older Intel processors don't have hardware support for tracking accessed/dirty bits in the PTE. This means that the amd64/pmap code needs to emulate these bits to provide proper accounting to the VM subsystem.
This is achieved by using the following mapping for EPT entries that need emulation of A/D bits: Bit Position Interpreted By PG_V 52 software (accessed bit emulation handler) PG_RW 53 software (dirty bit emulation handler) PG_A 0 hardware (aka EPT_PG_RD) PG_M 1 hardware (aka EPT_PG_WR) The idea to use the mapping listed above for A/D bit emulation came from Alan Cox (alc@). The final difference with respect to x86 PTEs is that some EPT implementations do not support superpage mappings. This is recorded in the 'pm_flags' field of the pmap. TLB invalidation: The amd64/pmap code has a number of ways to do invalidation of mappings that may be cached in the TLB: single page, multiple pages in a range or the entire TLB. All of these funnel into a single EPT invalidation routine called 'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and sends an IPI to the host cpus that are executing the guest's vcpus. On a subsequent entry into the guest it will detect that the EPT has changed and invalidate the mappings from the TLB. Guest memory access: Since the guest memory is no longer wired we need to hold the host physical page that backs the guest physical page before we can access it. The helper functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose. PCI passthru: Guests with PCI passthru devices will wire the entire guest physical address space. The MMIO BAR associated with the passthru device is backed by a vm_object of type OBJT_SG. An IOMMU domain is created only for guests that have one or more PCI passthru devices attached to them. Limitations: There isn't a way to map a guest physical page without execute permissions. This is because the amd64/pmap code interprets the guest physical mappings as user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U shares the same bit position as EPT_PG_EXECUTE all guest mappings become automatically executable.
Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews as well as their support and encouragement. Thanks to John Baldwin for reviewing the use of OBJT_SG as the backing object for pci passthru mmio regions. Special thanks to Peter Holm for testing the patch on short notice. Approved by: re Discussed with: grehan Reviewed by: alc, kib Tested by: pho	2013-10-05 21:22:35 +00:00
Peter Grehan	94c3b3bffc	Remove obsolete cmd-line options and code associated with these. The mux-vcpus option may return at some point, given its utility in finding bhyve (and FreeBSD) bugs. Approved by: re@ (blanket) Discussed with: neel@	2013-10-04 23:29:07 +00:00
Peter Grehan	54b70fdcae	Hook up the AHCI and blockif code to the build. Approved by: re@ (blanket)	2013-10-04 18:44:47 +00:00
Peter Grehan	c354c096d3	Import Zhixiang Yu's GSoC'13 AHCI emulation: https://wiki.freebsd.org/SummerOfCode2013/bhyveAHCI This provides ICH8 SATA disk and ATAPI ports, selectable via the bhyve slot command-line parameter: SATA -s <slot>,ahci-hd,<image-file> ATAPI -s <slot>,ahci-cd,<image-file> Slight modifications by: grehan@ Approved by: re@ (blanket) Obtained from: FreeBSD GSoC'13	2013-10-04 18:31:38 +00:00
Peter Grehan	7cf5a7eeb0	Block-layer backend interface for bhyve block-io device emulations. Approved by: re@ (blanket)	2013-10-04 16:52:03 +00:00
Peter Grehan	6a77884d08	Fix incorrect assertion on the minimum side. ZFS would trigger this. Reported by: Chris Torek, Allan Jude Approved by: re@ (blanket)	2013-09-26 16:25:06 +00:00
Peter Grehan	aa8cb5f311	Implement support for the interrupt-on-terminal-count and s/w-strobe timer modes. These are commonly used by non-FreeBSD o/s's. Approved by: re@ (blanket)	2013-09-19 04:59:44 +00:00
Peter Grehan	151dba4a87	Add simplistic periodic timer support to mevent using kqueue's timer support. This should be enough for the emulation of h/w periodic timers (and no more) e.g. some of the 8254's more esoteric modes that happen to be used by non-FreeBSD o/s's. Approved by: re@ (blanket)	2013-09-19 04:48:26 +00:00
Peter Grehan	4458253e97	Allow the alarm hours/mins/seconds registers to be read/written, though without any action. This avoids a hypervisor exit when o/s's access these regs (Linux). Reviewed by: neel Approved by: re@ (blanket)	2013-09-19 04:29:03 +00:00
Peter Grehan	c20d3f633a	Use correct offset for the high byte of high memory written to RTC NVRAM. Submitted by: Bela Lubkin bela dot lubkin at tidalscale dot com Approved by: re@ (blanket)	2013-09-19 04:20:18 +00:00
Peter Grehan	aaa3016924	Pass the number of supported vectors to pci_emul_add_msicap() and not the actual PCI BAR number. Reviewed by: neel Approved by: re@ (blanket)	2013-09-17 18:42:13 +00:00
Peter Grehan	8d39ed16c2	Go way past 11 and bump bhyve's max vCPUs to 16. This should be sufficient for 10.0 and will do until forthcoming work to avoid limitations in this area is complete. Thanks to Bela Lubkin at tidalscale for the heads-up on the apic/cpu id/io apic ASL parameters that are actually hex values and broke when written as decimal when 11 vCPUs were configured. Approved by: re@	2013-09-10 03:48:18 +00:00
Peter Grehan	fa48032049	Fix spelling.	2013-09-06 05:58:10 +00:00
Peter Grehan	841caa4090	Allow level-triggered interrupt sources. While this isn't precisely emulated, it is good enough for the single consumer i.e. irq4, the serial port on Linux.	2013-09-06 05:55:43 +00:00
Neel Natu	6a52209f9c	Allow single byte reads of the emulated MSI-X tables. This is not required by the PCI specification but needed to dump MMIO space from "ddb" in the guest.	2013-08-27 16:50:48 +00:00
Peter Grehan	000f0835b2	Fix off-by-1 error in assert. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)	2013-08-27 03:49:47 +00:00
Peter Grehan	50dc0db3f0	Fix ordering of legacy IRQ reservations. Submitted by: Jeremiah Lott jlott at averesystems dot com	2013-08-16 00:35:20 +00:00
Peter Grehan	8b271170d1	Sanity-check the vm exitcode, and exit the process if it's out-of-bounds or there is no registered handler. Submitted by: Bela Lubkin bela dot lubkin at tidalscale dot com	2013-07-18 18:40:54 +00:00
Peter Grehan	ba41c3c13f	Major rework of the virtio code. Split out common parts, and modify the net/block devices accordingly. Submitted by: Chris Torek torek at torek dot net Reviewed by: grehan	2013-07-17 23:37:33 +00:00
Peter Grehan	9d6be09f8a	Implement RTC CMOS nvram. Init some fields that are used by FreeBSD and UEFI. Tested with nvram(4). Reviewed by: neel	2013-07-11 03:54:35 +00:00
Peter Grehan	a38e2a64dc	Support an optional "mac=" parameter to virtio-net config, to allow users to set the MAC address for a device. Clean up some obsolete code in pci_virtio_net.c Allow an error return from a PCI device emulation's init routine to be propagated all the way back to the top-level and result in the process exiting. Submitted by: Dinakar Medavaram dinnu sun at gmail (original version)	2013-07-04 05:35:56 +00:00
Peter Grehan	34d244edb2	Fix up option parsing to allow a colon in the config section. Clean up some other unnecessary code. Submitted by: Dinakar Medavaram dinnu sun at gmail Reviewed by: neel	2013-07-01 23:53:22 +00:00
Peter Grehan	4dfaf1bc08	Allow 8259 registers to be read. This is a transient condition during Linux boot. Submitted by: tycho nightingale at pluribusnetworks com Reviewed by: neel	2013-06-28 06:25:04 +00:00
Peter Grehan	7554303627	Allow the PCI config address register to be read. The Linux kernel does this. Also remove an unused header file. Submitted by: tycho nightingale at pluribusnetworks com Reviewed by: neel	2013-06-28 05:01:25 +00:00
Neel Natu	b1f3124565	Implement the NOTIFY_ON_EMPTY capability in the virtio-net device. If this capability is negotiated by the guest then the device will generate an interrupt when it runs out of available tx/rx descriptors. Reviewed by: grehan Obtained from: NetApp	2013-05-03 01:16:18 +00:00
Neel Natu	3b207d1e34	Reset some more softc state when the guest resets the virtio network device. Obtained from: NetApp	2013-04-30 01:14:54 +00:00
Neel Natu	2a80be7b2b	Use a separate mutex for the receive path instead of overloading the softc mutex for this purpose. Reviewed by: grehan	2013-04-30 00:36:16 +00:00
Neel Natu	88d1272e3c	Get rid of the 'vsc_rxpend' state - it doesn't serve any purpose because we drop any frames that arrive while the device is starved for receive buffers. This makes the receive path execute only in the context of the receive thread and allows for further simplification. Reviewed by: grehan	2013-04-28 01:02:59 +00:00
Peter Grehan	199fee4ea3	Use a thread for the processing of virtio tx descriptors rather than blocking the vCPU thread. This improves bulk data performance by ~30-40% and doesn't harm req/resp time for stock netperf runs. Future work will use a thread pool rather than a thread per tx queue. Submitted by: Dinakar Medavaram Reviewed by: neel, grehan Obtained from: NetApp	2013-04-26 05:13:48 +00:00

421 Commits