Commit Graph

21 Commits

Author SHA1 Message Date
Warner Losh
1d386b48a5 Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:42 -06:00
Warner Losh
b3e7694832 Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:16 -06:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Mark Johnston
98d920d9cf bhyve: Annotate unused function parameters
MFC after:	1 week
2022-10-08 11:33:21 -04:00
Toomas Soome
c2fa905cf6 bhyve: clean up trailing whitespaces
Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681
2021-12-27 19:58:10 +02:00
John Baldwin
621b509048 Refactor configuration management in bhyve.
Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs.  The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key.  Values are always stored as strings.  The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists.  Leaf
nodes hold string values.  Configuration values are permitted to
reference other configuration values using '%(name)'.  This permits
constructing template configurations.

All existing command line arguments now set configuration values.  For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable.  The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable.  Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values.  Once this is
complete, bhyve then begins initializing its state based on the
configuration values.  This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues.  When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g.  '-w' sets the 'x86.strictmsr' value to false.  A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
  list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
  The first argument to '-s' (the device type) is used as the value of
  a "device" variable.  Additional comma-separated arguments are then
  parsed into 'key=value' pairs and used to set additional variables
  under the device node.  A PCI device emulation driver can provide
  its own hook to override the parsing of the additonal '-s' arguments
  after the device type.

  After the configuration phase as completed, the init_pci hook
  then walks the "pci.<bus>.<slot>.<func>" nodes.  It uses the
  "device" value to find the device model to use.  The device
  model's init routine is passed a reference to its nvlist node
  in the configuration tree which it can query for specific
  variables.

  The result is that a lot of the string parsing is removed from
  the device models and centralized.  In addition, adding a new
  variable just requires teaching the model to look for the new
  variable.

- For '-l' options, a similar model is used where the string is
  parsed into values that are later read during initialization.
  One key note here is that the serial ports use the commonly
  used lowercase names from existing documentation and examples
  (e.g. "lpc.com1") instead of the uppercase names previously
  used internally in bhyve.

Reviewed by:	grehan
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D26035
2021-03-18 16:30:26 -07:00
Marcelo Araujo
f7224b709f Fix style(9) space vs tab.
Reviewed by:	jhb
MFC after:	3 weeks.
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D15768
2018-06-14 01:34:53 +00:00
Pedro F. Giffuni
1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
Neel Natu
c974767896 Add "-u" option to bhyve(8) to indicate that the RTC should maintain UTC time.
The default remains localtime for compatibility with the original device model
in bhyve(8). This is required for OpenBSD guests which assume that the RTC
keeps UTC time.

Reviewed by:	grehan
Pointed out by:	Jason Tubnor (jason@tubnor.net)
MFC after:	2 weeks
2015-02-24 02:04:16 +00:00
Neel Natu
0dafa5cd4b Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D1385
MFC after:	2 weeks
2014-12-30 22:19:34 +00:00
Neel Natu
c17d4a83b8 Add a comment explaining the intent behind the I/O reservation [0x72-0x77]. 2014-10-26 21:17:44 +00:00
Neel Natu
be679db4cd Provide APIs to directly get 'lowmem' and 'highmem' size directly.
Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).
2014-06-24 02:02:51 +00:00
John Baldwin
e6c8bc291a Rework the DSDT generation code a bit to generate more accurate info about
LPC devices.  Among other things, the LPC serial ports now appear as
ACPI devices.
- Move the info for the top-level PCI bus into the PCI emulation code and
  add ResourceProducer entries for the memory ranges decoded by the bus
  for memory BARs.
- Add a framework to allow each PCI emulation driver to optionally write
  an entry into the DSDT under the \_SB_.PCI0 namespace.  The LPC driver
  uses this to write a node for the LPC bus (\_SB_.PCI0.ISA).
- Add a linker set to allow any LPC devices to write entries into the
  DSDT below the LPC node.
- Move the existing DSDT block for the RTC to the RTC driver.
- Add DSDT nodes for the AT PIC, the 8254 ISA timer, and the LPC UART
  devices.
- Add a "SuperIO" device under the LPC node to claim "system resources"
  aling with a linker set to allow various drivers to add IO or memory
  ranges that should be claimed as a system resource.
- Add system resource entries for the extended RTC IO range, the registers
  used for ACPI power management, the ELCR, PCI interrupt routing register,
  and post data register.
- Add various helper routines for generating DSDT entries.

Reviewed by:	neel (earlier version)
2014-01-02 21:26:59 +00:00
Peter Grehan
062b878f58 Changes required for OpenBSD/amd64:
- Allow a hostbridge to be created with AMD as a vendor.
  This passes the OpenBSD check to allow the use of MSI
  on a PCI bus.
- Enable the i/o interrupt section of the mptable, and
  populate it with unity ISA mappings. This allows the
  'legacy' IRQ mappings of the PCI serial port to be
  set up. Delete unused print routine that was obscuring code.
- Use the '-W' option to enable virtio single-vector MSI
  rather than an environment variable. Update the virtio
  net/block drivers to query this flag when setting up
  interrupts.: bhyverun.c
- Fix the arithmetic used to derive the century byte in
  RTC CMOS, as well as encoding it in BCD.

Reviewed by:	neel
MFC after:	3 days
2013-10-17 22:01:17 +00:00
Neel Natu
318224bbe6 Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
               Bit Position           Interpreted By
PG_V               52                 software (accessed bit emulation handler)
PG_RW              53                 software (dirty bit emulation handler)
PG_A               0                  hardware (aka EPT_PG_RD)
PG_M               1                  hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by:	re
Discussed with:	grehan
Reviewed by:	alc, kib
Tested by:	pho
2013-10-05 21:22:35 +00:00
Peter Grehan
4458253e97 Allow the alarm hours/mins/seconds registers to be read/written,
though without any action. This avoids a hypervisor exit when
o/s's access these regs (Linux).

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-19 04:29:03 +00:00
Peter Grehan
c20d3f633a Use correct offset for the high byte of high memory written to
RTC NVRAM.

Submitted by:	Bela Lubkin   bela dot lubkin at tidalscale dot com
Approved by:	re@ (blanket)
2013-09-19 04:20:18 +00:00
Peter Grehan
9d6be09f8a Implement RTC CMOS nvram. Init some fields that are used
by FreeBSD and UEFI.
Tested with nvram(4).

Reviewed by:	neel
2013-07-11 03:54:35 +00:00
Peter Grehan
2838f343c8 Improve correctness of rtc register implementation.
Submitted by:	tycho nightingale at pluribusnetworks com
2013-01-25 22:43:20 +00:00
Peter Grehan
1f3025e133 Changes to allow the GENERIC+bhye kernel built from this branch to
run as a 1/2 CPU guest on an 8.1 bhyve host.

bhyve/inout.c
      inout.h
      fbsdrun.c
 - Rather than exiting on accesses to unhandled i/o ports, emulate
   hardware by returning -1 on reads and ignoring writes to unhandled
   ports. Support the previous mode by allowing a 'strict' parameter
   to be set from the command line.
   The 8.1 guest kernel was vastly cut down from GENERIC and had no
   ISA devices. Booting GENERIC exposes a massive amount of random
   touching of i/o ports (hello syscons/vga/atkbdc).

bhyve/consport.c
dev/bvm/bvm_console.c
 - implement a simplistic signature for the bvm console by returning
   'bv' for an inw on the port. Also, set the priority of the console
   to CN_REMOTE if the signature was returned. This works better in
   an environment where multiple consoles are in the kernel (hello syscons)

bhyve/rtc.c
 - return 0 for the access to RTC_EQUIPMENT (yes, you syscons)

amd64/vmm/x86.c
          x86.h
 - hide a bunch more CPUID leaf 1 bits from the guest to prevent
   cpufreq drivers from probing.
   The next step will be to move CPUID handling completely into
   user-space. This will allow the full spectrum of changes from
   presenting a lowest-common-denominator CPU type/feature set, to
   exposing (almost) everything that the host can support.

Reviewed by:	neel
Obtained from:	NetApp
2011-05-19 21:53:25 +00:00
Peter Grehan
366f60834f Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
  bhyve  - user-space sequencer and i/o emulation
  vmmctl - dump of hypervisor register state
  libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
	Joe CaraDonna
	Peter Snyder
	Jeff Heller
	Sandeep Mann
	Steve Miller
	Brian Pawlowski
2011-05-13 04:54:01 +00:00