Many 8-byte entries have zero at byte 4, so the second 4-byte part is
skipped as a 4-byte padding entry. But not all 8-byte entries have that
property and they get misinterpreted.
A real example:
48 00 00 00 ff 01 00 01
This an 8-byte ACPI_IVRS_TYPE_SPECIAL entry for IOAPIC with ID 255 (bogus).
It is reported as:
ivhd0: Unknown dev entry:0xff
Fortunately, it was completely harmless.
Also, bail out early if we encounter an entry of a variable length type.
We do not have proper handling for those yet.
Reviewed by: anish
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
This is needed for the HDA emulation with FreeBSD guests.
Reviewed by: marcelo
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12832
timer frequency a power of two. This changes the frequency from 10 to
16.7 MHz (2 ^ 24 HZ). Using a power of two avoids roundoff errors when
doing arithmetic in sbintime_t units.
Testing shows this can fix erratic ntpd behavior in guests using the
hpet timer (which is the default for multicore guests).
Reported by: bsam@
Specifically, devices that do not support PCI-e FLR and were not
gracefully shutdown by the guest OS could continue to issue DMA
requests after the VM was terminated. The changes in r305485 meant
that those DMA requests were completed against the host's memory which
could result in random memory corruption. Instead, leave ppt devices
that are not attached to a VM disabled in the IOMMU and only restore
the devices to the host domain if the ppt(4) driver is detached from a
device.
As an added safety belt, disable busmastering for a pass-through device
when before adding it to the host domain during ppt(4) detach.
PR: 222937
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12661
amdvi_cmp_wait: gcc complained about a malformed string behind an ifdef.
struct amdvi_dte: widen the type of the first reserved bitfield so that
the packed representation would not cross an alignment boundary for that
type. Apparently that causes in-tree gcc (4.2) to insert padding
(despite packed, resulting in a wrong structure definition), and causes
more modern gcc to emit a warning.
ivrs_hdr_iterate_tbl: delete a misleading check about header length
being less than 0 (the type is unsigned) and replace it with a check
that the length doesn't exceed the table size.
Reviewed by: anish, grehan
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11485
The change introduced a dependency between genassym.c and header files
generated from .m files, but that dependency is not specified in the
make files.
Also, the change could be not as useful as I thought it was.
Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others
The change is more intrusive than I would like because the feature
requires that a vector number is written to a special register.
Thus, now the vector number has to be provided to lapic_eoi().
It was readily available in the IO-APIC and MSI cases, but the IPI
handlers required more work.
Also, we now store the VMM IPI number in a global variable, so that it
is available to the justreturn handler for the same reason.
Reviewed by: kib
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D9880
Otherwise, recent Linux guests will use these instructions, resulting
in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits.
This fixes boot-time hangs in recent Linux guests on Ryzen CPUs
(and probably Bulldozer aka AMD FX as well).
Reviewed by: kib
MFC after: 1 week
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
The error is:
vmm_dev.c: In function 'alloc_memseg':
vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull]
Apparently, the gcc is unable to figure out that if a ternary operator
produced a non-NULL value once, then the operator with exactly the same
operands would produce the same value again.
MFC after: 1 week
Just using vm_paddr_t value with all bits set.
That should work as long as the type is unsigned.
While there, fix a couple of whitespace issues nearby.
MFC after: 1 week
X-MFC with: r307903
To achieve that the whole svm_softc is allocated with contigmalloc now.
It would be more effient to de-embed those arrays and allocate only them
with contigmalloc.
Previously, if malloc(9) used non-contiguous pages for the arrays, then
random bits in physical pages next to the first page would be used to
determine permissions for I/O port and MSR accesses. That could result
in a guest dangerously modifying the host hardware configuration.
One example is that sometimes NMI watchdog driver in a Linux guest
would be able to configure a performance counter on a host system.
The counter would generate an interrupt and if hwpmc(4) driver is loaded
on the host, then the interrupt would be delivered as an NMI.
Discussed with: jhb
Reviewed by: grehan
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D8321
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
Reviewed by: imp, wblock (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7751
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7667
This allows a pass through device to be reset to a normal device driver
on the host and reused on the host. ppt devices are now always active in
some I/O MMU domain when the I/O MMU is active, either the host domain
or the domain of a VM they are attached to.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7666
Rather than enabling the I/O MMU when the vmm module is loaded,
defer initialization until the first attempt to pass a PCI device
through to a guest. If the I/O MMU fails to initialize or is not
present, than fail the attempt to pass a PCI device through to a
guest.
The hw.vmm.force_iommu tunable has been removed since the I/O MMU is
no longer enabled during boot. However, the I/O MMU support can be
disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent
use of the I/O MMU on any systems where it is buggy.
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D7448
need to include it explicitly when <vm/vm_param.h> is already included.
Suggested by: alc
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D5379
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.
Requested by: grehan
Previously this was done by the caller of 'svm_launch()' after it returned.
This works fine as long as no code is executed in the interim that depends
on pcpu data.
The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because
it calls 'dtrace_probe()' which in turn relies on pcpu data.
Reported by: avg
MFC after: 1 week
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.
devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.
Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks
execution control and writing the difference between the host TSC and
the guest TSC into the TSC offset in the VMCS upon encountering a
write.
Reviewed by: neel
after decoding the instruction matches the one provided by hardware.
Prior to r283293 'vie->num_valid' used to contain the actual length of
the instruction whereas now it contains the maximum instruction length
possible. This introduced a bug when calculating a RIP-relative base address.
Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the
length of the emulated instruction.
Reported and tested by: tychon
MFC after: 1 week
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.
MFC after: 2 weeks
have an accurate length on an EPT violation. This is not needed by the
instruction decoding code because it also has to work with AMD/SVM that
does not provide a valid instruction length on a Nested Page Fault.
In collaboration with: Leon Dang (ldang@nahannisys.com)
Discussed with: grehan
MFC after: 1 week
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.
The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.
Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.
Reviewed by: tychon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2428
- Must-Be-Zero bits cannot be set.
- EFER_LME and EFER_LMA should respect the long mode consistency checks.
- EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
- Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce
segment limits in 64-bit mode.
MFC after: 2 weeks