Now that the atomic macros are always genuinely atomic on x86, they can
be used for synchronization with Xen. A single core VM isn't too
unusual, but actual single core hardware is uncommon.
Replace an open-coding of evtchn_clear_port() with the inline.
Substantially inspired by work done by Julien Grall <julien@xen.org>,
2014-01-13 17:40:58.
Reviewed by: royger
MFC after: 1 week
While unusual, intr_register_source() can return failure. A likely
cause might be another device grabbing from Xen's interrupt range.
This should NOT happen, but could happen due to a bug. As such check
for this and fail if it occurs.
This theoretical situation also effects xen_intr_find_unused_isrc().
There, .is_pic must be tested to ensure such an intrusion doesn't cause
misbehavior.
Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31995
Consistently use ~0 instead of 0 when clearing xenisrc structures.
0 is a valid event channel number, even though it is reserved by Xen.
Whereas ~0 is guaranteed invalid.
Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
In xen_intr_release_isrc(), the isrc should only be removed if it is
assigned to a valid port. This had been mitigated by using 0 for not
having a port, but this is actually corrupting the table. Fix this bug
as modifying the code would cause this bug to manifest as kernel memory
corruption. Similar issue for the vCPU bitmap masks.
The KASSERT() doesn't need lock protection.
Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
The comparison was wrong. Hopefully this never occurred in the wild,
but now ensure the error message will occur before damage is caused.
This appears non-exploitable as exploitation would require a guest to
force Domain 0 to allocate all event channels, which a guest shouldn't
be able to do.
Adjust the error message to better describe what has occurred.
Reviewed by: royger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30743
Appears errors are uncommon since calling xen_intr_release_isrc() on a
xenisrc with xi_close in an undefined state could be bad. Fix this
problematic lurking nasty.
Reviewed by: royger
MFC after: 1 week
The Intel graphics stolen memory is used by the Intel GOP driver on
boot. When using bhyve with GPU passthrough, it's also used by the guest
GOP driver at guest boot. For that reason, bhyve needs to know the
address and size of this region to inform the guest about this region.
Exposing the variables as sysctl allows bhyve to easily read them.
In 38d1ac34ff SIGATOMIC_{MIN,MAX} were
defined in terms of LONG_{MIN,MAX}. Later, they were switched to
__LONG_{MIN,MAX} in 78fe75bc28 where an
include of machine/_limits.h was added. Switch to using fixed width
INT64_{MIN,MAX} and remove the header pollution.
No functional change.
Reviewed by: theraven, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D39196
Newer Intel CPUs/iGPUs use a new method to determine the base address of
the stolen memory. This code was ported from Linux.
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D39057
Make a pass at the various nexus implementations, fixing some very minor
style issues, obsolete comments, etc.
The method declaration section has become unwieldy in many respects.
Attempt to tame it by:
- Using generated method typedefs
- Grouping methods roughly by category, and then alphabetically.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D38495
The xenpf_dom0_console_t structure can grow as more data is added, and
hence we need to check that the fields we accesses have been filled by
Xen. The only extra field FreeBSD currently uses is the top 32 bits
for the frame buffer physical address.
Note that this field is present in all the versions that make the
information available from the platform hypercall interface, so the
check here is mostly cosmetic, and to remember us that newly added
fields require checking the size of the returned data.
Fixes: 6f80738b22 ('xen: fetch dom0 video console information from Xen')
Sponsored by: Citrix Systems R&D
It's possible for Xen to switch the video mode set by the boot loader,
so that the information passed in the kernel metadata is no longer
valid. Fetch the video mode used by Xen using an hypercall and update
the medatada for the kernel to use the correct video mode.
Sponsored by: Citrix Systems R&D
This is required for a further change that will make use of a field
that was added in version 0x00040d00.
No functional change expected.
Sponsored by: Citrix Systems R&D
This makes the detection of VMs common between platforms that
have SMBios.
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D38800
If a CPU for some reason returns 0 as CPU frequency, we currently panic
on the resulting divide by zero when trying to initialize the CPU(s) via
APIC. When this happens, we'll fallback to measuring the frequency
instead.
PR: 269767
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/664
When the cycle counter is "stable", i.e., synchronized across vCPUs by
the hypervisor, userspace can use a serialized rdtsc instead of relying
on rdtscp, just like the kernel timecounter does. This can be useful
for performance in guests where the hypervisor hides rdtscp for some
reason.
To avoid breaking compatibility with older userspace which expects
rdtscp to be usable when pvclock exports timekeeping info, hide this
feature behind a sysctl.
Reviewed by: kib
Tested by: Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38342
Exposing the a power loss of the rtc as an sysctl makes it easier to
detect an empty cmos battery.
Reviewed by: manu
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38325
This covers all currently defined bits, adding PKRU and TILE.
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38219
The option is not even recognized and with that patched it does not
compile. Even if it did work, it would be prohibitively expensive to
use.
Interested parties can use pmcstat or dtrace instead.
Linux reads MISC_FEATURES_ENABLES to manage the CPUID faulting feature
(undocumented in the Intel SDM, but documented in 323850-004 (Intel
Virtualization Technology FlexMigration Application Note). Since bhyve
doesn't emulate this feature, we always return 0. Neither does bhyve
support the MONITOR/MWAIT fault bit also in this MSR (which is
documented in the sdm), so always return 0.
Sponsored by: Netflix
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D36602
MAX_APIC_ID must be at least twice MAXCPU. Increase it to 0x800 so that
it is possible to set MAXCPU to 512 or 1024 in a custom kernel config
file.
Note that increasing this limit does not itself cause any allocations
to be larger; it just allows madt_parse_cpu() to process higher APIC
IDs.
APIC IDs may be sparse and so we can waste memory. This is independent
of this change, but becomes more of an issue as the maximum APIC ID
grows. This should be addressed with future work.
Reviewed by: royger
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37067
They were copy-pasted when x86/include/elf.h file was merged from its
i386 and amd64 counterparts. Having the text around inclusions
significantly different is somewhat confusing.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37085
both for i386 native and compat32 amd64. We know the ld-elf.so.1 size
in advance, it fits there. Trying to push it up after the end of a
binary cannot work reliably and eventually fail for large binaries.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37085
When bus_dmamap_create is called, if bouncing might be required we
reserve enough pages for a maximum-length request, subject to the
MAX_BPAGES constraint (32 MB on amd64; 32 MB or 2 MB on i386
depending on the amount of RAM).
Since pages used for bouncing are typically non-consecutive, each
bounced page will typically constitute a busdma segment; as such, we
are unlikely to ever successfully use more pages than the nsegments
limit. Limit the number of pages reserved to nsegments.
On FreeBSD/Firecracker, this reduces bounce page memory consumption
from 32 MB to 512 kB, making VMs with 128 MB of RAM usable.
Reviewed by: imp, mav
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D37082
Now that we can PVH boot on a non-Xen hypervisor, we shouldn't set
machdep.bootmethod to "XEN". Instead, set it to "PVH"; there are
other ways to discern the hypervisor.
Reviewed by: royger
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D36191
For historical reasons, Xen kernel command lines have options
separated by commas. Every other FreeBSD platform uses whitespace;
this is also necessary in PVH in order to support the Firecracker
VMM. Allow options to be separated by any combination of commas
and whitespace.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D36190
The PVH boot protocol, introduced by Xen, is now used by some non-Xen
platforms (e.g. the Firecracker VM) as well. In order to accommodate
these, we use CPUID to detect Xen and only perform Xen-specific setup
when running on that platform.
The "isxen" function duplicates some work done by identcpu.c later in
the boot process; but we need it here since this is the very first C
code which runs when PVH booting (even before hammer_time).
In many places the existing code had
xc_printf(...);
HYPERVISOR_shutdown(SHUTDOWN_crash);
making use of Xen functionality to print a message and shut down; in
the places where this idiom can be reached in the non-xen case, we
replace it idiom with a CRASH(...) macro which calls those in the Xen
case and halts in the non-Xen case.
Reviewed by: royger
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35801
Version 0 of PVH booting uses a Xen hypercall to retrieve the system
memory map; in version 1 the memory map can be provided via the
start_info structure.
Using the memory map from the version 1 start_info structure allows
FreeBSD to use PVH booting on systems other than Xen, e.g. on the
Firecracker VM.
Reviewed by: royger
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35800
Linux has two bugs in its handling of the x86 MP table:
1. It assumes that there is always 640 kB of base memory, and looks for
the MP table in the top kB of this even if the memory map indicates
that memory location does not exist.
2. It ignores that entry_count field and instead iterates through the
MP table by scanning until it runs out of bytes in the table.
The Firecracker VM (and probably other related VMs) relies on both of
these bugs. With the MPTABLE_LINUX_BUG_COMPAT option, we search for
the MP table at address 639k even if that isn't in the memory map; and
replace a zeroed entry_count with a value computed from scanning the
table until we run out of table bytes.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35799
On systems without a PCI bus, legacy_pcib_identify by default creates
one anyway:
legacy_pcib_identify: no bridge found, adding pcib0 anyway
This commit adds a kernel option NO_LEGACY_PCIB which disables this,
allowing systems to be fully PCI-free.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35798
GCC warns about the mismatched sizes on i386 where vm_paddr_t is 64
bits.
Reviewed by: imp, markj
Differential Revision: https://reviews.freebsd.org/D36750
This matches the return type of pmap_mapdev/bios.
Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36548
and not on the trampoline stack. This is a useful way to ensure that
we did not enabled interrupts while on user %cr3 or trampoline stack.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It does not work with ULE, which is the default scheduler for over a
decade.
Reviewed by: emaste, kib
Differential Revision: https://reviews.freebsd.org/D36094