This fixes the issue of bhyve appearing to halt when using
nmdm ports for the console, until a connection is made to
the other end.
bhyveload already does this.
Reported by: Many.
MFC after: 3 weeks.
new command line options -W, to enable it when needed.
On my tests this change by almost ten times improves rpcbind performance.
No objections: many, net@
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.
The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt
In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.
Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.
Discussed with: grehan, jhb
Reviewed by: jhb
what btxldr expects (.set MEM_DATA,start+0x1000 in btxldr.S).
This makes resulting ELF binaries bootable with grub, gptboot and boot2.
PR: 153801
Submitted by: Gleb Kurtsou <gleb.kurtsou at gmail.com>
Tested by: Ruben Kerkhof <ruben at rubenkerkhof.com>
Glanced at by: jhb, peter
MFC after: 1 month
'-m <file>' spits out the given stream into <file> (eg, /dev/stdout).
However, it only resolves the first symbol; it doesn't parse the entire
callgraph. If it fails to lookup then it doesn't print anything.
'-a' instead does a symbol and file:line lookup for each address in each
callgraph and will happily print the address itself with no lookup
information if it couldn't look things up.
This makes it much easier to pull out individual records from a
pmc data file and look at the callgraph information without having to
hand-decode the addresses.
Sponsored by: Netflix, Inc.
Modelled after the i386 zfsloader. However, with no
2nd stage zfsboot to search for a bootable dataset,
attempt a ZFS boot if there is more than one ZFS
dataset found during the disk probe.
sys/boot/userboot/zfs
- build the ZFS boot library
sys/boot/userboot/userboot/
conf.c
- Add the ZFS pool and filesystem tables
devicename.c
- correctly format ZFS devices
main.c
- increase the size of the libstand malloc pool
to account for the increased usage from ZFS buffers
- probe for a ZFS dataset, and if one is
found, attempt to boot from it.
usr.sbin/bhyveload/bhyveload.c
- allow multiple invocations of the '-d' option
to specify multiple disks e.g. a raidz set.
Up to 32 disks are supported.
Tested with various combinations of GPT, MBR, single
and multiple disks, RAID-Z, mirrors.
Reviewed by: neel
Discussed with: avg
Tested by: Michael Dexter and others
MFC after: 3 weeks
simplify the implementation of the x2APIC virtualization assist in VT-x.
Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.
Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.
Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.
The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.
Discussed with: grehan@
the non-standard zero capability list terminator. Instead, track
the start and end of the most recently added capability and use that
to adjust the previous capability's next pointer when a capability is
added and to determine the range of config registers belonging to
PCI capability registers.
Reviewed by: neel
NB: If the zfsboot variables ($ZFSBOOT_*) are set, a script is
assumed to want zfsboot module instead of scriptedpart module.
Submitted by: Loïc Brarda <loic.brarda@cern.ch>
Reviewed by: nwhitehorn@
MFC after: 3 days
This is done by representing each bus as root PCI device in ACPI. The device
implements the _BBN method to return the PCI bus number to the guest OS.
Each PCI bus keeps track of the resources that is decodes for devices
configured on the bus: i/o, mmio (32-bit) and mmio (64-bit). These windows
are advertised to the guest via the _CRS object of the root device.
Bus 0 is treated specially since it consumes the I/O ports to access the
PCI config space [0xcf8-0xcff]. It also decodes the legacy I/O ports that
are consumed by devices on the LPC bus. For this reason the LPC bridge can
be configured only on bus 0.
The bus number can be specified using the following command line option
to bhyve(8): "-s <bus>:<slot>:<func>,<emul>[,<config>]"
Discussed with: grehan@
Reviewed by: jhb@
the IDENTIFY DEVICE and IDENTIFY PACKET DEVICE commands.
Also, provide an indication a "D2H Register FIS" occurred during a SET FEATURES
command.
Approved by: grehan (co-mentor)
this could lead to the -n option effectively being ignored (in case
ac_line happened to be 0 aka SRC_AC), or other undefined behaviour.
PR: 169779
Submitted by: Alex Gonzalez <loox at e-shell.net>
Reviewed by: jhb
MFC after: 2 weeks
a dummy handler to make it interrupt an ioctl(2) or select(2).
This makes those short-lived ctld(8) zombies disappear.
Sponsored by: The FreeBSD Foundation
It doesn't change visible behaviour, as previously auth-group "default"
wasn't redefinable, so by default access was always denied.
Sponsored by: The FreeBSD Foundation
a dummy handler to make it interrupt an ioctl(2) or select(2).
This makes those short-lived iscsid(8) zombies disappear.
Sponsored by: The FreeBSD Foundation
- Similar to the hack for bootinfo32.c in userboot, define
_MACHINE_ELF_WANT_32BIT in the load_elf32 file handlers in userboot.
This allows userboot to load 32-bit kernels and modules.
- Copy the SMAP generation code out of bootinfo64.c and into its own
file so it can be shared with bootinfo32.c to pass an SMAP to the i386
kernel.
- Use uint32_t instead of u_long when aligning module metadata in
bootinfo32.c in userboot, as otherwise the metadata used 64-bit
alignment which corrupted the layout.
- Populate the basemem and extmem members of the bootinfo struct passed
to 32-bit kernels.
- Fix the 32-bit stack in userboot to start at the top of the stack
instead of the bottom so that there is room to grow before the
kernel switches to its own stack.
- Push a fake return address onto the 32-bit stack in addition to the
arguments normally passed to exec() in the loader. This return
address is needed to convince recover_bootinfo() in the 32-bit
locore code that it is being invoked from a "new" boot block.
- Add a routine to libvmmapi to setup a 32-bit flat mode register state
including a GDT and TSS that is able to start the i386 kernel and
update bhyveload to use it when booting an i386 kernel.
- Use the guest register state to determine the CPU's current instruction
mode (32-bit vs 64-bit) and paging mode (flat, 32-bit, PAE, or long
mode) in the instruction emulation code. Update the gla2gpa() routine
used when fetching instructions to handle flat mode, 32-bit paging, and
PAE paging in addition to long mode paging. Don't look for a REX
prefix when the CPU is in 32-bit mode, and use the detected mode to
enable the existing 32-bit mode code when decoding the mod r/m byte.
Reviewed by: grehan, neel
MFC after: 1 month
only if the specified option is NOT specified.' Bump version because
old config won't be able to cope with files* files that have this
construct in them.
did this only with the inner loop for the token parsing, and not the
outer loop which was understandable enough when the extra layers of
looping went away...
its proper location. Otherwise you could have 'file.c standard pci'
without an error. This construct isn't in our tree, and has no well
defined meaning.
performance by epsilon.
(Translation: elminate bogus macros that hid 'returns' making it hard
to read and moved a block of code inline rather than at the end of the
fuction where it was effectively a 'gosub' kind of goto).
r261266:
Add a jail parameter, allow.kmem, which lets jailed processes access
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
commit 6b569451b92c48ccf1768da32e7e89189e1aa253
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Mon Jan 27 22:50:46 2014 +0000
Always install nmtree as mtree.
For compability, link mtree to nmtree.
X-MFC after: never
Sponsored by: DARPA, AFRL
commit c1acf022c533c5ae27e0cd556977eafe3f5959eb
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Fri Jan 17 21:46:44 2014 +0000
Add an option WITHOUT_NCURSESW to suppress building and linking to
libncursesw. While wide character support it useful we'd like to
only need one ncurses library on embedded systems.
MFC after: 4 weeks
Sponsored by: DARPA, AFRL
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
that share a slot attempt to distribute their INTx interrupts across
the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
and when the INTx DIS bit is set in a function's PCI command register.
Either assert or deassert the associated I/O APIC pin when the
state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.
Submitted by: neel (7)
Reviewed by: neel
MFC after: 2 weeks
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
Submitted by: netchild
MFC after: 1 week
which cause EINVAL returned from nanosleep() which cause loop in
cron_sleep() and making all cron jobs to start about 30 seconds earlier
(which cause f.e. logfiles rotation by newsyslog delayed by 1 hour).
Use simple and proved calculations from kernel's timespecsub() instead.
MFC after: 3 days
- Store the length of each read-only VPD value since not all values are
guaranteed to be ASCII values (though most are).
- Add a new pciio ioctl to fetch VPD for a single PCI device. The values
are returned as a list of variable length records, one for the device
name and each keyword.
- Add a new -V flag to pciconf's list mode which displays VPD data for
each device.
MFC after: 1 week
device name instead of just the selector.
- Accept an optional device argument to -l to restrict the output to only
listing details about a single device. This is mostly useful in
conjunction with other flags like -e or -c to allow a user to query
details about a single device.
MFC after: 1 week
in the one-line comment associated with the dumpdev setting was not present
for the case where the user deselects the dumpdev service (restoring pre-
r256348 behaviour.
MFC After: 3 days
if it was above 4GB. This was seen with CentOS 6.5 guests with
large RAM, since the block drivers are loaded late in the
boot sequence and end up allocating descriptor memory from
high addresses.
Reported by: Michael Dexter
MFC after: 3 days
only allows basic username/password config, and does not provide the
ability to set any of the other WPA options. Regardless, this is
generally sufficient to associate.
Perhaps in the future this could allow full configuring (e.g. being able
to set "anonymous identity", and perhaps some of the more obscure WPA
options), though perhaps that will better belong in bsdconfig when that
grows wlan config ability.
MFC after: 1 week
LPC devices. Among other things, the LPC serial ports now appear as
ACPI devices.
- Move the info for the top-level PCI bus into the PCI emulation code and
add ResourceProducer entries for the memory ranges decoded by the bus
for memory BARs.
- Add a framework to allow each PCI emulation driver to optionally write
an entry into the DSDT under the \_SB_.PCI0 namespace. The LPC driver
uses this to write a node for the LPC bus (\_SB_.PCI0.ISA).
- Add a linker set to allow any LPC devices to write entries into the
DSDT below the LPC node.
- Move the existing DSDT block for the RTC to the RTC driver.
- Add DSDT nodes for the AT PIC, the 8254 ISA timer, and the LPC UART
devices.
- Add a "SuperIO" device under the LPC node to claim "system resources"
aling with a linker set to allow various drivers to add IO or memory
ranges that should be claimed as a system resource.
- Add system resource entries for the extended RTC IO range, the registers
used for ACPI power management, the ELCR, PCI interrupt routing register,
and post data register.
- Add various helper routines for generating DSDT entries.
Reviewed by: neel (earlier version)
hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used
to enter guest context and vmx_exit_guest() is used to transition back into
host context.
Fix a longstanding race where a vcpu interrupt notification might be ignored
if it happens after vmx_inject_interrupts() but before host interrupts are
disabled in vmx_resume/vmx_launch. We now called vmx_inject_interrupts() with
host interrupts disabled to prevent this.
Suggested by: grehan@
i. e. the POSIX:5.6.1 st_ino field, which can be used to detect hard links
in the file system. This is also the default in mkisofs(8) and according to
its man page, no system only being able to cope with Rock Ridge version 1.10
is known to exist.
PR: 185138
Submitted by: Kurt Lidl
MFC after: 1 week
to SIGTERM when ACPI is enabled. Sending SIGTERM to the hypervisor when an
ACPI-aware OS is running will now trigger a soft-off allowing for a graceful
shutdown of the guest.
- Move constants for ACPI-related registers to acpi.h.
- Implement an SMI_CMD register with commands to enable and disable ACPI.
Currently the only change when ACPI is enabled is to enable the virtual
power button via SIGTERM.
- Implement a fixed-feature power button when ACPI is enabled by asserting
PWRBTN_STS in PM1_EVT when SIGTERM is received.
- Add support for EVFILT_SIGNAL events to mevent.
- Implement support for the ACPI system command interrupt (SCI) and assert
it when needed based on the values in PM1_EVT. Mark the SCI as active-low
and level triggered in the MADT and MP Table.
- Mark PCI interrupts in the MP Table as active-low in addition to level
triggered.
Reviewed by: neel
- Implement the PM1_EVT and PM1_CTL registers required by ACPI.
The PM1_EVT register is mostly a dummy as bhyve doesn't support any
of the hardware-initiated events. The only bit of PM1_CNT that is
implemented are the sleep request bits (SPL_EN and SLP_TYP) which
request a graceful power off for S5. In particular, for S5, bhyve
exits with a non-zero value which terminates the loop in vmrun.sh.
- Emulate the Reset Control register at I/O port 0xcf9 and advertise
it as the reset register via ACPI.
- Advertise an _S5 package.
- Extend the in/out interface to allow an in/out handler to request
that the hypervisor trigger a reset or power-off.
- While here, note that all vCPUs in a guest support C1 ("hlt").
Reviewed by: neel (earlier version)
- Add a generic routine to trigger an LVT interrupt that supports both
fixed and NMI delivery modes.
- Add an ioctl and bhyvectl command to trigger local interrupts inside a
guest. In particular, a global NMI similar to that raised by SERR# or
PERR# can be simulated by asserting LINT1 on all vCPUs.
- Extend the LVT table in the vCPU local APIC to support CMCI.
- Flesh out the local APIC error reporting a bit to cache errors and
report them via ESR when ESR is written to. Add support for asserting
the error LVT when an error occurs. Raise illegal vector errors when
attempting to signal an invalid vector for an interrupt or when sending
an IPI.
- Ignore writes to reserved bits in LVT entries.
- Export table entries the MADT and MP Table advertising the stock x86
config of LINT0 set to ExtInt and LINT1 wired to NMI.
Reviewed by: neel (earlier version)
state before the requested state transition. This guarantees that there is
exactly one ioctl() operating on a vcpu at any point in time and prevents
unintended state transitions.
More details available here:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-December/001825.html
Reviewed by: grehan
Reported by: Markiyan Kushnir (markiyan.kushnir at gmail.com)
MFC after: 3 days
location of /etc/rc.local on the install media is more appropriate as it
knows serial vs. non-serial and can also do the change earlier (so that
even the initial Install dialog can benefit from the change).
MFC after: 3 days
installation to 3-4+ (depending on vdev type) vdevs would result in odd
error messages where the zpool `create' command appeared to repeat itself
(an artifact of printf when you supply too many arguments -- caused by
neglecting to properly quote the multi-word expansion of $*vdevs when
creating the pool(s)). Example error below (taken from bsdinstall_log):
DEBUG: zfs_create_boot: Creating root pool...
DEBUG: zfs_create_boot: zpool create -o altroot=/mnt -m none -f "zroot" \
ada0p3.nop ada1p3.nopzpool create ada2p3.nop "ada3p3.nop"
DEBUG: zfs_create_boot: retval=1 <output below>
cannot open 'ada1p3.nopzpool': no such GEOM provider
DEBUG: Running installation step: hostname
rm: /tmp/bsdinstall_etc/fstab: No such file or directory
The two lines are unrelated, and the rm is spurious. Let's add `-f' to
that rm(1) so it doesn't confuse us when debugging an install.
MFC after: 3 days
should not have used DIALOG_CANCEL because dialog.subr wasn't included to
define it. The effect of the error was that you could not cancel the
partition dialog. Discovered by checking bsdinstall_log for something else.
MFC after: 3 days
callers treat the MSI 'addr' and 'data' fields as opaque and also lets
bhyve implement multiple destination modes: physical, flat and clustered.
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
Reviewed by: grehan@
required in such a case). But don't prevent the user from pointing the
gun at his/her foot -- you can disable 4k alignment after enabling geli).
MFC after: 3 days
and subsequent headaches caused by multiple pools with the same name.
Specifically, blast away any labels on the designated swap partition.
Problem was when you install to a given layout *with* swap and then turn
around and re-install the same layout *without* swap (we weren't doing a
labelclear for the swap device, so would end up with an "UNAVAIL" status
zroot pool that may only exist in the pool cache).
MFC after: 3 days
+ For GPT, always provision zfs# partition after swap [for resizability]
+ For MBR, always use a boot pool to relialy place root vdevs at EOD
NB: Fixes edge-cases where MBR combination failed boot (e.g. swap-less)
+ Generalize boot pool logic so it can be used for any scheme (namely MBR)
+ Update existing comments and some whitespace fixes
+ Change some variable names to make reading/debugging the code easier
in zfs_create_boot() (namely prepend zroot_ or bootpool_ to property)
+ Because zroot vdevs are at EOD, no longer need to calculate partsize
(vdev consumes remaining space after allocating swap)
+ Optimize processing of disks -- no reason to loop over the disks 3-4
separate times when we can logically use a single loop to do everything
Discussed on: -stable
MFC after: 3 days
+ De-obfuscate debugging to show actual values
+ Change graid(8) syntax; s/destroy/delete/ [destroy is not invalid syntax]
+ Log commands that were previously quiet
+ Added some new comemnts and updated some existing ones
+ Add missing local for `disk' used in zfs_create_boot()
+ Use $disks instead of multiply-expanding $* in zfs_create_boot()
+ Pedantically unset variable holding geli(8) passphrase after use
+ Pedantically add double-quotes around zpool names and zfs datasets
+ Fix quotation expansion for zpool_cache entries of loader.conf(5)
+ Some limited whitespace changes
MFC after: 3 days
https://communities.vmware.com/thread/107230https://communities.vmware.com/docs/DOC-11677
Basically, ignore the ``function 62'' and ``function 63'' interpretations
of the left/right command key when we're in the lengthiest portion of the
installation (initiated by the `auto' module).
The net effect is that you can now (once you've started the installer from
the media) escape the VM without prematurely terminating the current action
due to spurious escape sequence.
MFC after: 3 days
installation is cdrom. This enables bsdconfig(8) to make use
of the on-disc pkg(8) repository configuration, which fixes
package selection and installation from the dvd installer.
MFC after: 3 days
M-MFC-With: r259426
X-MFC-Before: -RC3
Sponsored by: The FreeBSD Foundation