r268427, r268428, r268521, r268638, r268639, r268701, r268777,
r268889, r268922, r269008, r269042, r269043, r269080, r269094,
r269108, r269109, r269281, r269317, r269700, r269896, r269962,
r269989.
Catch bhyve up to CURRENT.
Lightly tested with FreeBSD i386/amd64, Linux i386/amd64, and
OpenBSD/amd64. Still resolving an issue with OpenBSD/i386.
Many thanks to jhb@ for all the hard work on the prior MFCs!
r267921 - support the "mov r/m8, imm8" instruction
r267934 - document options
r267949 - set DMI vers/date to fixed values
r267959 - doc: sort cmd flags
r267966 - EPT misconf post-mortem info
r268202 - use correct flag for event index
r268276 - 64-bit virtio capability api
r268427 - invalidate guest TLB when cr3 is updated, needed for TSS
r268428 - identify vcpu's operating mode
r268521 - use correct offset in guest logical-to-linear translation
r268638 - chs value
r268639 - chs fake values
r268701 - instr emul operand/address size override prefix support
r268777 - emulation for legacy x86 task switching
r268889 - nested exception support
r268922 - fix INVARIANTS build
r269008 - emulate instructions found in the OpenBSD/i386 5.5 kernel
r269042 - fix fault injection
r269043 - Reduce VMEXIT_RESTARTs in task_switch.c
r269080 - fix issues in PUSH emulation
r269094 - simplify return values from the inout handlers
r269108 - don't return -1 from the push emulation handler
r269109 - avoid permanent sleep in vm_handle_hlt()
r269281 - list VT-x features in base kernel dmesg
r269317 - Mark AHCI fatal errors as not completed
r269700 - Support PCI extended config space in bhyve
r269896 - Minor cleanup
r269962 - use max guest memory when creating IOMMU domain
r269989 - fix interrupt mode names
Turn on interrupt window exiting unconditionally when an ExtINT is being
injected into the guest.
Add helper functions to populate VM exit information for rendezvous and
astpending exits.
Provide APIs to directly get the 'lowmem' and 'highmem' sizes.
Expose the amount of resident and wired memory from the guest's vmspace
Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained
by vmm.ko. This allows the virtual machine to be restarted without having
to destroy it first.
Review pass through jail.8
Replace usage of "prison" with "jail", since that term has mostly dropped
out of use. Note once at the beginning that the "prison" term is equivalent,
but do not use it otherwise. [1]
Some grammar issues.
Some mdoc formatting fixes.
Consistently use \(em for em dashes, with spaces around it.
Avoid contractions.
Prefer ssh to telnet.
PR: 176832 [1]
Support ! operator in "files" files.
Improve error detection and reporting
Cleanup code to make it easier to maintain.
Remove mandatory keyword: it has been used for 17 years.
Bump version number (we should have bumped for -I too, but didn't)
r261501 | imp | 2014-02-04 17:26:11 -0700 (Tue, 04 Feb 2014) | 5 lines
Fix ! by not clearing not at the bottom of the loop.
Add a blank line
Submitted by: bde (blank line)
r261493 | imp | 2014-02-04 11:28:58 -0700 (Tue, 04 Feb 2014) | 5 lines
Implement the '!' operator for files* files. It means 'include this
only if the specified option is NOT specified.' Bump version because
old config won't be able to cope with files* files that have this
construct in them.
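The semantics of '!' described above can be sketched as follows. This is a
minimal illustration, not config(8)'s actual parser: the function name, the
"!name" token form, and treating all requirements as ANDed together are
assumptions made for the example.

```python
def file_is_included(requirements, enabled):
    """Return True if every requirement holds for the enabled option set.

    Hypothetical helper: a requirement written as "!name" holds only when
    `name` is NOT enabled; a plain "name" holds only when it is enabled.
    """
    for req in requirements:
        negated = req.startswith("!")
        name = req[1:] if negated else req
        # A plain requirement fails when the option is absent;
        # a negated requirement fails when the option is present.
        if (name in enabled) == negated:
            return False
    return True


enabled_options = {"pci", "inet"}
print(file_is_included(["pci"], enabled_options))            # plain requirement
print(file_is_included(["pci", "!inet6"], enabled_options))  # inet6 is absent
print(file_is_included(["!inet"], enabled_options))          # inet is present
```

An older config that does not know about '!' would presumably read "!name" as
an ordinary option name, which is consistent with the version bump.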
r261446 | imp | 2014-02-03 12:14:36 -0700 (Mon, 03 Feb 2014) | 5 lines
Convert the loop by gotos into a for loop to improve readability. I
did this only with the inner loop for the token parsing, and not the
outer loop which was understandable enough when the extra layers of
looping went away...
r261445 | imp | 2014-02-03 12:10:33 -0700 (Mon, 03 Feb 2014) | 4 lines
Fix a bug introduced in r261437 that broke "optional
profiling-routine", since profiling-routine is not really an
option or a device, but a special case elsewhere in the code.
r261444 | imp | 2014-02-03 11:56:41 -0700 (Mon, 03 Feb 2014) | 2 lines
Slight cleanup to the error messaging to compress code vertically...
r261442 | imp | 2014-02-03 11:31:51 -0700 (Mon, 03 Feb 2014) | 2 lines
Better error messages when EOF is hit in the middle of a phrase.
r261438 | imp | 2014-02-03 09:54:53 -0700 (Mon, 03 Feb 2014) | 5 lines
Move the check for standard keyword + optional inclusion specifier to
its proper location. Otherwise you could have 'file.c standard pci'
without an error. This construct isn't in our tree, and has no well
defined meaning.
r261437 | imp | 2014-02-03 09:47:10 -0700 (Mon, 03 Feb 2014) | 4 lines
Don't believe we have a requirement until after we've checked all the
known key words. This will make error messages slightly better in
weird corner cases, but should otherwise be a nop.
r261436 | imp | 2014-02-03 09:46:01 -0700 (Mon, 03 Feb 2014) | 3 lines
In the 17 years since r30796, the mandatory keyword has never been used
in any files as far as I can tell, and is currently unused. Retire it.
r261435 | imp | 2014-02-03 08:10:44 -0700 (Mon, 03 Feb 2014) | 6 lines
Slightly deobfuscate read_file() and likely pessimize the runtime
performance by epsilon.
(Translation: eliminate bogus macros that hid 'returns' making it hard
to read and moved a block of code inline rather than at the end of the
function where it was effectively a 'gosub' kind of goto).
Added support for extra ifconfig args to jail ip4.addr & ip6.addr params
This allows for CARP interfaces to be used in jails e.g.
ip4.addr = "em0|10.10.1.20/32 vhid 1 pass MyPass advskew 100"
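To show how a value of this shape breaks down, here is a small sketch that
splits it into the interface, the address, and the extra ifconfig arguments.
It is not jail(8)'s parser; the function name and return layout are
hypothetical, chosen only for illustration.

```python
def split_jail_addr(value):
    """Split 'iface|addr extra args...' into (iface, addr, extra_args).

    Hypothetical helper: the interface prefix is optional, and everything
    after the first whitespace-separated word is treated as extra
    arguments to hand to ifconfig.
    """
    iface = None
    rest = value
    if "|" in value:
        iface, rest = value.split("|", 1)
    parts = rest.split()
    return iface, parts[0], parts[1:]


iface, addr, extra = split_jail_addr(
    "em0|10.10.1.20/32 vhid 1 pass MyPass advskew 100"
)
print(iface, addr, extra)
```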
r269340 will not be MFC'ed as mentioned due to the slim window and the
amount of additional commits required to support it.
Sponsored by: Multiplay
Add support for the VMware dialect of the EXTENDED COPY command, aka VAAI Clone.
This allows VMs to be cloned and moved between LUNs inside one storage
host without generating extra network traffic to the initiator and back,
and without being limited by network bandwidth.
LUNs participating in a copy operation should have UNIQUE NAA or EUI IDs set.
For LUNs without these IDs VMware will use traditional copy operations.
Beware: explicitly setting these LUN IDs to values that are not unique from
the VM cluster's point of view may cause data corruption if the wrong LUN is
addressed!
Sponsored by: iXsystems, Inc.
Omit "too many sections" warnings if the ELF file is not dynamically
linked (and is therefore skipped anyway), and otherwise output it only
once. An errant core file would previously cause kldxref to output a
number of warnings.
Also introduce a MAXSEGS #define and replace literal 2 with it, to make
comparisons clear.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
266708,266724,266934,266935,268521:
Emulation of the "ins" and "outs" instructions.
Various fixes for translating guest linear addresses to guest physical
addresses.
265941,265951,266390,266550,266910:
Various bhyve fixes:
- Don't save host's return address in 'struct vmxctx'.
- Permit non-32-bit accesses to local APIC registers.
- Factor out common ioport handler code.
- Use calloc() in favor of malloc + memset.
- Change the vlapic timer frequency to be in the ballpark of contemporary
hardware.
- Allow the guest to read the TSC via MSR 0x10.
- A VMCS is always inactive when it exits the vmx_run() loop. Remove
redundant code and the misleading comment that suggests otherwise.
- Ignore writes to microcode update MSR. This MSR is accessed by RHEL7
guest.
Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
- Provide an alias for the userboot console and name it 'comconsole'.
- Use EV_ADD to create an mevent and EV_ENABLE to enable it.
- abort(3) the process in response to a VMEXIT_ABORT.
- Don't include the guest memory segments in the bhyve(8) process core dump.
- Make the vmx asm code dtrace-fbt-friendly.
- Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in
the VCPU_RUNNING state.
- Enable VMX in the IA32_FEATURE_CONTROL MSR if it is not enabled and the MSR
isn't locked.
Add an ioctl to suspend a virtual machine (VM_SUSPEND).
Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.
When this condition is detected the guest will be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.
This logic can be disabled via the tunable 'hw.vmm.halt_detection'.
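The halt-detection condition can be sketched roughly as below. This is a
simplified model, not the vmm.ko implementation: the per-vcpu dictionary
fields and both function names are hypothetical, and only the names
VM_SUSPEND_HALT and the halt_detection tunable come from the text above.

```python
def all_vcpus_halted_permanently(vcpus):
    """True when every vcpu executed HLT with interrupts disabled.

    `vcpus` is a list of dicts with hypothetical keys 'halted' and
    'interrupts_enabled'. A vcpu halted with interrupts enabled can still
    be woken by an interrupt, so it does not count.
    """
    return bool(vcpus) and all(
        v["halted"] and not v["interrupts_enabled"] for v in vcpus
    )


def hlt_exit_action(vcpus, halt_detection=True):
    """Decide what the HLT exit handler does in this simplified model."""
    if halt_detection and all_vcpus_halted_permanently(vcpus):
        return "VM_SUSPEND_HALT"
    return "wait"


guest = [
    {"halted": True, "interrupts_enabled": False},
    {"halted": True, "interrupts_enabled": False},
]
print(hlt_exit_action(guest))
print(hlt_exit_action(guest, halt_detection=False))
```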
- Add a very simple virtio_random(4) driver for FreeBSD guests to harvest
entropy from hypervisors.
- Add support to bhyve for the virtio RNG entropy-source device to provide
entropy to bhyve guests.
Fixes for vcpu management in bhyve:
- Use 'cpuset_t' to represent the vcpus active in a virtual machine.
- Modify the "-p" option to be more flexible when associating a 'vcpu' with
a 'hostcpu'.
Various uart fixes:
- Open the uart emulation's backing tty in non-blocking mode.
- Support 16-bit register access.
- Disable the 'uart_drain()' callback when the emulated receive FIFO
is full.
Various PCI fixes:
- Allow PCI devices to be configured on all valid bus numbers from 0 to 255.
- Tweak the handling of PCI capabilities in emulated devices to remove
the non-standard zero capability list terminator.
- Add a check to validate that memory BARs of passthru devices are 4KB
aligned.
- Respect and track the enable bit in the PCI configuration address word.
- Handle quad-word access to 32-bit register pairs.
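For context on the enable bit mentioned in the list above, the legacy PCI
configuration address word (written to I/O port 0xCF8) packs an enable flag
together with bus, device, function and register offset. The sketch below
decodes that standard layout; the function name and the returned dictionary
are illustrative and are not taken from bhyve.

```python
def decode_config_address(value):
    """Decode a 32-bit legacy PCI configuration address word.

    Bit 31 is the enable bit, bits 23:16 the bus, bits 15:11 the device
    (slot), bits 10:8 the function, and bits 7:2 the dword-aligned
    register offset.
    """
    return {
        "enabled": bool((value >> 31) & 0x1),
        "bus": (value >> 16) & 0xFF,
        "slot": (value >> 11) & 0x1F,
        "func": (value >> 8) & 0x7,
        "offset": value & 0xFC,
    }


addr = 0x80000000 | (1 << 16) | (2 << 11) | (3 << 8) | 0x10
print(decode_config_address(addr))
```

When the enable bit is clear, an emulation that tracks it would not treat a
following data-port access as a configuration cycle.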
Make iSCSI initiator keep Initiator Session ID (ISID) across reconnects.
Previously the ISID changed on every reconnect, which made correct
persistent reservation impossible, because a reconnected session was
identified as a completely new one.
Close race in r268291 between port destruction, delayed by sessions
teardown, and new port creation during `service ctld restart`.
Close it by returning the iSCSI port's internal state, which allows dying
ports, which should not be counted as existing, to be told apart from live ones.
Pass through iSCSI session ISID from LOGIN request to the CTL frontend.
ISID is an important part of initiator transport ID for iSCSI. It is not
used now, but should be to properly implement persistent reservation.
Bury devid port method, which was a gross hack.
Instead make ports provide wanted port and target IDs, and LUNs provide
wanted LUN IDs. After that core Device ID VPD code only had to link all
of them together and add relative port and port group numbers.
The LUN ID for iSCSI LUNs is no longer created by CTL, but by ctld, and passed
to CTL as the "scsiname" LUN option. This makes a LUN report the same set
of IDs regardless of the port through which it is accessed, as
required by SCSI specifications.
Create separate CTL port for every iSCSI target (and maybe portal group).
Having a single port for all iSCSI connections makes it problematic to
implement some more advanced SCSI functionality in CTL that requires proper
port enumeration and identification.
This change extends the CTL iSCSI API, making the ctld daemon control the list
of iSCSI ports in CTL. When a new target is defined in the config file, ctld
will create the respective port in CTL. When a target is removed, the port will
also be removed after all active commands through that port are properly
aborted.
This change requires ctld to be rebuilt to match the kernel.
As a minor side effect, this allows iSCSI targets without LUNs.
While that may look odd and not very useful, that is not incorrect.
Rename vt(4) vga module to dismiss interference with syscons(4) vga module.
267623 Log:
Remove stale link to deleted vt(4) xboxfb driver.
267624 Log:
syscons(4) and vt(4) can be built together now.
267625 Log:
Allow syscons(4) to be disabled if "hw.syscons.disable" kenv is set.
267626 Log:
Suspend vt(4) initialization if "kern.vt.disable" kenv is set.
267965 by emaste@ Log:
Use a common tunable to choose between vt(4)/sc(4)
With this change and previous work from ray@ it will be possible to put
both in GENERIC, and have one enabled by default, but allow the other to
be selected via the loader.
(The previous implementation had separate kern.vt.disable and
hw.syscons.disable tunables, and would panic if both drivers were
compiled in and neither was explicitly disabled.)
268175 by emaste@ Log:
Fix vt(4) detection in kbdcontrol and vidcontrol
As sc(4) and vt(4) coexist and are both enabled in GENERIC, the existence
of a vt(4) sysctl is not sufficient to determine that vt(4) is in use.
Reported by: Trond Endrestøl
268045 by emaste@ Log:
Add vt(4) to GENERIC and retire the separate VT config
vt(4) and sc(4) can now coexist in the same kernel. To choose the vt
driver, set the loader tunable kern.vty=vt .
Sponsored by: The FreeBSD Foundation
serial_num and device_id fields are not necessarily null-terminated.
Before this it was impossible to use all 16 bytes of the serial number, and
the client always got a NUL-terminated serial number, which is not required.
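The handling of a fixed-width field that may lack a terminator can be
sketched as follows. The helper names are hypothetical and this is not the
CTL code; only the 16-byte width comes from the text above.

```python
SERIAL_LEN = 16  # field width mentioned in the commit message


def write_fixed_field(value, width=SERIAL_LEN):
    """Pad `value` with NUL bytes up to `width`.

    A value that fills the field exactly is stored with no terminator.
    """
    if len(value) > width:
        raise ValueError("value does not fit in the field")
    return value.ljust(width, b"\x00")


def read_fixed_field(raw, width=SERIAL_LEN):
    """Read at most `width` bytes, stopping at the first NUL if present."""
    return raw[:width].split(b"\x00", 1)[0]


full = write_fixed_field(b"0123456789ABCDEF")  # uses all 16 bytes
short = write_fixed_field(b"SN42")
print(read_fixed_field(full), read_fixed_field(short))
```

The key point is that the reader bounds itself by the field width rather than
relying on a terminator being present.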
Add the possibility to specify ecx when performing cpuid calls.
MFC r267673:
Restore the ABI of the cpuctl(4) ioctl request CPUCTL_CPUID.
MFC r267814:
Make cpuctl_do_cpuid() and cpuctl_do_cpuid_count() return void.
Fix issues in config parser relating to lun serial numbers.
Without this fix some serial numbers needed to be quoted
to avoid the config parser bailing out.
Submitted by: delphij
Sponsored by: iXsystems
Handle single-byte reads from the bvmcons port (0x220) by returning
0xff. Some guests may attempt to read from this port to identify
pseudo-PNP ISA devices. (The ie(4) driver in FreeBSD/i386 is one
example.)
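A rough sketch of the behaviour described: a single-byte read from the port
yields 0xff, which a probing driver would typically interpret as "nothing
there". The handler shape, its name, and the `other_sizes` callback are
assumptions for illustration; bhyve's real handler is written in C and is not
reproduced here.

```python
BVM_CONSOLE_PORT = 0x220  # port number from the commit message


def bvmcons_read(port, size, other_sizes):
    """Handle a guest read from the bvmcons port in this simplified model.

    `other_sizes` is a hypothetical callback covering every access width
    other than one byte; its behaviour is outside the scope of the sketch.
    """
    if port != BVM_CONSOLE_PORT:
        raise ValueError("not the bvmcons port")
    if size == 1:
        return 0xFF  # all bits set
    return other_sizes(size)


print(hex(bvmcons_read(0x220, 1, other_sizes=lambda size: 0)))
```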
vt(4) support for vidcontrol(1).
o Teach vidcontrol(1) how to load vt(4) font.
o Teach vidcontrol(1) to distinguish which virtual terminal system is running now.
o Load vt(4) fonts from different location.
o Add $FreeBSD$ tag for path.h.
vt(4) support for kbdcontrol(1).
Enable kbdcontrol(1) to use maps from vt(4) keymaps dir /usr/share/vt/keymaps
if vt(4) is present.
Sponsored by: The FreeBSD Foundation
Various x2APIC fixes and enhancements:
- Use spinlocks for the vioapic.
- Handle the SELF_IPI MSR.
- Simplify the APIC mode switching between MMIO and x2APIC. The guest is
no longer allowed to switch modes at runtime. Instead, the desired mode
is set when the virtual machine is created.
- Disallow MMIO access in x2APIC mode and MSR access in xAPIC mode.
- Add support for x2APIC virtualization assist in Intel VT-x.
Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and
XSAVE-enabled features like AVX.
- Store a per-cpu guest xcr0 register and handle xsetbv VM exits by emulating
the instruction.
- Only expose XSAVE to guests if XSAVE is enabled in the host. Only expose
a subset of XSAVE features currently supported by the guest and for which
the proper emulation of xsetbv is known. Currently this includes X87, SSE,
AVX, AVX-512, and Intel MPX.
- Add support for injecting hardware exceptions into the guest and use this
to trigger exceptions in the guest for invalid xsetbv operations instead
of potentially faulting in the host.
- Queue pending exceptions in the 'struct vcpu' instead of directly updating
the processor-specific VMCS or VMCB. The pending exception will be delivered
right before entering the guest.
- Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.
- Expose a subset of known-safe features from leaf 0 of the structured
extended features to guests if they are supported on the host including
RDFSBASE/RDGSBASE, BMI1/2, AVX2, AVX-512, HLE, ERMS, and RTM. Aside
from AVX-512, these features are all new instructions available for use
in ring 3 with no additional hypervisor changes needed.
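As an illustration of what "invalid xsetbv operations" can mean, the sketch
below checks a requested XCR0 value against a few architectural rules: the
x87 bit must always be set, AVX state requires SSE state, and no bit outside
the permitted mask may be set. The function name and the `allowed_mask`
parameter are hypothetical, and the additional rules for AVX-512 and MPX are
deliberately left out; this is not the bhyve emulation code.

```python
XFEATURE_X87 = 1 << 0
XFEATURE_SSE = 1 << 1
XFEATURE_AVX = 1 << 2


def xsetbv_is_valid(new_xcr0, allowed_mask):
    """Return True if `new_xcr0` passes the checks modelled here.

    A False result corresponds to the case where a hypervisor would
    inject an exception into the guest instead of loading the value.
    """
    if new_xcr0 & ~allowed_mask:
        return False  # requests a feature that is not exposed
    if not (new_xcr0 & XFEATURE_X87):
        return False  # x87 state can never be disabled
    if (new_xcr0 & XFEATURE_AVX) and not (new_xcr0 & XFEATURE_SSE):
        return False  # AVX depends on SSE state
    return True


exposed = XFEATURE_X87 | XFEATURE_SSE | XFEATURE_AVX
print(xsetbv_is_valid(0x7, exposed))  # x87 + SSE + AVX
print(xsetbv_is_valid(0x5, exposed))  # AVX without SSE
```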
Expand the support for PCI INTx interrupts including providing interrupt
routing information for INTx interrupts to I/O APIC pins and enabling
INTx interrupts in the virtio and AHCI backends.
Remove support for legacy PCI devices. These haven't been needed since
support for LPC uart devices was added and it conflicts with upcoming
patches to add PCI INTx support.
Approved by: grehan
Various AHCI fixes:
- Fix issue with stale fields from a recycled request pulled off the
freelist.
- Provide an indication a "PIO Setup Device to Host FIS" occurred while
executing the IDENTIFY DEVICE and IDENTIFY PACKET DEVICE commands.
- Provide an indication a "D2H Register FIS" occurred during a SET FEATURES
command.
- Though there currently isn't a way to insert new media into an ATAPI
drive, at least pretend to support Asynchronous Notification (AN) to
avoid a guest needlessly polling for it.
- Don't reissue in-flight commands.
- Constrain the amount of data returned to what is actually available
not the size of the buffer.
an embedded newline appearing within the options string surrounded by
double-quotes. Rework the logic that goes into setting dataset options on
the root pool dataset while we're here -- added two new variables (which
can be altered via scripting) ZFSBOOT_POOL_CREATE_OPTIONS and also
ZFSBOOT_BOOT_POOL_CREATE_OPTIONS for setting pool/dataset attributes at
the time of pool creation. The former is for setting options on the root
pool (zroot) and the latter is for setting options on the optional separate
boot pool (bootpool) implicitly enabled when using either GELI or MBR. The
default value for the root pool variable (ZFSBOOT_POOL_CREATE_OPTIONS) is
"-O compress=lz4 -O atime=off" and the default value for separate boot pool
variable (ZFSBOOT_BOOT_POOL_CREATE_OPTIONS) is NULL (no additional options
for the separate boot pool dataset).
Reviewed by: allanjude
Here is a patch for the bsdinstall root-on-zfs stuff that adds optional
encryption for swap, and optional gmirror for swap (which can be combined)
Updates to the datasets created by zfsboot.
Set compress=lz4 for the entire pool, removing it from the individual
datasets
Remove exec=no from /usr/src, breaks the test suite.
Fix the "disks" variable reuse.
It starts off being used to track the grammar for the number of disks
(singular vs plural) and then it is reused as the list of available disks.
Replace the variable with disks_grammar and move 'disk' and 'disks' to
msg_ vars so they can be translated in the future.
Submitted by: Allan Jude <freebsd@allanjude.com>
Reviewed by: roberto
Sponsored by: ScaleEngine Inc.
nothing to compare.
Because of the change to find in SVN r253886, the entire temproot would be
deleted if it became empty, leading to a confusing message "*** FATAL ERROR:
The temproot directory ${TEMPROOT} has disappeared!"
Note that mergemaster does not do anything useful in this situation anyway
(e.g. put IGNORE_FILES="/etc/group /etc/master.passwd" in
/etc/mergemaster.rc and run mergemaster -p).
As noted in that commit, add -mindepth 1.
PR: bin/188485
Submitted by: David Boyd
Add a command line argument (-l) to end event collection after some
number of seconds. The number of seconds may be a fraction.
Submitted by: Julien Charbon <jcharbon@versign.com>
Relnotes: yes
In one case generating callgraph output from a 24MB system-wide sampling
data file took 17.4 seconds on average. Profiling showed pmcstat
spending a lot of time in strcmp, due to hash collisions.
Replacing the XOR-only hash with FNV-1a reduces the run time for my
test by 40%.
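To illustrate why an XOR-only hash collides so readily on strings, and how
FNV-1a differs, here is a small sketch. The 32-bit FNV-1a constants are the
standard published ones; the `xor_hash` function is a stand-in for "an
XOR-only hash" and is not pmcstat's previous code, and the width pmcstat
actually uses is not stated above.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def xor_hash(data):
    """XOR all bytes together; byte order has no effect on the result."""
    h = 0
    for byte in data:
        h ^= byte
    return h


def fnv1a_32(data):
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


# Any two strings made of the same bytes collide under the XOR-only hash.
print(xor_hash(b"ab") == xor_hash(b"ba"))
print(fnv1a_32(b"ab") == fnv1a_32(b"ba"))
```

Because the multiply step mixes each byte's position into the result,
reordered or similar symbol names are far less likely to land in the same
bucket.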
- Don't claim the adapter is idle if it is clearing a drive.
- Fix an off by one error when checking for the stop event. This
resulted in not showing the most recent event by default.
- When the stop event is hit, break out of the outer loop to stop
fetching more events.
Fix a couple of issues with vcpu state:
- Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the
IDLE state before the requested state transition. This guarantees that
there is exactly one ioctl() operating on a vcpu at any point in time and
prevents unintended state transitions.
- Fix a race between VMRUN() and vcpu_notify_event() due to 'vcpu->hostcpu'
being updated outside of the vcpu_lock().
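The first fix can be modelled as a state holder whose transition waits for
IDLE when asked to. This is a simplified sketch using a Python condition
variable, not the vmm code: the class, the string state names other than
IDLE, and the `from_idle` argument name are assumptions for illustration.

```python
import threading


class Vcpu:
    """Toy model of a vcpu whose state changes are serialized by a lock."""

    def __init__(self):
        self.state = "IDLE"
        self._cv = threading.Condition()

    def set_state(self, new_state, from_idle):
        """Move to `new_state`.

        With from_idle=True the caller blocks until the vcpu is IDLE, so
        only one operation at a time can take the vcpu out of IDLE.
        """
        with self._cv:
            if from_idle:
                while self.state != "IDLE":
                    self._cv.wait()
            self.state = new_state
            self._cv.notify_all()


vcpu = Vcpu()
vcpu.set_state("FROZEN", from_idle=True)   # an ioctl claims the vcpu
vcpu.set_state("IDLE", from_idle=False)    # and releases it when done
print(vcpu.state)
```

Reading and writing the state only while holding the lock is also the
general remedy for the kind of race described in the second item.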
260531,260532,260550,260619,261170,261453,261621,263280,263290,264516:
Add support for local APIC hardware-assist.
- Restructure vlapic access and register handling to support hardware-assist
for the local APIC.
- Use the 'Virtual Interrupt Delivery' and 'Posted Interrupt Processing'
feature of Intel VT-x if supported by hardware.
- Add an API to rendezvous all active vcpus in a virtual machine and use
it to support level triggered interrupts with VT-x 'Virtual Interrupt
Delivery'.
- Use a cheaper IPI handler than IPI_AST for nested page table shootdowns
and avoid doing unnecessary nested TLB invalidations.
Reviewed by: neel
Adds gpioiic.4 and gpioled.4 man pages. Moves some of the information that
was previously available on gpio.4 to their respective pages. Add the
cross references on gpioctl.8.
Add gpiobus(4) as a link to gpio(4).
Fork a child process and wait until the process terminates when the -P
option is specified. This behavior is documented on the manual page.
PR: bin/187265
Add the -a option to pmcstat. This produces a full stack trace on the
sampled points. See the man page for details on how this works.
Obtained from: Netflix, Inc.
Make it possible for the initiator side to operate in both proxy
and normal mode; this makes it possible to compile with the former
by default, but use it only when necessary. That's especially
important for the userland part.
Sponsored by: The FreeBSD Foundation