83 Commits

Author SHA1 Message Date
grehan
dc702c2d98 Sanity-check the vm exitcode, and exit the process if it's out-of-bounds
or there is no registered handler.

Submitted by:	Bela Lubkin   bela dot lubkin at tidalscale dot com
2013-07-18 18:40:54 +00:00
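The check described in this commit can be pictured as a bounds-checked table dispatch. The following is only a minimal sketch: the exit-code names, the table size, and the `dispatch_exit()` helper are hypothetical and are not taken from the bhyve source.

```c
#include <stddef.h>

/* Hypothetical exit codes; the real list lives in the hypervisor headers. */
enum { EXIT_INOUT = 0, EXIT_HLT = 1, EXIT_MAX = 4 };

typedef int (*exit_handler_t)(int vcpu);

/* Trivial illustrative handler. */
static int
handle_hlt(int vcpu)
{
	(void)vcpu;
	return (0);
}

/* Sparse table: slots without a registered handler stay NULL. */
static const exit_handler_t handlers[EXIT_MAX] = {
	[EXIT_HLT] = handle_hlt,
};

/*
 * Returns the handler's result, or -1 if the exit code is out of bounds
 * or has no registered handler. A caller would treat -1 as fatal and
 * exit the process.
 */
static int
dispatch_exit(int exitcode, int vcpu)
{
	if (exitcode < 0 || exitcode >= EXIT_MAX ||
	    handlers[exitcode] == NULL)
		return (-1);
	return (handlers[exitcode](vcpu));
}
```

Checking the bounds before indexing matters because the exit code comes from outside the dispatch loop and would otherwise be used directly as an array index.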
grehan
a6cf66c6cf Major rework of the virtio code. Split out common parts, and modify
the net/block devices accordingly.

Submitted by:	Chris Torek   torek at torek dot net
Reviewed by:	grehan
2013-07-17 23:37:33 +00:00
grehan
db9a28132c Implement RTC CMOS nvram. Init some fields that are used
by FreeBSD and UEFI.
Tested with nvram(4).

Reviewed by:	neel
2013-07-11 03:54:35 +00:00
grehan
4afc69bc76 Support an optional "mac=" parameter to virtio-net config, to allow
users to set the MAC address for a device.

Clean up some obsolete code in pci_virtio_net.c

Allow an error return from a PCI device emulation's init routine
to be propagated all the way back to the top-level and result in
the process exiting.

Submitted by:	Dinakar Medavaram    dinnu sun at gmail (original version)
2013-07-04 05:35:56 +00:00
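A "mac=" option of the kind described here could be parsed roughly as follows. This is a sketch under assumptions: the function name `parse_mac_opt()` and the exact accepted syntax (six colon-separated hex bytes) are illustrative, the parsing is lenient about what `sscanf` accepts as a hex field, and the real code in pci_virtio_net.c may differ.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "mac=xx:xx:xx:xx:xx:xx" into six bytes.
 * Returns 0 on success and -1 on any syntax error, so the caller can
 * propagate the failure from the device init routine upwards.
 */
static int
parse_mac_opt(const char *opt, uint8_t mac[6])
{
	unsigned int b[6];
	char trailing;
	int i, n;

	if (strncmp(opt, "mac=", 4) != 0)
		return (-1);

	/* The final %c only converts if junk follows the sixth byte. */
	n = sscanf(opt + 4, "%2x:%2x:%2x:%2x:%2x:%2x%c",
	    &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing);
	if (n != 6)
		return (-1);

	for (i = 0; i < 6; i++)
		mac[i] = (uint8_t)b[i];
	return (0);
}
```

Returning an error code rather than silently falling back to a generated address fits the second half of the commit, where an init failure is allowed to reach the top level and terminate the process.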
grehan
dcadb4f390 Fix up option parsing to allow a colon in the config section.
Clean up some other unnecessary code.

Submitted by:	Dinakar Medavaram    dinnu sun at gmail
Reviewed by:	neel
2013-07-01 23:53:22 +00:00
grehan
be00d9ee47 Allow 8259 registers to be read. This is a transient condition
during Linux boot.

Submitted by:	tycho nightingale at pluribusnetworks com
Reviewed by:	neel
2013-06-28 06:25:04 +00:00
grehan
535e6386b1 Allow the PCI config address register to be read. The Linux
kernel does this. Also remove an unused header file.

Submitted by:	tycho nightingale at pluribusnetworks com
Reviewed by:	neel
2013-06-28 05:01:25 +00:00
neel
2ae63ff174 Implement the NOTIFY_ON_EMPTY capability in the virtio-net device.
If this capability is negotiated by the guest then the device will
generate an interrupt when it runs out of available tx/rx descriptors.

Reviewed by:	grehan
Obtained from:	NetApp
2013-05-03 01:16:18 +00:00
neel
7f49c6dcce Reset some more softc state when the guest resets the virtio network device.
Obtained from:	NetApp
2013-04-30 01:14:54 +00:00
neel
773a4e04f4 Use a separate mutex for the receive path instead of overloading the softc
mutex for this purpose.

Reviewed by:	grehan
2013-04-30 00:36:16 +00:00
neel
7b0846b1c8 Get rid of the 'vsc_rxpend' state - it doesn't serve any purpose because we
drop any frames that arrive while the device is starved for receive buffers.

This makes the receive path execute only in the context of the receive thread
and allows for further simplification.

Reviewed by:	grehan
2013-04-28 01:02:59 +00:00
grehan
771ee43899 Use a thread for the processing of virtio tx descriptors rather
than blocking the vCPU thread. This improves bulk data performance
by ~30-40% and doesn't harm req/resp time for stock netperf runs.

Future work will use a thread pool rather than a thread per tx queue.

Submitted by:	Dinakar Medavaram
Reviewed by:	neel, grehan
Obtained from:	NetApp
2013-04-26 05:13:48 +00:00
neel
e682d80073 Gripe if some <slot,function> tuple is specified more than once instead of
silently overwriting the previous assignment.

Gripe if the emulation is not recognized instead of silently ignoring the
emulated device.

If an error is detected by pci_parse_slot() then exit from the command line
parsing loop in main().

Submitted by (initial version):	Chris Torek (chris.torek@gmail.com)
2013-04-26 02:24:50 +00:00
neel
826be154b0 Teach the virtio block device to deal with direct as well as indirect
descriptors. Prior to this change the device would only work with guests
that chose to use indirect descriptors.

Modify the device reset callback to actually reset the device state.

Submitted by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
2013-04-23 16:40:39 +00:00
neel
3ab51705b7 Set up accesses to the memory hole below 4GB to return all 1's on read and
consume all writes without any side effects.

Obtained from:	NetApp
2013-04-17 02:03:12 +00:00
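The behaviour described for the memory hole can be sketched as a small access handler. The handler signature and the `mem_dir` enum below are assumptions made for illustration and do not reproduce bhyve's internal interface; the sketch also assumes "all 1's" means all bits set for the width of the access.

```c
#include <stdint.h>

enum mem_dir { MEM_READ, MEM_WRITE };

/*
 * Handler for an address range with nothing behind it: reads yield all
 * 1's for the access width, writes are accepted and discarded.
 * Returns 0 on success, -1 for an unsupported access size.
 */
static int
hole_handler(enum mem_dir dir, uint64_t addr, int size, uint64_t *val)
{
	(void)addr;	/* every address in the hole behaves the same */

	if (size != 1 && size != 2 && size != 4 && size != 8)
		return (-1);

	if (dir == MEM_READ) {
		/* Avoid shifting a 64-bit value by 64, which is undefined. */
		*val = (size == 8) ? UINT64_MAX :
		    ((UINT64_C(1) << (size * 8)) - 1);
	}
	/* Writes have no side effects. */
	return (0);
}
```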
neel
48f321f98e Need to call init_mem() to really initialize the MMIO range lookups.
This was working by accident because:
- the RB_HEADs were being initialized to zero as part of BSS
- the pthread_rwlock functions were implicitly initializing the lock object

Obtained from:	NetApp
2013-04-10 18:59:20 +00:00
neel
6ca7b417ee Remove obsolete comment about lack of locking for MMIO range lookup.
Pointed out by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
2013-04-10 18:53:14 +00:00
neel
3bab173b64 Unsynchronized TSCs on the host require special handling in bhyve:
- use clock_gettime(2) as the time base for the emulated ACPI timer instead
  of directly using rdtsc().

- don't advertise the invariant TSC capability to the guest to discourage it
  from using the TSC as its time base.

Discussed with:	jhb@ (about making 'smp_tsc' a global)
Reported by:	Dan Mack on freebsd-virtualization@
Obtained from:	NetApp
2013-04-10 05:59:07 +00:00
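Deriving the emulated ACPI timer from a wall-clock time base amounts to converting elapsed nanoseconds into timer ticks. The sketch below assumes the standard ACPI PM timer frequency of 3.579545 MHz; the helper names are hypothetical, and the two `struct timespec` values are assumed to come from `clock_gettime(2)` with the second no earlier than the first.

```c
#include <stdint.h>
#include <time.h>

#define PMTMR_FREQ	UINT64_C(3579545)	/* ACPI PM timer, in Hz */
#define NSEC_PER_SEC	UINT64_C(1000000000)

/* Nanoseconds between two timestamps; assumes now is not before start. */
static uint64_t
elapsed_ns(const struct timespec *start, const struct timespec *now)
{
	int64_t sec = (int64_t)now->tv_sec - (int64_t)start->tv_sec;
	int64_t nsec = (int64_t)now->tv_nsec - (int64_t)start->tv_nsec;

	return ((uint64_t)(sec * (int64_t)NSEC_PER_SEC + nsec));
}

/*
 * Convert nanoseconds to PM timer ticks. Whole seconds and the
 * sub-second remainder are scaled separately so the intermediate
 * product stays well inside 64 bits.
 */
static uint64_t
ns_to_pmtmr_ticks(uint64_t ns)
{
	uint64_t sec = ns / NSEC_PER_SEC;
	uint64_t rem = ns % NSEC_PER_SEC;

	return (sec * PMTMR_FREQ + rem * PMTMR_FREQ / NSEC_PER_SEC);
}
```

A device model would record a start timestamp at initialization and, on each guest read of the timer register, return the tick count truncated to the register width.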
neel
fc5a92987d Change name of variable from 'rwlock' to more descriptive 'mmio_rwlock'
Requested by:	grehan
Obtained from:	NetApp
2013-04-10 02:18:17 +00:00
neel
c350fded20 Improve PCI BAR emulation:
- Respect the MEMEN and PORTEN bits in the command register
- Allow the guest to reprogram the address decoded by the BAR

Submitted by:	Gopakumar T
Obtained from:	NetApp
2013-04-10 02:12:39 +00:00
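Respecting the enable bits means a BAR only decodes accesses while the matching bit in the PCI command register is set. In the sketch below the bit values follow the PCI specification (bit 0 enables I/O space, bit 1 enables memory space); the macro and function names are local to the example and not taken from the bhyve source.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_PORTEN	0x0001	/* I/O space decode enable */
#define CMD_MEMEN	0x0002	/* memory space decode enable */

enum bar_type { BAR_IO, BAR_MEM };

/* True if a BAR of the given type should respond to guest accesses. */
static bool
bar_decodes(uint16_t cmd, enum bar_type type)
{
	if (type == BAR_IO)
		return ((cmd & CMD_PORTEN) != 0);
	return ((cmd & CMD_MEMEN) != 0);
}
```

When the guest clears an enable bit, the emulation would unregister the corresponding range, and register it again (possibly at a new address) once the bit is set.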
grehan
f220c315bb Remove dangling ISA uart stubs.
Obtained from:	NetApp
2013-04-05 22:19:02 +00:00
grehan
9b0ba3c866 config checksum is over the entire fixed portion, not just the
config header. FreeBSD doesn't check this but other o/s's do.

Obtained from:	NetApp
2013-04-05 22:14:07 +00:00
neel
8d05d984e8 Simplify the assignment of memory to virtual machines by requiring a single
command line option "-m <memsize in MB>" to specify the memory size.

Prior to this change the user needed to explicitly specify the amount of
memory allocated below 4G (-m <lowmem>) and the amount above 4G (-M <highmem>).

The "-M" option is no longer supported by 'bhyveload' and 'bhyve'.

The start of the PCI hole is fixed at 3GB and cannot be directly changed
using command line options. However it is still possible to change this in
special circumstances via the 'vm_set_lowmem_limit()' API provided by
libvmmapi.

Submitted by:	Dinakar Medavaram (initial version)
Reviewed by:	grehan
Obtained from:	NetApp
2013-03-18 22:38:30 +00:00
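With a single "-m" value, the split into memory below and above the PCI hole has to be computed rather than supplied by the user. A minimal sketch of that arithmetic follows, assuming the 3GB limit stated above; the struct and function names are hypothetical, and the placement of the remainder above 4GB is inferred from the description of the old "-M" option.

```c
#include <stdint.h>

#define MB		(UINT64_C(1) << 20)
#define GB		(UINT64_C(1) << 30)
#define LOWMEM_LIMIT	(3 * GB)	/* start of the PCI hole */

struct mem_layout {
	uint64_t lowmem;	/* bytes mapped below the PCI hole */
	uint64_t highmem;	/* bytes mapped above 4GB */
};

/* Split a total size given in megabytes into low and high portions. */
static struct mem_layout
split_memsize(uint64_t memsize_mb)
{
	struct mem_layout l;
	uint64_t total = memsize_mb * MB;

	l.lowmem = (total < LOWMEM_LIMIT) ? total : LOWMEM_LIMIT;
	l.highmem = total - l.lowmem;
	return (l);
}
```

For example, a 4096 MB guest would get 3GB below the hole and the remaining 1GB above 4GB.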
neel
6a88d4ed82 Change the type of 'ndesc' from 'int' to 'uint16_t' so that descriptor index
wraparound is handled correctly.

The gory details are available here:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-March/001119.html

This fixes a regression introduced in r247871.

Pointed out by:	Bruce Evans, Chris Torek
2013-03-16 05:40:29 +00:00
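The class of bug fixed here can be illustrated with a ring whose indices are free-running 16-bit counters. The functions below are a sketch for illustration only, with hypothetical names; they are not the code changed by this commit.

```c
#include <stdint.h>

/*
 * Number of descriptors the guest has published but the device has not
 * yet consumed. Both indices wrap at 65536, so the difference must be
 * truncated to 16 bits to stay correct across the wrap.
 */
static uint16_t
ring_navail(uint16_t avail_idx, uint16_t last_seen)
{
	return ((uint16_t)(avail_idx - last_seen));
}

/*
 * The same subtraction with an 'int' result, shown for contrast: the
 * operands are promoted to int, so the result goes negative once
 * avail_idx has wrapped and last_seen has not.
 */
static int
ring_navail_int(uint16_t avail_idx, uint16_t last_seen)
{
	return (avail_idx - last_seen);
}
```

With `avail_idx` at 2 and `last_seen` at 65534, the 16-bit version reports 4 pending descriptors while the `int` version reports a negative count.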
neel
c92442accd Convert the offset into the bar that contains the MSI-X table to an offset
into the MSI-X table before using it to calculate the table index.

In the common case where the MSI-X table is located at the beginning of the
BAR these two offsets are identical and thus the code was working by accident.

This change will fix the case where the MSI-X table is located in the middle
or at the end of the BAR that contains it.

Obtained from:	NetApp
2013-03-11 17:36:37 +00:00
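The conversion described above can be sketched as follows. Each MSI-X table entry is 16 bytes per the PCI specification; the function name and parameters are hypothetical and chosen for the example.

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE	16u	/* bytes per MSI-X table entry */

/*
 * Translate an offset into the BAR to an MSI-X table index by first
 * subtracting the table's own offset within the BAR.
 * Returns -1 if the access falls outside the table.
 */
static int
msix_table_index(uint64_t bar_offset, uint64_t table_offset,
    uint32_t table_count)
{
	uint64_t off;

	if (bar_offset < table_offset)
		return (-1);
	off = bar_offset - table_offset;
	if (off >= (uint64_t)table_count * MSIX_ENTRY_SIZE)
		return (-1);
	return ((int)(off / MSIX_ENTRY_SIZE));
}
```

When `table_offset` is 0 the subtraction is a no-op, which is why dividing the raw BAR offset happened to give the right index in the common case.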
grehan
2527cbb869 Simplify virtio ring num-available calculation.
Submitted by:	Chris Torek, torek at torek dot net
2013-03-06 07:28:20 +00:00
grehan
f5b9af8949 Reorder code to avoid the stat buffer being used uninitialized.
Obtained from:	NetApp
2013-03-06 06:24:09 +00:00
neel
9c2aecf6da Specify the length of the mapping requested from 'paddr_guest2host()'.
This seems prudent to do in its own right but it also opens up the possibility
of not having to mmap the entire guest address space in the 'bhyve' process
context.

Discussed with:	grehan
Obtained from:	NetApp
2013-03-01 02:26:28 +00:00
neel
97e63956d8 Ignore the BARRIER flag in the virtio block header.
This capability is not advertised by the host so ignore it even if the guest
insists on setting the flag.

Reviewed by:	grehan
Obtained from:	NetApp
2013-02-26 20:02:17 +00:00
neel
7025a86fb3 Get rid of unused struct member.
Pointed out by:	Gopakumar T
Obtained from:	NetApp
2013-02-25 20:31:47 +00:00
grehan
8da5ddc4a5 Add the ability to have a 'fallback' search for memory ranges.
This set of ranges is consulted if a standard memory
range isn't found, and won't be installed in the cache.
Use this to implement the memory behaviour of the PCI hole on
x86 systems, where writes are ignored and reads always return -1.
This allows breakpoints to be set when issuing a 'boot -d', which
has the side effect of accessing the PCI hole when changing the
PTE protection on kernel code, since the pmap layer hasn't been
initialized (a bug, but present in existing FreeBSD releases so
has to be handled).

Reviewed by:	neel
Obtained from:	NetApp
2013-02-22 00:46:32 +00:00
neel
d8ccba8d32 Advertise PCI-E capability in the hostbridge device presented to the guest.
FreeBSD wants to see this capability in at least one device in the PCI
hierarchy before it allows use of MSI or MSI-X.

Obtained from:	NetApp
2013-02-15 18:41:36 +00:00
neel
3a9eeaa765 Implement guest vcpu pinning using 'pthread_setaffinity_np(3)'.
Prior to this change pinning was implemented via an ioctl (VM_SET_PINNING)
that called 'sched_bind()' on behalf of the user thread.

The ULE implementation of 'sched_bind()' bumps up 'td_pinned' which in turn
runs afoul of the assertion '(td_pinned == 0)' in userret().

Using the cpuset affinity to implement pinning of the vcpu threads works with
both 4BSD and ULE schedulers and has the happy side-effect of getting rid
of a bunch of code in vmm.ko.

Discussed with:	grehan
2013-02-11 20:36:07 +00:00
jhb
b313f550e1 Install <dev/agp/agpreg.h> and <dev/pci/pcireg.h> as userland headers
in /usr/include.

MFC after:	2 weeks
2013-02-05 18:55:09 +00:00
neel
da1ce8990c Add support for MSI-X interrupts in the virtio block device and make that
the default.

The current behavior of advertising a single MSI vector can be requested by
setting the environment variable "BHYVE_USE_MSI" to "yes". The use of MSI
is not compliant with the virtio specification and will be eventually phased
out.

Submitted by:	Gopakumar T
Obtained from:	NetApp
2013-02-01 16:58:59 +00:00
neel
81de6f5cc4 Fix a broken assumption in the passthru implementation that the MSI-X table
can only be located at the beginning or the end of the BAR.

If the MSI-X table is located in the middle of a BAR then we will split the
BAR into two and create two mappings - one before the table and one after
the table - leaving a hole in place of the table so accesses to it can be
trapped and emulated.

Obtained from:	NetApp
2013-02-01 03:49:09 +00:00
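The splitting described in this commit reduces to computing up to two sub-ranges of the BAR that exclude the table. The sketch below uses hypothetical names and ignores page-alignment constraints that a real passthrough mapping would have to honor.

```c
#include <stdint.h>

struct range {
	uint64_t start;	/* offset into the BAR */
	uint64_t len;	/* length in bytes */
};

/*
 * Compute the parts of a BAR that can be mapped directly, leaving a
 * hole where the MSI-X table sits so accesses to it can be trapped.
 * Returns the number of ranges written to out[] (0, 1 or 2), or -1 if
 * the table does not fit inside the BAR.
 */
static int
split_around_table(uint64_t bar_len, uint64_t tbl_off, uint64_t tbl_len,
    struct range out[2])
{
	int n = 0;

	if (tbl_off > bar_len || tbl_len > bar_len - tbl_off)
		return (-1);

	if (tbl_off > 0) {			/* region before the table */
		out[n].start = 0;
		out[n].len = tbl_off;
		n++;
	}
	if (tbl_off + tbl_len < bar_len) {	/* region after the table */
		out[n].start = tbl_off + tbl_len;
		out[n].len = bar_len - out[n].start;
		n++;
	}
	return (n);
}
```

A table at the beginning or end of the BAR yields a single range, which corresponds to the two cases the earlier code already handled.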
neel
803db8c37c Fix a bug in the passthru implementation where it would assume that all
devices are MSI-X capable. This in turn would lead it to treat bar 0 as
the MSI-X table bar even if the underlying device did not support MSI-X.

Fix this by providing an API to query the MSI-X table index of the emulated
device. If the underlying device does not support MSI-X then this API will
return -1.

Obtained from:	NetApp
2013-02-01 02:41:47 +00:00
neel
6b8dd85cc6 Add support for MSI-X interrupts in the virtio network device and make that
the default.

The current behavior of advertising a single MSI vector can be requested by
setting the environment variable "BHYVE_USE_MSI" to "true". The use of MSI
is not compliant with the virtio specification and will be eventually phased
out.

Submitted by:	Gopakumar T
Obtained from:	NetApp
2013-01-30 04:30:36 +00:00
grehan
6112ba9b18 Improve correctness of rtc register implementation.
Submitted by:	tycho nightingale at pluribusnetworks com
2013-01-25 22:43:20 +00:00
neel
069e512501 Use the correct type (uint64_t) to retrieve sysctl machdep.tsc_freq.
Simplify the function a bit by falling through after initialization and
return via the normal code path.

Reviewed by:	grehan
Obtained from:	NetApp
2013-01-25 06:27:03 +00:00
neel
c6282fca6a Allocate the memory for the MSI-X table dynamically instead of allocating 32KB
statically. In most cases the number of table entries will be far less than
the maximum of 2048 allowed by the PCI specification.

Reuse macros from pcireg.h to interpret the MSI-X capability instead of rolling
our own.

Obtained from:	NetApp
2013-01-21 22:07:05 +00:00
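The 32KB figure follows from the maximum of 2048 entries at 16 bytes each. A sketch of sizing the allocation from the actual entry count is shown below; the helper names are hypothetical, and the real change reuses macros from pcireg.h instead of the local constants defined here.

```c
#include <stdint.h>
#include <stdlib.h>

#define MSIX_ENTRY_SIZE		16u	/* bytes per table entry */
#define MSIX_MAX_ENTRIES	2048u	/* maximum allowed by PCI */

/* Size in bytes of a table with n entries. */
static size_t
msix_table_bytes(uint32_t n)
{
	return ((size_t)n * MSIX_ENTRY_SIZE);
}

/*
 * Allocate a zeroed table sized for the device's real entry count.
 * Returns NULL for an invalid count or on allocation failure.
 */
static void *
msix_table_alloc(uint32_t n)
{
	if (n == 0 || n > MSIX_MAX_ENTRIES)
		return (NULL);
	return (calloc(n, MSIX_ENTRY_SIZE));
}
```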
neel
b3baed220e Get rid of redundant 'table_size' field in struct pi_msix. If needed it can
always be calculated from the number of entries in the MSI-X table.

Obtained from:	NetApp
2013-01-21 08:12:59 +00:00
neel
0e4197fb05 Use <vmname> in a consistent manner in usage messages output by 'bhyve',
'bhyveload' and 'bhyvectl'.

Pointed out by:	joel@
2013-01-20 03:47:13 +00:00
grehan
6b4735f33a Don't completely drain the read file descriptor. Instead, only
fill up to the uart's rx fifo size, and leave any remaining input
for when the rx fifo is read. This allows cut'n'paste of long lines
to be done into the bhyve console without truncation.

Also, introduce a mutex since the file input will run in the mevent
thread context and may corrupt state accessed by a vCPU thread.

Reviewed by:	neel
Approved by:	NetApp
2013-01-07 07:33:48 +00:00
grehan
540789547f Use 64-bit arithmetic throughout, and lock accesses to globals.
With this change, dbench with >= 4 processes runs without getting
weird jumps forward in time when the ACPI pmtimer is the default
timecounter.

Obtained from:	NetApp
2013-01-07 04:51:43 +00:00
neel
01173b0b4a The "unrestricted guest" capability is a feature of Intel VT-x that allows
the guest to execute real or unpaged protected mode code - bhyve relies on
this feature to execute the AP bootstrap code.

Get rid of the hack that allowed bhyve to support SMP guests on processors
that do not have the "unrestricted guest" capability. This hack was entirely
FreeBSD-specific and would not work with any other guest OS.

Instead, limit the number of vcpus to 1 when executing on processors without
"unrestricted guest" capability.

Suggested by:	grehan
Obtained from:	NetApp
2013-01-04 02:04:41 +00:00
grehan
5bbdfeda58 Change thread name for the main kqueue event loop to "<vmname> mevent" so
it can be easily distinguished from other non-vCPU threads in forthcoming
changes.

Obtained from:	NetApp
2012-12-20 23:01:53 +00:00
grehan
70b50f1646 Rename fbsdrun.* -> bhyverun.*
bhyve is intended to be a generic hypervisor, and not FreeBSD-specific.

(renaming internal routines will come later)

Reviewed by:	neel
Obtained from:	NetApp
2012-12-13 01:58:11 +00:00
grehan
56990e9146 Properly reset the tx/rx rings when a guest requests a device reset.
Obtained from:	NetApp
2012-12-12 19:45:36 +00:00
grehan
da36959339 Create unique MAC addresses for virtio devices that are
created with non-zero PCI function numbers.

Remove obsolete reference to CFE.

Obtained from:	NetApp
2012-12-12 19:25:48 +00:00