Commit Graph

6640 Commits

Author SHA1 Message Date
kib
1ab68ccb64 The ia32_get_mcontext() does not need to set PCB_FULL_IRET. The
usermode context state is not changed by the get operation, and
get_mcontext() does not require full iret as well.

Tested by:	dim, pgj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:31:15 +00:00
kib
b48b0e8ec0 When reporting the fault details, also print %rsp.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:29:20 +00:00
kib
da32b7c314 When handling an exception from the attempt from loading the faulting
context on return from the trap handler, re-enable the interrupts on
i386 and amd64.  The trap return path have to disable interrupts since
the sequence of loading the machine state is not atomic.  The trap()
function which transfers the control to the special handler would
enable the interrupt, but an iret loads the previous eflags with PSL_I
clear.  Then, the special handler calls trap() on its own, which now
sees the original eflags with PSL_I set and does not enable
interrupts.

The end result is that signal delivery and process exiting code could
be executed with interrupts disabled, which is generally wrong and
triggers several assertions.

For amd64, the interrupts are enabled conditionally based on PSL_I in
the eflags of the outer frame, as it is already done for
doreti_iret_fault.  For i386, the interrupts are enabled
unconditionally, the ast loop could have opened a window with
interrupts enabled just before the iret anyway.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:26:08 +00:00
achim
10ab667100 Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver.
Approved by:	scottl (mentor)
2013-05-24 09:22:43 +00:00
attilio
fdf82ef9cf o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
  to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
  operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
  mostl of the times only with readlocks.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-05-21 20:38:19 +00:00
kib
b2b2f0b782 Fix the hardware watchpoints on SMP amd64. Load the updated %dr
registers also on other CPUs, besides the CPU which happens to execute
the ddb.  The debugging registers are stored in the pcpu area,
together with the command which is executed by the IPI stop handler
upon resume.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-21 11:24:32 +00:00
kib
5ee265a48b Add amd64-specific ddb command 'show phys2dmap', which calculates the
address in the direct map corresponding to the given physical address.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-21 11:07:12 +00:00
marcel
65b2bbd1ff Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
ed
f489ad2fec Improve readability of static assertions for OFFSET_* macros.
Instead of doing all sorts of weird casting of constants to
pointer-pointers, simply use the standard C offsetof() macro to obtain
the offset of the respective fields in the structures.
2013-05-13 21:47:17 +00:00
peter
cec511ab65 Tidy up some CVS workarounds. 2013-05-12 01:53:47 +00:00
rpaulo
73e04ffd11 Fix several standard extended feature bits.
Submitted by:	Oliver Pinter <oliver.pntr at gmail.com>
2013-05-11 01:31:51 +00:00
neel
c5e619b651 Support array-type of stats in bhyve.
An array-type stat in vmm.ko is defined as follows:
VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu");

It is incremented as follows:
vmm_stat_array_incr(vm, vcpuid, IPIS_SENT, array_index, 1);

And output of 'bhyvectl --get-stats' looks like:
ipis sent to vcpu[0]     3114
ipis sent to vcpu[1]     0

Reviewed by:	grehan
Obtained from:	NetApp
2013-05-10 02:59:49 +00:00
dchagin
b6dc2308e1 Retire write-only PCB_GS32BIT pcb flag on amd64. 2013-05-09 21:42:43 +00:00
kib
e51caee784 Correct the type for the literal used on the left side of the shift up
to 63 bit positions.

Do not fill the save area and do not set the saved bit in the xstate
bit vector for the state which is not marked as enabled in xsave_mask.

Reported and tested by:	Jim Ohlstein <jim@ohlste.in>
MFC after:	3 days
2013-05-09 17:25:29 +00:00
attilio
b24a52ec9e Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
emaste
607cfd414a Switch to standard copyright license text
The initial version of this came from Sandvine but had "PROVIDED BY NETAPP,
INC" in the copyright text, presuambly because the license block was copied
from another file.  Replace it with standard "AUTHOR AND CONTRIBUTORS" form.

Approvided by: grehan@
2013-05-02 12:35:15 +00:00
kib
19e5409088 Partially saved extended state must be handled always, i.e. for both
fpu-owned context, and for pcb-saved one.  More, the XSAVE could do
partial save, same as XSAVEOPT, so qualifier for the handler should be
use_xsave and not use_xsaveopt.

Since xsave_area_desc is now needed regardless of the XSAVEOPT use,
remove the write-only use_xsaveopt variable.

In collaboration with:	jhb
MFC after:	1 week
2013-05-01 20:08:33 +00:00
kib
f2ccbf32fb The check to ensure that xstate_bv always has XFEATURE_ENABLED_X87 and
XFEATURE_ENABLED_SSE bits set is not needed.  CPU correctly handles
any bitmask which is subset of the enabled bits in %XCR0.

More, CPU instructions XSAVE and XSAVEOPT could write the mask without
e.g. XFEATURE_ENABLED_SSE, after the VZEROALL.  The check prevents the
restoration of the otherwise valid FPU save area.

In collaboration with:	jhb
MFC after:	1 week
2013-05-01 20:03:50 +00:00
carl
542feb6d72 Add a new driver to support the Intel Non-Transparent Bridge(NTB).
The NTB allows you to connect two systems with this device using a PCI-e
link. The driver is made of two modules:
 - ntb_hw which is a basic hardware abstraction layer for the device.
 - if_ntb which implements the ntb network device and the communication
   protocol.

The driver is limited at the moment to CPU memcpy instead of using DMA, and
only Back-to-Back mode is supported. Also the network device isn't full
featured yet. These changes will be coming soon. The DMA change will also
bring in the ioat driver from the project branch it is on now.

This is an initial port of the GPL/BSD Linux driver contributed by Jon Mason
from Intel. Any bugs are my contributions.

Sponsored by: Intel
Reviewed by: jimharris, joel (man page only)
Approved by: jimharris (mentor)
2013-04-29 22:48:53 +00:00
grehan
d46f135643 Add RIP-relative addressing to the instruction decoder.
Rework the guest register fetch code to allow the RIP to
be extracted from the VMCS while the kernel decoder is
functioning.

Hit by the OpenBSD local-apic code.

Submitted by:	neel
Reviewed by:	grehan
Obtained from:	NetApp
2013-04-25 04:56:43 +00:00
rpaulo
ba95edba6f Print RDSEED, ADX, and SMAP.
Pointed out by:	kib
2013-04-18 01:21:44 +00:00
gabor
5635c6550b - Correct spelling in comments
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:56:11 +00:00
rpaulo
1f4363857d Print more bits from the standard extended features CPUID which will be
available in the Haswell architecture (c.f. Intel Document #319433-012A).
2013-04-17 06:51:17 +00:00
neel
df29079506 Create sysctl node 'hw.vmm.vmx' and populate it with oids that expose the VMX
hardware capabilities.

Obtained from:	NetApp
2013-04-13 21:41:51 +00:00
kib
7117978a81 Fix the name of the pcb member in the comments.
Submitted by:	Oliver Pinter <oliver.pntr@gmail.com>
MFC after:	3 days
2013-04-13 15:20:33 +00:00
neel
f80ac5cabf Use the MAKEDEV_CHECKNAME flag to check for an invalid device name and return
an error instead of panicking.

Obtained from:	NetApp
2013-04-13 05:11:21 +00:00
trasz
80b8b2f779 Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.

With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.

Reviewed by:	ken
Sponsored by:	FreeBSD Foundation
2013-04-12 16:25:03 +00:00
neel
cbd874121f If vmm.ko could not be initialized correctly then prevent the creation of
virtual machines subsequently.

Submitted by:	Chris Torek
2013-04-12 01:16:52 +00:00
neel
ea145a8678 Make the code to check if VMX is enabled more readable by using macros
instead of magic numbers.

Discussed with:	Chris Torek
2013-04-11 04:29:45 +00:00
neel
3bab173b64 Unsynchronized TSCs on the host require special handling in bhyve:
- use clock_gettime(2) as the time base for the emulated ACPI timer instead
  of directly using rdtsc().

- don't advertise the invariant TSC capability to the guest to discourage it
  from using the TSC as its time base.

Discussed with:	jhb@ (about making 'smp_tsc' a global)
Reported by:	Dan Mack on freebsd-virtualization@
Obtained from:	NetApp
2013-04-10 05:59:07 +00:00
glebius
9cf64d6c35 Merge from projects/counters: counter(9).
Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with:	kib
Reviewed by:		luigi
Tested by:		ae, ray
Sponsored by:		Nginx, Inc.
2013-04-08 19:40:53 +00:00
glebius
8c6eba117e Merge from projects/counters:
Pad struct pcpu so that its size is denominator of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by:	Nginx, Inc.
2013-04-08 19:19:10 +00:00
grehan
25f8e746ab Don't panic when a valid divisor of 1 has been requested.
Obtained from:	NetApp
2013-04-05 22:16:31 +00:00
mav
7c2b81b0e9 Remove all legacy ATA code parts, not used since options ATA_CAM enabled in
most kernels before FreeBSD 9.0.  Remove such modules and respective kernel
options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam.  Remove the
atacontrol utility and some man pages.  Remove useless now options ATA_CAM.

No objections:	current@, stable@
MFC after:	never
2013-04-04 07:12:24 +00:00
neel
39fb54304e Add counter to keep track of the number of timer interrupts generated by
the local apic for each virtual cpu.
2013-03-31 03:56:48 +00:00
neel
9282fffb30 Add some more stats to keep track of all the reasons that a vcpu is exiting. 2013-03-30 17:46:03 +00:00
neel
29b9bf0372 Allow caller to skip 'guest linear address' validation when doing instruction
decode. This is to accomodate hardware assist implementations that do not
provide the 'guest linear address' as part of nested page fault collateral.

Submitted by:	Anish Gupta (akgupt3 at gmail dot com)
2013-03-28 21:26:19 +00:00
kib
7c26a038f9 Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
attilio
d500d6361a MFC 2013-03-17 23:39:52 +00:00
alc
b346e448af Simplify the interface to vm_radix_insert() by eliminating the parameter
"index".  The content of a radix tree leaf, or at least its "key", is not
opaque to the other radix tree operations.  Specifically, they know how to
extract the "key" from a leaf.  So, eliminating the parameter "index" isn't
breaking the abstraction.  Moreover, eliminating the parameter "index"
effectively prevents the caller from passing an inconsistent "index" and
leaf to vm_radix_insert().

Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-03-17 16:06:03 +00:00
neel
4bbf2e81a6 Fix the '-Wtautological-compare' warning emitted by clang for comparing the
unsigned enum type with a negative value.

Obtained from:	NetApp
2013-03-16 22:53:05 +00:00
neel
8089ec6659 Allow vmm stats to be specific to the underlying hardware assist technology.
This can be done by using the new macros VMM_STAT_INTEL() and VMM_STAT_AMD().
Statistic counters that are common across the two are defined using VMM_STAT().

Suggested by:	Anish Gupta
Discussed with:	grehan
Obtained from:	NetApp
2013-03-16 22:40:20 +00:00
kib
63efc821c3 Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination.  Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations.  For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used.  The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after:	2 weeks
2013-03-14 20:18:12 +00:00
alc
7c12a136b6 When a superpage promotion occurs, the page table page that the superpage
mapping replaces is added to an ordered collection of page table pages.
Rather than preserving the code that implements the splay tree of pages
in the pmap for just this one purpose, use the new MI radix tree.  The
extra overhead of using a radix tree for this purpose is small enough,
about 4% added run-time to pmap_promote_pde(), that I don't see the point
of preserving the splay tree code.
2013-03-12 02:12:47 +00:00
attilio
32a3275e77 MFC 2013-03-11 10:49:02 +00:00
alc
762178138a The kernel pmap is statically allocated, so there is really no need to
explicitly initialize its pm_root field to zero.

Sponsored by:	EMC / Isilon Storage Division
2013-03-10 21:07:44 +00:00
attilio
16a80466e5 MFC 2013-03-09 02:51:51 +00:00
attilio
bf1dc90446 MFC 2013-03-08 00:03:07 +00:00
attilio
640e058da3 MFC 2013-03-07 23:43:03 +00:00
bryanv
840f21fc49 Remove the virtio dependency entry for the VirtIO device drivers. This
will prevent the kernel from linking if the device driver are included
without the virtio module. Remove pci and scbus for the same reason.

Also explain the relationship and necessity of the virtio and virtio_pci
modules. Currently in FreeBSD, we only support VirtIO PCI, but it could
be replaced with a different interface (like MMIO) and the device
(network, block, etc) will still function.

Requested by:	luigi
Approved by:	grehan (mentor)
MFC after:	3 days
2013-03-06 07:17:53 +00:00