Commit Graph

322 Commits

Author SHA1 Message Date
neel
79b96fdbcb MFC r282209:
Emulate the 'bit test' instruction.

MFC r282259:
Re-implement RTC current time calculation to eliminate the possibility of
losing time.

MFC r282281:
Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.

MFC r282284:
When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

MFC r282287:
Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.

MFC r282296:
Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC r282301:
Relax limits when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

MFC r282335:
Advertise an additional memory BAR in the "dummy" device emulation.

MFC r282336:
Emulate machine check related MSRs to allow guest OSes like Windows to boot.

MFC r282351:
Don't advertise the Intel SMX capability to the guest.

MFC r282407:
Emulate the 'CMP r/m8, imm8' instruction.

MFC r282519:
Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.

MFC r282520:
Emulate guest writes to EFER_MSR properly.

MFC r282558:
Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().

MFC r282571:
Check 'td_owepreempt' and yield the vcpu thread if it is set.

MFC r282595:
Allow byte reads of AHCI registers.

MFC r282784:
Handling indirect descriptors is a capability of the host and not one that
needs to be negotiated. Use the host capabilities field and not the negotiated
field when verifying that indirect descriptors are supported.

MFC r282788:
Allow configuration of the sector size advertised to the guest.

MFC r282865:
Set the subvendor field in config space to the vendor ID. This is required
by the Windows virtio drivers to correctly match a device.

MFC r282922:
Bump the size of the blockif scatter-gather list to 67.

MFC r283075:
Fix off-by-one in array index bounds check. bhyveload would allow you to
create 33 entries on an array that only has 32 slots

MFC r283168:
Temporarily revert r282922 which bumped the max descriptors.

MFC r283255:
Emulate the "CMP r/m, reg" instruction (opcode 39H).

MFC r283256:
Add an option "--get-vmcs-exit-inst-length" to display the instruction length
of the instruction that caused the VM-exit.

MFC r283264:
Change the header type of the emulated host-bridge from type 1 to type 0.

MFC r283293:
Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation.

MFC r283299:
Remove bogus verification of instruction length after instruction decode.

MFC r283308:
Exceptions don't deliver an error code in real mode.

MFC r283657:
Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state.

MFC r283973:
Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids'
to limit the number of ASIDs used by the hypervisor.

MFC r284046:
Fix regression in 'verify_gla()' with the RIP-relative addressing mode.

MFC r284174:
Support guest writes to the TSC by enabling the "use TSC offsetting"
execution control.
2015-06-28 03:22:26 +00:00
kib
f014bfc33c MFC r284104:
Updates from SDM rev. 55.
2015-06-13 07:31:50 +00:00
kib
8adebdc1fb MFC r283735:
Remove several write-only variables.
2015-06-05 08:36:25 +00:00
kib
c5849be284 MFC r283692:
Explicitely enable queued invalidation completion interrupt.
2015-06-05 08:23:33 +00:00
jhb
017f11d1f3 MFC 281887:
Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.
2015-06-02 19:20:39 +00:00
jhb
cb2edec922 MFC 281266:
Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h>
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
  for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
  libbfd looks for these types, and having them fixes 'gcore' in gdb of a
  32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
  dump code instead of duplicating the definitions.
2015-06-02 14:54:53 +00:00
hselasky
fd490e69db MFC r282120:
The add_bounce_page() function can be called when loading physical
pages which pass a NULL virtual address. If the BUS_DMA_KEEP_PG_OFFSET
flag is set, use the physical address to compute the page offset
instead. The physical address should always be valid when adding
bounce pages and should contain the same page offset like the virtual
address.

Submitted by:	Svatopluk Kraus <onwahe@gmail.com>
Reviewed by:	jhb@
2015-05-05 19:47:17 +00:00
kib
facaa68fb9 MFC r281495:
Add config option PAE_TABLES for the i386 kernel.  It switches pmap to
use PAE format for the page tables, but does not incur other
consequences of the full PAE config.  In particular, vm_paddr_t and
bus_addr_t are left 32bit, and max supported memory is still limited
by 4GB.

The option allows to have nx permissions for memory mappings on i386
kernel, while keeping the usual i386 KBI and avoiding the kernel data
sizing problems typical for the PAE config.
2015-04-27 08:02:12 +00:00
jkim
c90c234f29 MFC: r281396, r281475
Merge ACPICA 20150410.

Relnotes:	yes
2015-04-18 08:01:12 +00:00
jhb
5967aacd0b MFC 278325,280866:
Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4.  The Intel SDM claims that

278325:
Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4.  The Intel SDM claims that
the INIT IPIs here are invalid, but other systems follow the MP
spec instead.

While here, fix the IPI wait routine to accept a timeout in microseconds
instead of a raw spin count, and don't spin forever during AP startup.
Instead, panic if a STARTUP IPI is not delivered after 20 us.

280866:
Wait 100 microseconds for a local APIC to dispatch each startup-related IPI
rather than 20.  The MP 1.4 specification states in Appendix B.2:

  "A period of 20 microseconds should be sufficient for IPI dispatch to
   complete under normal operating conditions".

(Note that this appears to be separate from the 10 millisecond (INIT) and
200 microsecond (STARTUP) waits after the IPIs are dispatched.)  The
Intel SDM is silent on this issue as far as I can tell.

At least some hardware requires 60 microseconds as noted in the PR, so
bump this to 100 to be on the safe side.

PR:		196542, 197756
2015-04-15 16:52:34 +00:00
kib
57545f1ca4 MFC r281254:
Account for the offset of the page run when allocating the
dmar_map_entry.
2015-04-15 06:56:51 +00:00
jhb
b09b758bf2 MFC 276724:
On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST).  Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.

PR:		192316
2015-04-02 01:02:42 +00:00
jhb
5fdf8ec777 MFC 261790:
Add support for managing PCI bus numbers.  As with BARs and PCI-PCI bridge
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
  PCI domain/segment.  Host-PCI bridge drivers use this API to allocate
  bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
  their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
  full range of bus numbers from secbus to subbus from their parent bridge.
  The drivers also always program their primary bus register.  The bridge
  drivers also support growing their bus range by extending the bus resource
  and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
  used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.

PR:		197076
2015-04-01 21:48:54 +00:00
jhb
971b9a0eeb MFC 260973:
- Reuse legacy_pcib_(read|write)_config() methods in the QPI pcib driver.
- Reuse legacy_pcib_alloc_msi{,x}() methods in the QPI and mptable pcib
  drivers.
2015-04-01 21:16:33 +00:00
kib
cff2ee95c4 MFC r280435:
When mapping an allocated entry, use the entry size, instead of the
requested size.  If tag restrictions caused split entry, its size is
less then requsted.
2015-03-31 00:57:25 +00:00
kib
2f3af4a1a4 MFC r280434:
Assert that the mapping loop makes progress.
2015-03-31 00:55:12 +00:00
kib
6414d2bba4 MFC r280254:
Provide definitions for all descriptors types in the DMAR invalidation
queue.
2015-03-26 10:44:16 +00:00
kib
81e69ec09e MFC r280196:
Recheck that boundary is not crossed after the move to satisfy boundary
restriction.
2015-03-24 08:21:36 +00:00
kib
3fef6ad686 MFC r280195:
When inserting new entry into the address map, ensure that not only
next entry does not intersect with the tail of the new entry, but also
that previous entry is also before new entry start.
2015-03-24 08:18:24 +00:00
kib
5186484c7d MFC r280253:
Fix syntax error.
2015-03-22 09:12:44 +00:00
scottl
565f091b88 MFC r271889, 272799, 272800, 274976
This brings in bus_get_domain() and the related reporting via devinfo,
dmesg, and sysctl.

Obtained from:	adrian, jhb
Sponsored by:	Netflix, Inc.
2015-03-12 07:07:41 +00:00
kib
76ac5da69f MFC r276949:
(only to ease merging of r279117).

MFC r279117:
Revert r276949 and redo the fix for PCIe/PCI bridges, which do not
follow specification and do not provide PCIe capability.
2015-03-01 10:39:19 +00:00
kib
861937e4a4 MFC r276948:
Print rid when announcing DMAR context creation.  Print sid when fault
occurs.
2015-03-01 10:35:54 +00:00
kib
28c8a09abb MFC r276867:
Fix DMAR context allocations for the devices behind PCIe->PCI bridges
after dmar driver was converted to use rids.  The bus component to
calculate context page must be taken from the requestor rid, which is
a bridge, and not from the device bus number.
2015-03-01 10:29:48 +00:00
rstone
0b55a8c80a MFC r264007,r264008,r264009,r264011,r264012,r264013
MFC support for PCI Alternate RID Interpretation.  ARI is an optional PCIe
feature that allows PCI devices to present up to 256 functions on a bus.
This is effectively a prerequisite for PCI SR-IOV support.

r264007:
   Add a method to get the PCI RID for a device.

   Reviewed by:  kib
   MFC after:    2 months
   Sponsored by: Sandvine Inc.

r264008:
   Re-implement the DMAR I/O MMU code in terms of PCI RIDs

   Under the hood the VT-d spec is really implemented in terms of
   PCI RIDs instead of bus/slot/function, even though the spec makes
   pains to convert back to bus/slot/function in examples.  However
   working with bus/slot/function is not correct when PCI ARI is
   in use, so convert to using RIDs in most cases.  bus/slot/function
   will only be used when reporting errors to a user.

   Reviewed by:  kib
   MFC after:    2 months
   Sponsored by: Sandvine Inc.

r264009:
   Re-write bhyve's I/O MMU handling in terms of PCI RID.

   Reviewed by:  neel
   MFC after:    2 months
   Sponsored by: Sandvine Inc.

r264011:
   Add support for PCIe ARI

   PCIe Alternate RID Interpretation (ARI) is an optional feature that
   allows devices to have up to 256 different functions.  It is
   implemented by always setting the PCI slot number to 0 and
   re-purposing the 5 bits used to encode the slot number to instead
   contain the function number.  Combined with the original 3 bits
   allocated for the function number, this allows for 256 functions.

   This is enabled by default, but it's expected to be a no-op on currently
   supported hardware.  It's a prerequisite for supporting PCI SR-IOV, and
   I want the ARI support to go in early to help shake out any bugs in it.
   ARI can be disabled by setting the tunable hw.pci.enable_ari=0.

   Reviewed by:  kib
   MFC after:    2 months
   Sponsored by: Sandvine Inc.

r264012:
   Print status of ARI capability in pciconf -c

   Teach pciconf how to print out the status (enabled/disabled) of the ARI
   capability on PCI Root Complexes and Downstream Ports.

   MFC after:    2 months
   Sponsored by: Sandvine Inc.

r264013:
   Add missing copyright date.

   MFC after:    2 months
2015-03-01 04:22:06 +00:00
jhb
4ee9c49971 MFC 274817,274878,276801,276840,278976:
Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).
2015-02-23 18:38:41 +00:00
kib
6aee714821 MFC r278606:
Registers definitions for the new capabilities.
2015-02-18 08:06:36 +00:00
kib
f387fc2f7e MFC r278605:
vm_page_lookup() accepts read-locked object.
2015-02-18 08:04:03 +00:00
jhb
9809511c44 MFC 273800:
Rework virtual machine hypervisor detection.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
  the 0x40000000 leaf to determine the hypervisor vendor ID.  Export the
  vendor ID and the highest supported hypervisor CPUID leaf via
  hv_vendor[] and hv_high variables, respectively.  The hv_vendor[]
  array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
  identcpu.c.  Add a VM_GUEST_VMWARE to identify vmware and use that in
  the TSC code to identify VMWare.
2015-02-10 16:34:42 +00:00
jhb
4d3ec3464b MFC 272666: Fix build for i386 kernels with out 'I686_CPU'.
Reported by:	Mike Tancsa <mike@sentex.net>
2015-01-21 17:59:32 +00:00
kib
da566e85be MFC r277047:
For x86, read MAXPHYADDR into variable cpu_maxphyaddr.
2015-01-19 10:52:55 +00:00
kib
5846d19730 MFC r277023:
Avoid excessive flushing and do missed neccessary flushing in the IOMMU
page table update code.
2015-01-18 09:49:32 +00:00
neel
649535f73c MFC r273748
Output a summary of optional SVM features in dmesg similar to CPU features.
If bootverbose is enabled, a detailed list is provided; otherwise, a
single-line summary is displayed.

Requested by:	jhb
2014-12-31 22:15:28 +00:00
neel
3b591af2d9 MFC 261321
Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1] aren't
redefined.

MFC r273214
Fix build to not bogusly always rebuild vmm.ko.

MFC r273338
Add support for AMD's nested page tables in pmap.c:
- Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit)
  for a pmap of type PT_RVI.
- Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of
  type PT_EPT or PT_RVI.

Add CPU_SET_ATOMIC_ACQ(num, cpuset):
This is used when activating a vcpu in the nested pmap. Using the 'acquire'
variant guarantees that the load of the 'pm_eptgen' will happen only after
the vcpu is activated in 'pm_active'.

Add defines for various AMD-specific MSRs.

Discussed with:	kib (r261321)
2014-12-30 00:00:42 +00:00
neel
88c1adb417 MFC r270326
Fix a recursive lock acquisition in vi_reset_dev().

MFC r270434
Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find
any unmasked pin with an interrupt asserted.

MFC r270436
Fix a bug in the emulation of CPUID leaf 0x4.

MFC r270437
Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package"
tunables to modify the default cpu topology advertised by bhyve.

MFC r270855
Set the 'inst_length' to '0' early on before any error conditions are detected
in the emulation of the task switch. If any exceptions are triggered then the
guest %rip should point to instruction that caused the task switch as opposed
to the one after it.

MFC r270857
The "SUB" instruction used in getcc() actually does 'x -= y' so use the
proper constraint for 'x'. The "+r" constraint indicates that 'x' is an
input and output register operand.

While here generate code for different variants of getcc() using a macro
GETCC(sz) where 'sz' indicates the operand size.

Update the status bits in %rflags when emulating AND and OR opcodes.

MFC r271439
Initialize 'bc_rdonly' to the right value.

MFC r271451
Optimize the common case of injecting an interrupt into a vcpu after a HLT
by explicitly moving it out of the interrupt shadow.

MFC r271888
Restructure the MSR handling so it is entirely handled by processor-specific
code.

MFC r271890
MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This
behavior was changed in r271888 so update the comment block to reflect this.

MFC r271891
Add some more KTR events to help debugging.

MFC r272197
mmap(2) requires either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings.

MFC r272395
Get rid of code that dealt with the hardware not being able to save/restore
the PAT MSR on guest exit/entry. This workaround was done for a beta release
of VMware Fusion 5 but is no longer needed in later versions.

All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT
in the VM exit and entry controls.

MFC r272670
Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.

MFC r272710
Implement the FLUSH operation in the virtio-block emulation.

MFC r272838
iasl(8) expects integer fields in data tables to be specified as hexadecimal
values. Therefore the bit width of the "PM Timer Block" was actually being
interpreted as 50-bits instead of the expected 32-bit.

This eliminates an error message emitted by a Linux 3.17 guest during boot:
"Invalid length for FADT/PmTimerBlock: 50, using default 32"

MFC r272839
Support Intel-specific MSRs that are accessed when booting up a linux in bhyve:
 - MSR_PLATFORM_INFO
 - MSR_TURBO_RATIO_LIMITx
 - MSR_RAPL_POWER_UNIT

MFC r273108
Emulate "POP r/m". This is needed to boot OpenBSD/i386 MP kernel in bhyve.

MFC r273212
Support stopping and restarting the AHCI command list via toggling PxCMD.ST
from '1' to '0' and back.  This allows the driver a chance to recover if
for instance a timeout occurred due to activity on the host.
2014-12-28 21:27:13 +00:00
kib
d2f5d57eaa MFC r271208:
Add a define for index of IA32_XSS MSR.
2014-12-23 12:04:23 +00:00
kib
10220d0061 MFC r271206:
Adjust the definition of struct xstate_hdr according to SDM rev. 50.
2014-12-23 12:00:41 +00:00
kib
5a03dd693d MFC r271197:
Add more bits for the XSAVE features from CPUID 0xd, sub-function 1
%eax report.  Print the XSAVE features 0xd/1 in the boot banner.
2014-12-23 11:55:53 +00:00
jhb
5ae50f92a8 MFC 273988,273989,273995,274057:
MFamd64: Add support for extended FPU states on i386.  This includes
support for AVX on i386.
2014-12-22 21:32:39 +00:00
jhb
71f9e38fa2 MFC 271405,271408,271409,272658:
MFamd64: Use initializecpu() to set various model-specific registers on
AP startup and AP resume (it was already used for BSP startup and BSP
resume).
2014-12-22 19:53:55 +00:00
jhb
2b345a08ed MFC 260557,271076,271077,271082,271083,271098:
- Remove spaces from boot messages when we print the CPU ID/Family/Stepping
- Move prototypes for various functions into out of C files and into
  <machine/md_var.h>.
- Reduce diffs between i386 and amd64 initcpu.c and identcpu.c files.
- Move blacklists of broken TSCs out of the printcpuinfo() function
  and into the TSC probe routine.
- Merge the amd64 and i386 identcpu.c into a single x86 implementation.
2014-12-22 18:40:59 +00:00
hselasky
1f41d295fb MFC r263710, r273377, r273378, r273423 and r273455:
- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.

Sponsored by:	Mellanox Technologies
2014-10-27 14:38:00 +00:00
jhb
09bff45095 MFC 270850,271053,271192,271717:
Save and restore FPU state across suspend and resume on i386.
- Create a separate structure for per-CPU state saved across suspend and
  resume that is a superset of a pcb.
- Store the FPU state for suspend and resume in the new structure
  (for amd64, this moves it out of the PCB)
- On both i386 and amd64, all of the FPU suspend/resume handling is now
  done in C.

Approved by:	re (hrs)
2014-09-22 20:34:36 +00:00
akiyama
bf3d63ee28 MFC r263859:
Change default logic to CONFORM because this routine is shared
  with SCI polarity setting.

  Reviewed by: jhb

MFC r269184:
  Add missing newline to output dmesg properly.
2014-08-31 10:42:52 +00:00
grehan
5d455a50f5 MFC r267921, r267934, r267949, r267959, r267966, r268202, r268276,
r268427, r268428, r268521, r268638,	r268639, r268701, r268777,
    r268889, r268922, r269008, r269042,	r269043, r269080, r269094,
    r269108, r269109, r269281, r269317,	r269700, r269896, r269962,
    r269989.

Catch bhyve up to CURRENT.

Lightly tested with FreeBSD i386/amd64,	Linux i386/amd64, and
OpenBSD/amd64. Still resolving an	issue with OpenBSD/i386.

Many thanks to jhb@ for	all the	hard work on the prior MFCs !

r267921 - support the "mov r/m8, imm8" instruction
r267934 - document options
r267949 - set DMI vers/date to fixed values
r267959 - doc: sort cmd flags
r267966 - EPT misconf post-mortem info
r268202 - use correct flag for event index
r268276 - 64-bit virtio capability api
r268427 - invalidate guest TLB when cr3 is updated, needed for TSS
r268428 - identify vcpu's operating mode
r268521 - use correct offset in guest logical-to-linear translation
r268638 - chs value
r268639 - chs fake values
r268701 - instr emul operand/address size override prefix support
r268777 - emulation for legacy x86 task switching
r268889 - nested exception support
r268922 - fix INVARIANTS build
r269008 - emulate instructions found in the OpenBSD/i386 5.5 kernel
r269042 - fix fault injection
r269043 - Reduce VMEXIT_RESTARTs in task_switch.c
r269080 - fix issues in PUSH emulation
r269094 - simplify return values from the inout handlers
r269108 - don't return -1 from the push emulation handler
r269109 - avoid permanent sleep in vm_handle_hlt()
r269281 - list VT-x features in base kernel dmesg
r269317 - Mark AHCI fatal errors as not completed
r269700 - Support PCI extended config space in bhyve
r269896 - Minor cleanup
r269962 - use max guest memory when creating IOMMU domain
r269989 - fix interrupt mode names
2014-08-19 01:20:24 +00:00
marius
25248ca3c7 MFC: r260457
The changes in r233781 attempted to make logging during a machine check
exception more readable.  In practice they prevented all logging during
a machine check exception on at least some systems.  Specifically, when
an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere
class machine, all CPUs receive a machine check exception, but only
CPUs on the same package as the memory controller for the erroring DIMM
log an error.  The CPUs on the other package would complete the scan of
their machine check banks and panic before the first set of CPUs could
log an error.  The end result was a clearer display during the panic
(no interleaved messages), but a crashdump without any useful info about
the error that occurred.

To handle this case, make all CPUs spin in the machine check handler
once they have completed their scan of their machine check banks until
at least one machine check error is logged.  I tried using a DELAY()
instead so that the CPUs would not potentially hang forever, but that
was not reliable in testing.

While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic.
Only clear it if the machine check handler does not panic and returns
to the interrupted thread.

MFC: r263113

Correct type for malloc().

Submitted by:	"Conrad Meyer" <conrad.meyer@isilon.com>

MFC: r269052, r269239, r269242

Intel desktop Haswell CPUs may report benign corrected parity errors (see
HSD131 erratum in [1]) at a considerable rate. So filter these (default),
unless logging is enabled. Unfortunately, there really is no better way to
reasonably implement suppressing these errors than to just skipping them
in mca_log(). Given that they are reported for bank 0, they'd need to be
masked in MSR_MC0_CTL. However, P6 family processors require that register
to be set to either all 0s or all 1s, disabling way more than the one error
in question when using all 0s there. Alternatively, it could be masked for
the corresponding CMCI, but that still wouldn't keep the periodic scanner
from detecting these spurious errors. Apart from that, register contents of
MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel
Architectures Developer's Manual nor in the Haswell datasheets.

Note that while HSD131 actually is only about C0-stepping as of revision
014 of the Intel desktop 4th generation processor family specification
update, these corrected errors also have been observed with D0-stepping
aka "Haswell Refresh".

1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf

Reviewed by:	jhb
Sponsored by:	Bally Wulff Games & Entertainment GmbH
2014-08-05 16:04:22 +00:00
scottl
d13d12a6a2 Merge r266746, 266775:
Now that there are separate back-end implementations of busdma, the bounce
 implementation shouldn't steal flags from the common front-end.
 Move those flags to the back-end.

 Eliminate the fake contig_dmamap and replace it with a new flag,
 BUS_DMA_KMEM_ALLOC.  They serve the same purpose, but using the flag
 means that the map can be NULL again, which in turn enables significant
 optimizations for the common case of no bouncing.

Obtained from:	Netflix, Inc.
2014-07-01 06:50:35 +00:00
rodrigc
8d5996c444 MFC r263795:
Strict value checking will cause problem.
Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD.
This behaviour is bug-compatible with Linux-3.13.5.

References:
http://d.hatena.ne.jp/syuu1228/20140326
http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094

Submitted by: syuu
PR: 187966
2014-06-23 22:37:49 +00:00
rodrigc
360ded2657 Undo bad merge. 2014-06-23 22:35:41 +00:00
rodrigc
95d8d7a8ec MFC r263795:
Strict value checking will cause problem.
Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD.
This behaviour is bug-compatible with Linux-3.13.5.

References:
http://d.hatena.ne.jp/syuu1228/20140326
http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094

Submitted by: syuu
PR: 187966
2014-06-23 22:31:28 +00:00