826 Commits

Author SHA1 Message Date
Ryan Libby
b1a987bb34 __pcpu: gcc -Wredundant-decls
Pollution from counter.h made __pcpu visible in amd64/pmap.c.  Delete
the existing extern decl of __pcpu in amd64/pmap.c and avoid referring
to that symbol, instead accessing the pcpu region via PCPU_SET macros.
Also delete an unused extern decl of __pcpu from mp_x86.c.

Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11666
2017-07-21 17:11:36 +00:00
Ian Lepore
b524a31593 Protect access to the AT realtime clock with its own mutex.
The mutex protecting access to the registered realtime clock should not be
overloaded to protect access to the atrtc hardware, which might not even be
the registered rtc. More importantly, the resettodr mutex needs to be
eliminated to remove locking/sleeping restrictions on clock drivers, and
that can't happen if MD code for amd64 depends on it. This change moves the
protection into what's really being protected: access to the atrtc date and
time registers.

This change also adds protection when the clock is accessed from
xentimer_settime(), which bypasses the resettodr locking.

Differential Revision:	https://reviews.freebsd.org/D11483
2017-07-12 02:42:57 +00:00
Jason A. Harmening
eb36b1d0bc Clean up MD pollution of bus_dma.h:
--Remove special-case handling of sparc64 bus_dmamap* functions.
  Replace with a more generic mechanism that allows MD busdma
  implementations to generate inline mapping functions by
  defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>.  This
  is currently useful for sparc64, x86, and arm64, which all
  implement non-load dmamap operations as simple wrappers
  around map objects which may be bus- or device-specific.

--Remove NULL-checked bus_dmamap macros.  Implement the
  equivalent NULL checks in the inlined x86 implementation.
  For non-x86 platforms, these checks are a minor pessimization
  as those platforms do not currently allow NULL maps.  NULL
  maps were originally allowed on arm64, which appears to have
  been the motivation behind adding arm[64]-specific barriers
  to bus_dma.h, but that support was removed in r299463.

--Simplify the internal interface used by the bus_dmamap_load*
  variants and move it to bus_dma_internal.h

--Fix some drivers that directly include sys/bus_dma.h
  despite the recommendations of bus_dma(9)

Reviewed by:	kib (previous revision), marius
Differential Revision:	https://reviews.freebsd.org/D10729
2017-07-01 05:35:29 +00:00
Konstantin Belousov
cf619a92d2 Fix batched unload for DMAR busdma in qi mode.
Do not queue dmar_map_entries with a zeroed gseq to
dmar_qi_invalidate_locked().  A zero gseq stops the processing in the qi
task.  Do not assign a possibly uninitialized on-stack gseq to map
entries when requeuing them on the unit tlb_flush queue.  Random garbage
in gseq is interpreted as a too-high invalidation sequence number and
again stops the processing in the task.

Make the sequence number generation completely contained in
dmar_qi_invalidate_locked() and dmar_qi_emit_wait_seq().  Upper-level code
directly passes a boolean requesting emission of a wait command instead of
trying to provide a hint to avoid it by passing a NULL gseq pointer.

Micro-optimize the requeueing to the tlb_flush queue by doing it for the
whole queue at once.

Diagnosed and tested by:	Brett Gutstein <bgutstein@rice.edu>
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-19 21:48:52 +00:00
John Baldwin
fecabb72e1 Don't try to assign interrupts to a CPU on single-CPU systems.
All interrupts are routed to the sole CPU in that case implicitly.
This is a regression in EARLY_AP_STARTUP.  Previously the 'assign_cpu'
variable was only set when a multi-CPU system finished booting, so
its value meant both that interrupts could be assigned and that
there was more than one CPU.

PR:		219882
Reported by:	ota@j.email.ne.jp
MFC after:	3 days
2017-06-14 13:34:09 +00:00
Konstantin Belousov
fc8929cb29 More accurately handle early EFER restoration on resume.
Do not try to set the LMA bit while the CPU is still in legacy mode.
Apparently Intel CPUs ignore non-id writes to LMA, while AMD CPUs
(over-)react with #GP.

Reported and tested by:	danfe
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-06-11 14:39:08 +00:00
Marcelo Araujo
e0a6a23c6d Allow sysctl kern.vm_guest to return bhyve when running under bhyve.
Submitted by:	Sean Fagan <sef@ixsystems.com>
Reviewed by:	grehan
MFH:		4 weeks.
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11090
2017-06-08 04:02:14 +00:00
Andriy Gapon
cae91bbe96 fix indentation
MFC after:	4 days
2017-05-30 13:53:03 +00:00
John Baldwin
04fe6458d3 Remove constants and comments for unimplemented entries in the default LDT.
These entries will never be added to the default LDT in the future.
2017-05-24 18:54:21 +00:00
John Baldwin
9d24f98ca8 Remove the BSD/OS 2.1 system call gate LDT entry.
An extra copy of the system call gate was added to the default LDT back
in 1996 (r18513 / r18514).  However, the ability to run BSD/OS 2.1
i386 binaries under FreeBSD's native ABI is most likely no longer
needed.

Discussed with:	kib
2017-05-23 22:34:18 +00:00
Hans Petter Selasky
65b017b420 Avoid use of contiguous memory allocations in busdma when possible.
This patch improves the boundary checks in busdma to allow more cases
to use the regular page-based kernel memory allocator, especially when
the parent DMA tag has a non-zero boundary.  For example, AMD64-based
platforms set the PCI DMA tag boundary to PCI_DMA_BOUNDARY (4GB), which
before this patch caused contiguous memory allocations to be preferred
when allocating more than PAGE_SIZE bytes, even if the required
alignment was less than PAGE_SIZE bytes.
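The boundary reasoning above can be illustrated with a small userland
sketch.  This is not the busdma implementation: SKETCH_PAGE_SIZE (assumed
to be 4096), crosses_boundary() and page_allocator_ok() are hypothetical
names, and the policy only mirrors the idea that page-aligned pages never
straddle a power-of-two boundary of at least PAGE_SIZE, plus an nsegments
check that accounts for a maxsegsz below PAGE_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* True if [addr, addr + size) crosses a multiple of 'boundary' (power of 2). */
static bool
crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
	if (boundary == 0 || size == 0)
		return (false);
	/* First and last byte must share the same boundary-sized window. */
	return (((addr ^ (addr + size - 1)) & ~(boundary - 1)) != 0);
}

/*
 * Hypothetical policy: can a 'size'-byte buffer be built from individually
 * allocated pages instead of one physically contiguous allocation?
 */
static bool
page_allocator_ok(uint64_t size, uint64_t alignment, uint64_t boundary,
    uint64_t maxsegsz, uint64_t nsegments)
{
	uint64_t segsz, nsegs_needed;

	if (maxsegsz == 0)
		return (false);
	/* Pages are only page-aligned; stricter alignment needs contiguity. */
	if (alignment > SKETCH_PAGE_SIZE)
		return (false);
	/* A page-aligned page cannot cross a power-of-2 boundary >= a page. */
	if (boundary != 0 && boundary < SKETCH_PAGE_SIZE)
		return (false);
	/* Each segment is at most one page and at most maxsegsz bytes. */
	segsz = maxsegsz < SKETCH_PAGE_SIZE ? maxsegsz : SKETCH_PAGE_SIZE;
	nsegs_needed = (size + segsz - 1) / segsz;
	return (nsegs_needed <= nsegments);
}
```

For instance, with a 4GB boundary, a three-page buffer with 64-byte
alignment qualifies when the tag allows three page-sized segments, but not
when maxsegsz is 2048, since six segments would then be needed.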

This patch also fixes the nsegments check for using kmem_alloc_attr()
when the maximum segment size is less than PAGE_SIZE bytes.

Updated some comments describing the code in question.

Differential Revision:	https://reviews.freebsd.org/D10645
Reviewed by:		kib, jhb, gallatin, scottl
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-16 14:21:37 +00:00
Konstantin Belousov
bd101a6648 Ensure that resume path on amd64 only accesses page tables for normal
operation after processor is configured to allow all required
features.

In particular, NX must be enabled in EFER, otherwise loading a page
table element with the nx bit set causes a reserved-bit page fault.  Since
malloc uses the direct map for small allocations, in particular for
the suspension pcbs, and the DMAP is nx after r316767, that commit
triggered a fault on the resume path.

Restore complete state of EFER while wakeup code is still executing
with custom page table, before calling resumectx, instead of trying to
guess which features might be needed before resumectx restored EFER on
its own.

Bisected and tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-05-15 20:52:43 +00:00
Conrad Meyer
1c5df7bd01 x86 MCA: Fix a deadlock in MCA exception processing
In exceptional circumstances, an MCA exception will trigger when the
freelist is exhausted. In such a case, no error will be logged on the list
and 'mca_count' will not be incremented.

Prior to this patch, all CPUs that received the exception would spin
forever.

With this change, the CPU that detects the error but finds the freelist
empty will proceed to panic the machine, ending the deadlock.

A follow-up to r260457.

Reported by:	Ryan Libby <rlibby at gmail.com>
Reviewed by:	jhb@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D10536
2017-04-28 18:25:10 +00:00
John Baldwin
80fe25f150 Remove the LSOL26CALLS_SEL constant.
It is no longer used after SVR4/i386 ABI support was removed.

Reported by:	kib
2017-04-25 23:19:27 +00:00
Gleb Smirnoff
83c9dea1ba - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
  in place.  To do per-cpu stats, convert all fields that previously were
  maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and we can do counter_u64_alloc(), provide an
  early counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
    so that at early stages of boot, before counters are allocated we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
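A rough sketch of the early-counter idea, assuming a single global counter
instead of real per-CPU counter(9) storage.  Every *_sketch name is
hypothetical and none of this is the FreeBSD code; it only shows a counter
pointer that is safe to write through before the real storage exists.

```c
#include <stdint.h>

/* Hypothetical per-CPU area with one spare slot for early-boot counters. */
struct pcpu_sketch {
	uint64_t	early_dummy_counter;
};

static struct pcpu_sketch pcpu0_sketch;

/* A vmmeter-like counter field that starts out aimed at the dummy slot. */
struct vmmeter_sketch {
	uint64_t	*v_swtch;
};

static struct vmmeter_sketch vm_cnt_sketch = {
	.v_swtch = &pcpu0_sketch.early_dummy_counter,
};

static void
counter_add_sketch(uint64_t *c, uint64_t n)
{
	*c += n;
}

/* Once an allocator is available, repoint the field at real storage. */
static void
vmmeter_counters_startup_sketch(uint64_t *real)
{
	*real = 0;
	vm_cnt_sketch.v_swtch = real;
}
```

Early writes land harmlessly in the dummy slot; after startup the same
call site updates the real counter without any "is it allocated yet?"
branch.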

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
  to match kernel representation.
- struct vmmeter is hidden under _KERNEL, with vmstat(1) as the only exception.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by:	kib, gallatin, marius, lidl
Differential Revision:	https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
Gleb Smirnoff
9ed01c32e0 All these files need sys/vmmeter.h, but until now they got it
implicitly included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
Konstantin Belousov
366bb48985 Correct calculation of the entry->free_down in the invariants-checking
code.

Reported by:	maxim
Found by:	PVS studio scan
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-04-14 15:16:41 +00:00
Patrick Kelsey
67d955aab4 Corrected misspelled versions of rendezvous.
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().

Reviewed by:	gnn, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10313
2017-04-09 02:00:03 +00:00
Andriy Gapon
35d92e6cec use msr 0xc001100c to discover multi-node AMD processors
This is applicable only to the older processors that do not have the AMD
Topology extension.
Opteron 6100-series "Magny-Cours" processors had multiple nodes within a
package and didn't have the Topology extension.  Without this change
FreeBSD would assume that those processors have a single L3 cache shared
by all cores while, in fact, each node has its own L3 cache.

Many thanks to Freddie Cash <fjwcash@gmail.com> for providing valuable
hardware information.

MFC after:	2 weeks
2017-04-08 14:16:42 +00:00
Andriy Gapon
978f3da16f revert r315959 because it causes build problems
The change introduced a dependency between genassym.c and header files
generated from .m files, but that dependency is not specified in the
make files.

Also, the change may not be as useful as I thought it was.

Reported by:	dchagin, Manfred Antar <null@pozo.com>, and many others
2017-03-27 12:34:29 +00:00
Andriy Gapon
80b9074a51 update comment describing topo_probe_amd()
MFC after:	2 weeks
MFC with:	r316017
2017-03-27 11:04:57 +00:00
Andriy Gapon
d9da46bf1c add SMT detection for newer AMD processors
The change seems to be more in the nomenclature than in the way the
topology is advertised by the hardware.

Tested by:	truckman (earlier version of the change)
MFC after:	2 weeks
2017-03-27 09:45:27 +00:00
Konstantin Belousov
476358b30f Timeout DMAR commands.
Implement timeouts for register-based DMAR commands.  Tunable/sysctl
hw.dmar.timeout specifies the timeout in nanoseconds, set it to zero
to allow infinite wait.  Default is 1ms.

Runtime modification of the sysctl is not safe; it is allowed for
debugging.
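A hedged userland sketch of a bounded poll with the "zero means wait
forever" convention described above.  poll_with_timeout(), now_ns() and
countdown_done() are hypothetical, and clock_gettime(CLOCK_MONOTONIC)
stands in for whatever time source the kernel DMAR code actually uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/*
 * Poll done(arg) until it returns true or timeout_ns nanoseconds pass.
 * timeout_ns == 0 waits forever (like hw.dmar.timeout=0).
 * Returns 0 on completion, -1 on timeout.
 */
static int
poll_with_timeout(bool (*done)(void *), void *arg, uint64_t timeout_ns)
{
	uint64_t start = now_ns();

	for (;;) {
		if (done(arg))
			return (0);
		if (timeout_ns != 0 && now_ns() - start >= timeout_ns)
			return (-1);
	}
}

/* Example predicate: becomes true once *remaining has counted down to 0. */
static bool
countdown_done(void *arg)
{
	int *remaining = arg;

	return ((*remaining)-- <= 0);
}
```

With the 1ms default, a command whose completion bit never appears is
reported as a failure instead of hanging the caller.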

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-27 07:06:45 +00:00
Konstantin Belousov
70fd237c5d Provide a less laborious way to enable busdma DMAR for only a short list of devices.
Kernel environment variable hw.busdma.default can take values 'bounce'
and 'dmar' and selects corresponding busdma backend as default.
Per-device environment variable hw.busdma.pci<domain>.<bus>.<slot>.<func>
takes the same values and overrides hw.busdma.default for the given device.

Note that even with hw.busdma.default=bounce, DMA translation engines
are still started if DMARs are enabled; to disable them, use the
hw.dmar.dma tunable, as before.
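The lookup order can be sketched in userland, with getenv(3) standing in
for the kernel environment.  The enum and both functions are hypothetical;
only the variable names and the 'bounce'/'dmar' values come from the
message above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum busdma_backend { BUSDMA_BOUNCE, BUSDMA_DMAR };

/* Parse "bounce" or "dmar"; leaves *out untouched on NULL or bad input. */
static int
parse_backend(const char *s, enum busdma_backend *out)
{
	if (s == NULL)
		return (-1);
	if (strcmp(s, "bounce") == 0) {
		*out = BUSDMA_BOUNCE;
		return (0);
	}
	if (strcmp(s, "dmar") == 0) {
		*out = BUSDMA_DMAR;
		return (0);
	}
	return (-1);
}

/* Built-in default, then hw.busdma.default, then the per-device override. */
static enum busdma_backend
select_backend(int domain, int bus, int slot, int func,
    enum busdma_backend builtin)
{
	char name[64];
	enum busdma_backend b = builtin;

	(void)parse_backend(getenv("hw.busdma.default"), &b);
	snprintf(name, sizeof(name), "hw.busdma.pci%d.%d.%d.%d",
	    domain, bus, slot, func);
	(void)parse_backend(getenv(name), &b);
	return (b);
}
```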

Sponsored by:	The FreeBSD Foundation
MFC after: 1 week
2017-03-26 00:40:35 +00:00
Andriy Gapon
a7b4c009e1 specific end of interrupt implementation for AMD Local APIC
The change is more intrusive than I would like because the feature
requires that a vector number is written to a special register.
Thus, now the vector number has to be provided to lapic_eoi().
It was readily available in the IO-APIC and MSI cases, but the IPI
handlers required more work.
Also, we now store the VMM IPI number in a global variable, so that it
is available to the justreturn handler for the same reason.

Reviewed by:	kib
MFC after:	6 weeks
Differential Revision: https://reviews.freebsd.org/D9880
2017-03-25 18:45:09 +00:00
Konstantin Belousov
3d47c58b98 Avoid leaking allocated but unused context after creation race.
As noted in the comment, nothing special needs to be done to destroy
the unneeded context after the allocation race, but the context memory
itself still needs to be freed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-25 10:47:35 +00:00
Konstantin Belousov
5f8e5c7fa2 Do not create RMRR entries for identity-mapped domains.
It does not make sense since identity mapping already provides the
required mapping for RMRR ranges.  Moreover, since identity page tables do
not reflect the content of map entries for id domains, creating RMRR
entries makes domain data inconsistent.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-25 10:45:16 +00:00
Konstantin Belousov
ad969cb1cd Slight cleanup of the comment.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-25 10:42:10 +00:00
Gavin Atkinson
cb8f312598 Improve grammar on a warning, and only use one line rather than two when
printing it.
2017-03-24 16:18:20 +00:00
Roger Pau Monné
6e2ab0aef5 x86/srat: fix parsing of APIC IDs > MAX_APIC_ID
Ignore them, as is done in the MADT parser. This allows booting on a box
with SRAT and APIC IDs > 255.

Reported by:	Wei Liu <wei.liu2@citrix.com>
Tested by:	Wei Liu <wei.liu2@citrix.com>
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Citrix Systems R&D
2017-03-16 09:33:36 +00:00
Peter Grehan
264fae0792 Add the AMD MONITORX/MWAITX feature definition introduced in
Bulldozer/Ryzen CPUs.

Reviewed by:	kib
MFC after:	1 week
2017-03-16 03:06:50 +00:00
Eric van Gyzen
38ef0279a4 Validate values read from the RTC before trying BCD decoding
Submitted by:	cem
Reported by:	Michael Gmelin <freebsd@grem.de>
Tested by:	Oleksandr Tymoshenko <gonzo@bluezbox.com>
Sponsored by:	Dell EMC
2017-03-09 02:19:30 +00:00
Andriy Gapon
c1ad4beb32 mca: fix up couple of issues introduced with amd thresholding in r314636
1. There was a typo in one place where the processor family is
   checked (16 vs 0x16).  Now the checks are consolidated in a single
   function.
2. Instead of an array of struct amd_et_state objects the code allocated
   an array of pointers.  That was no problem on amd64 where the sizes
   are the same, but could be a problem on i386.
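Issue 2 is the classic sizeof-of-pointer mistake.  A minimal sketch, using
a hypothetical two-field struct that is pointer-sized on amd64 but twice
the pointer size on i386 (the real struct amd_et_state is not reproduced
here):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in: 8 bytes on every platform. */
struct et_state_sketch {
	uint32_t	cur_threshold;
	uint32_t	cur_intr_type;
};

/*
 * Buggy pattern (sizes the array for pointers, 4 bytes each on i386):
 *	calloc(n, sizeof(struct et_state_sketch *));
 * Fixed pattern: sizeof(*p) always matches the pointed-to type.
 */
static struct et_state_sketch *
alloc_states(size_t n)
{
	struct et_state_sketch *p;

	p = calloc(n, sizeof(*p));
	return (p);
}
```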

Reported by:	tuexen and others
Tested by:	tuexen (earlier version of the fix)
Pointyhat to:	avg
MFC after:	5 days
X-MFC with:	r314636
2017-03-05 07:46:48 +00:00
Andriy Gapon
7abf460488 MCA: add AMD Error Thresholding support
Currently the feature is implemented only for a subset of errors
reported via Bank 4.  The subset includes only DRAM-related errors.

The new code builds upon and reuses the Intel CMC (Correctable MCE
Counters) support code.  However, the AMD feature is quite different
and, unfortunately, much less regular.

For references please see AMD BKDGs for models 10h - 16h.
Specifically, see MSR0000_0413 NB Machine Check Misc (Thresholding)
Register (MC4_MISC0).
http://developer.amd.com/resources/developer-guides-manuals/

Reviewed by:	jhb
MFC after:	1 month
Differential Revision: https://reviews.freebsd.org/D9613
2017-03-03 22:42:43 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Andriy Gapon
bc1e649924 Local APIC: add support for extended LVT entries found in AMD processors
The extended LVT entries can be used to configure interrupt delivery
for various events that are internal to a processor and can use this
feature.

All current processors that support the feature have four such entries.
The entries are all masked upon the processor reset, but it's possible
that firmware may use some of them.

BIOS and Kernel Developer's Guides for some processor models do not assign
any particular names to the extended LVTs, while other BKDGs provide names
and suggested usage for them.
However, there is no fixed mapping between the LVTs and the processor
events in any processor model that supports the feature.  Any entry can be
assigned to any event.  The assignment is done by programming an offset
of an entry into configuration bits corresponding to an event.

This change does not expose the flexibility that the feature offers.
The change adds just a single method to configure a hardcoded extended LVT
entry to deliver APIC_CMC_INT.  The method is designed to be used with
Machine Check Error Thresholding mechanism on supported processor models.

For references please see BKDGs for families 10h - 16h and specifically
descriptions of APIC30, APIC400, APIC[530:500] registers.
For a description of the Error Thresholding mechanism see, for example,
BKDG for family 10h, section 2.12.1.6.
http://developer.amd.com/resources/developer-guides-manuals/

Thanks to jhb and kib for their suggestions.

Reviewed by:	kib
Discussed with:	jhb
MFC after:	5 weeks
Relnotes:	maybe
Differential Revision: https://reviews.freebsd.org/D9612
2017-02-28 18:48:12 +00:00
Andriy Gapon
63509269e8 fix lvt_mode: edge-triggered interrupt mode is set by clearing APIC_LVT_TM
The fixed function is used only to fix up buggy MPTable information, and
the trigger mode is probably ignored for the relevant interrupt types
anyway.  Still, it's better to be standards compliant and have the code
do what it says it does.
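A minimal sketch of the trigger-mode rule, assuming APIC_LVT_TM is bit 15
(0x8000) of a local APIC LVT register, as in the Intel SDM layout.
lvt_set_trigger() is a hypothetical helper, not the FreeBSD lvt_mode().

```c
#include <stdint.h>

#define APIC_LVT_TM	0x00008000u	/* assumed: bit 15, 1 = level-triggered */

static uint32_t
lvt_set_trigger(uint32_t lvt, int level_triggered)
{
	if (level_triggered)
		lvt |= APIC_LVT_TM;
	else
		lvt &= ~APIC_LVT_TM;	/* edge-triggered: clear the bit */
	return (lvt);
}
```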

Discussed with:	jhb
MFC after:	5 days
2017-02-27 17:36:31 +00:00
Yoshihiro Takahashi
645b154260 Fix the acpi idle support on i386 which was broken by r312910.
The ifdefs were previously '#if !defined(__i386__) || !defined(PC98)',
so cpu_idle_acpi was enabled on both i386 and amd64, except for PC98.

I was misled by the '#if !defined(__i386__)' condition.

Submitted by:	bde
Reported by:	bde
2017-02-26 13:25:56 +00:00
Konstantin Belousov
d360b49b1d Do not use ULL suffix. Cast to uint64_t where the suffix is needed,
and just remove it in another place.

Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-02-25 10:32:49 +00:00
Warner Losh
28586889c2 Convert PCIe Hot Plug to using pci_request_feature
Convert PCIe hot plug support over to asking the firmware, if any, for
permission to use the HotPlug hardware. Implement pci_request_feature
for ACPI. All other host PCI connections default to allowing all valid
feature requests.

Sponsored by: Netflix
2017-02-25 06:11:59 +00:00
Jonathan T. Looney
a81ce6e9d5 We have seen several cases recently where we appear to get a double-fault:
We have an original panic. Then, instead of writing the core to the dump
device, the kernel has a second panic: "smp_targeted_tlb_shootdown:
interrupts disabled". This change is an attempt to fix that second panic.

When the other CPUs are stopped, we can't notify them of the TLB shootdown,
so we skip that operation. However, when the CPUs come back up, we
invalidate the TLB to ensure they correctly observe any changes to the
page mappings.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D9786
2017-02-24 18:56:00 +00:00
Konstantin Belousov
8cd5962571 Remove cpu_deepest_sleep variable.
On Core2 and older Intel CPUs, where TSC stops in C2, system does not
allow C2 entrance if timecounter hardware is TSC.  This is done by
tc_windup() which tests for TC_FLAGS_C2STOP flag of the new
timecounter and increases cpu_disable_c2_sleep if flag is set.  Right
now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but
TSC is initialized too early for this variable to be set by
acpi_cpu.c.

There is no reason to require that ACPI report C2 and deeper states
before setting TC_FLAGS_C2STOP, so remove the cpu_deepest_sleep test
from the init_TSC_tc() condition.  And since this is the only use of the
variable, remove it entirely.

Reported and submitted by:	Jia-Shiun Li <jiashiun@gmail.com>
Suggested by:	jhb
MFC after:	2 weeks
2017-02-24 16:11:55 +00:00
Konstantin Belousov
091326f2ad More fixes for regression in r313898 on i386.
Use long long constants where needed.

Reported and tested by:	kargl
Sponsored by:	The FreeBSD Foundation
MFC after:	10 days
2017-02-22 07:07:05 +00:00
Andriy Gapon
b929de1078 mca: change type of last_intr to time_t for consistency
time_uptime is time_t

MFC after:	1 day
X-MFC with:	r313752
2017-02-21 09:33:21 +00:00
Konstantin Belousov
8403b5a129 Fix regression in r313898 on i386.
Use large enough type for calculation of mtrr physmask.  Typical
cpu_maxphyaddr is 36 or larger.
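A sketch of why the type matters, assuming the usual PhysMask layout (mask
bits from bit 12 up to cpu_maxphyaddr - 1); mtrr_physmask() is a
hypothetical helper.  On i386 'unsigned long' is 32 bits, so shifting 1UL
by 36 is undefined behavior, while the same arithmetic in uint64_t is
well-defined:

```c
#include <stdint.h>

/* Assumes 12 < cpu_maxphyaddr < 64. */
static uint64_t
mtrr_physmask(unsigned int cpu_maxphyaddr)
{
	/* Bits [cpu_maxphyaddr - 1 : 12] set; low 12 bits left clear. */
	return ((((uint64_t)1 << cpu_maxphyaddr) - 1) & ~(uint64_t)0xfff);
}
```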

Reported and tested by:	sbruno
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2017-02-19 03:57:41 +00:00
Konstantin Belousov
83ebde953c Rely only on the CPUID feature to enable attaching.  MTRRs are architectural
and there is no reason to check the CPU family or vendor.

Noted by:   royger
Reviewed by: jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D9657
2017-02-17 22:50:41 +00:00
Konstantin Belousov
befb38bf9a smp_rendezvous() works for UP case as well, reduce duplicated
code.  Also fix cast and remove unneeded XXX in comment.

Noted and reviewed by: jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D9657
2017-02-17 22:49:52 +00:00
Konstantin Belousov
b1fa987835 Merge i386 and amd64 mtrr drivers.
Reviewed by:	royger, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D9648
2017-02-17 21:08:32 +00:00
Warner Losh
86d99b6884 Remove EISA bus support for add-in cards. Remove related kernel and
compile options. Remove doxygen pointers to now deleted files. Remove
EISA and VME as examples in bus_space.9.

Retained EISA mode code for IO PIC and MPTABLES because that's not
EISA bus, per se, and some people have abused EISA to mean "EISA-like
behavior as opposed to ISA" rather than using it for EISA add-in
cards.

Relnotes: yes
2017-02-16 21:57:35 +00:00
Warner Losh
5625fe9246 Remove Micro Channel Architecture support. Of the commonly available
machines, only a few 486 machines used it, and those haven't had
enough memory to run FreeBSD for quite some time (often limited to
16MB).

Not to be confused with the Machine Check Architecture, which is still
very much alive and used (and untouched by this commit).

No Objection From: arch@
2017-02-15 23:04:25 +00:00
Andriy Gapon
92b87cdb24 mca: use time_uptime instead of ticks for CMCI throttling
This solves several problems.
First of all, cmc_throttle is specified in seconds and there was no
conversion between ticks and seconds when they were mixed together.
Second, we avoid potential problems with ticks wrapping around.

Resolution of time_uptime should be sufficient for the throttling
purposes.
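A simplified sketch of throttling in seconds.  'now' stands in for
time_uptime, and cmc_throttled() is a hypothetical helper that only models
"act at most once per cmc_throttle seconds"; the real CMCI throttling logic
is more involved.  The point is that both operands are seconds, so no
ticks-to-seconds conversion is needed.

```c
#include <stdbool.h>
#include <time.h>

static bool
cmc_throttled(time_t now, time_t *last_intr, time_t cmc_throttle)
{
	/* Seconds compared with seconds: no unit mixing. */
	if (now - *last_intr < cmc_throttle)
		return (true);
	*last_intr = now;
	return (false);
}
```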

Discussed with:	jhb
MFC after:	12 days
2017-02-14 22:46:39 +00:00
Andriy Gapon
3be5c621e6 mca: fix writes to MSR_MC_CTL2 in cmci_update
Previously, if the threshold was changed, then MC_CTL2_CMCI_EN would get
cleared and the logic would switch to the polling only mode.
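A sketch of a read-modify-write that keeps CMCI enabled while changing the
threshold, assuming the Intel SDM layout of IA32_MCi_CTL2 (threshold in
bits 14:0, CMCI_EN in bit 30).  cmci_set_threshold() is hypothetical and
operates on a value rather than on a real MSR.

```c
#include <stdint.h>

#define MC_CTL2_THRESHOLD	0x0000000000007fffULL	/* bits 14:0 */
#define MC_CTL2_CMCI_EN		0x0000000040000000ULL	/* bit 30 */

static uint64_t
cmci_set_threshold(uint64_t ctl2, uint64_t threshold)
{
	ctl2 &= ~MC_CTL2_THRESHOLD;		/* replace only the threshold */
	ctl2 |= threshold & MC_CTL2_THRESHOLD;
	ctl2 |= MC_CTL2_CMCI_EN;		/* never drop CMCI_EN */
	return (ctl2);
}
```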

Discussed with:	jhb
MFC after:	2 weeks
2017-02-14 22:30:22 +00:00
Jonathan T. Looney
19d4720b1e Ensure the idle thread's loop services interrupts in a timely way when
using the ACPI C1/mwait sleep method.

Previously, the mwait instruction would return when an interrupt was
pending; however, the idle loop did not actually enable interrupts when
this occurred. This led to a situation where the idle loop could quickly
spin through the C1/mwait sleep method a number of times when an interrupt
was pending. (Eventually, the situation corrected itself when something
other than an interrupt triggered the idle loop to either enable interrupts
or schedule another thread.)

Reviewed by:	kib, imp (earlier version)
Input from:	jhb
MFC after:	1 week
Sponsored by:	Netflix
2017-02-08 16:46:57 +00:00
Konstantin Belousov
9fb10d635e Define the vm_ooffset_t and vm_pindex_t types as machine-independent.
The types are for the byte offset and page index in a vm object.  They
are similar to off_t, which is defined as a 64-bit MI integer.  Using MI
definitions will allow providing consistent MD values of vm
object-related maximum sizes.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-02-04 12:26:38 +00:00
Konstantin Belousov
57f6622f92 For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE
and device npx.

This means that FPU is always initialized and handled when available,
and SSE+ register file and exception are handled when available.  This
makes the kernel FPU code much easier to maintain at the cost of
slight bloat for CPUs older than 25 years.

CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment
explaining the original purpose.

Suggested by and discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2017-02-03 12:51:40 +00:00
Yoshihiro Takahashi
2b375b4edd Remove pc98 support completely.
I thank all developers and contributors for pc98.

Relnotes:	yes
2017-01-28 02:22:15 +00:00
Conrad Meyer
db4fcadf52 "Buses" is the preferred plural of "bus"
Replace archaic "busses" with modern form "buses."

Intentionally excluded:
* Old/random drivers I didn't recognize
  * Old hardware in general
* Use of "busses" in code as identifiers

No functional change.

http://grammarist.com/spelling/buses-busses/

PR:		216099
Reported by:	bltsrc at mail.ru
Sponsored by:	Dell EMC Isilon
2017-01-15 17:54:01 +00:00
Pedro F. Giffuni
be04edbb4f Remove __nonnull() attributes from x86 machine check architecture.
These are of the few cases where we use the GCC non-null attributes in
non-header code. As part of a review [1] of our use of such attributes we
are replacing such uses of the overly aggressive GCC attribute with clang's
_Nonnull attribute.

In this case the attributes serve little purpose as they don't
enforce run-time checks.  If anything, the attributes would cause NULL
pointer checks to be ignored, but there are no such checks, so the only
effect is cosmetic.

The references appear to be left over from code development and likely
already fulfilled their purpose.

Reference [1]:
https://reviews.freebsd.org/D9004

Reviewed by:	jhb
MFC after:	3 weeks
2017-01-13 01:39:19 +00:00
Roger Pau Monné
ca7af67ac9 xen: fix IPI setup with EARLY_AP_STARTUP
Current Xen IPI setup functions require that the caller provide a device in
order to obtain the name of the interrupt from it. With early AP startup this
device is no longer available at the point where IPIs are bound, and a KASSERT
would trigger:

panic: NULL pcpu device_t
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82233a20
vpanic() at vpanic+0x186/frame 0xffffffff82233aa0
kassert_panic() at kassert_panic+0x126/frame 0xffffffff82233b10
xen_setup_cpus() at xen_setup_cpus+0x5b/frame 0xffffffff82233b50
mi_startup() at mi_startup+0x118/frame 0xffffffff82233b70
btext() at btext+0x2c

Fix this by no longer requiring the presence of a device in order to bind IPIs,
and simply use the "cpuX" format where X is the CPU identifier in order to
describe the interrupt.

Reported by:            sbruno, cperciva
Tested by:              sbruno
X-MFC-With:             r310177
Sponsored by:           Citrix Systems R&D
2016-12-22 16:09:44 +00:00
Sepherosa Ziehau
fff5be0be3 hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC
This improves gettimeofday performance sixfold, as measured by
tools/tools/syscall_timing
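The reference-TSC technique can be sketched as follows, based on the
Hyper-V TLFS description of the reference TSC page: time in 100ns units
is ((tsc * scale) >> 64) + offset, and a sequence of 0 means the page is
not valid.  The struct, ref_tsc_time() and the fake TSC reader are
hypothetical, unsigned __int128 assumes GCC or Clang, and a real
implementation also needs memory barriers and an appropriately ordered
TSC read.

```c
#include <stdint.h>

/* Simplified, hypothetical view of the reference TSC page. */
struct ref_tsc_page_sketch {
	volatile uint32_t	sequence;	/* 0: page not valid */
	uint32_t		reserved;
	volatile uint64_t	scale;
	volatile int64_t	offset;
};

/* Reference time in 100ns units, or -1 if the page is not valid. */
static int64_t
ref_tsc_time(const struct ref_tsc_page_sketch *pg, uint64_t (*read_tsc)(void))
{
	uint32_t seq;
	uint64_t tsc, scale;
	int64_t offset;

	do {
		seq = pg->sequence;
		if (seq == 0)
			return (-1);	/* caller falls back to a syscall */
		scale = pg->scale;
		offset = pg->offset;
		tsc = read_tsc();
	} while (pg->sequence != seq);	/* retry if the host updated it */

	return ((int64_t)(((unsigned __int128)tsc * scale) >> 64) + offset);
}

/* Fake TSC source so the sketch can run outside a Hyper-V guest. */
static uint64_t fake_tsc_value;

static uint64_t
fake_read_tsc(void)
{
	return (fake_tsc_value);
}
```

The win comes from computing the time entirely in userspace from a
shared page, instead of entering the kernel for each call.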

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8789
2016-12-19 07:40:45 +00:00
Mark Johnston
f85ea63d69 Don't run the MCA record refill task during boot.
The MCA taskqueue is not initialized until some time after CMCIs are
enabled on the BSP.

Reviewed by:	cem, jhb
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8783
2016-12-14 19:00:08 +00:00
Konstantin Belousov
5ab0f0c3f0 Prefix hex memory addresses with 0x in diagnostic messages from the
SRAT parser.

Submitted by:	Oliver Pinter
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D8750
2016-12-11 19:01:27 +00:00
Mark Johnston
10c480e775 Require the STACK option for code that captures stacks of running threads.
stack_machdep.c is compiled if either of the DDB or STACK options is
specified, but stack_save_td_running() isn't usable from DDB. Moreover,
stack_save_td_running() works by raising an NMI on the CPU running the
target thread, and the corresponding handler is compiled only if STACK is
configured.

Reported by:	kib
MFC after:	1 week
2016-12-06 22:48:28 +00:00
Konstantin Belousov
3dd3c4503b Release DMAR table after using it.
Reported and tested by:	hps
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-05 11:42:09 +00:00
Konstantin Belousov
85d99487b8 Rename fast taskqueues used by DMAR to avoid naming conflict of the
sleepable and spin mutexes created by the queues.

Reported and tested by:	hps
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-12-05 11:41:09 +00:00
Alexey Dokuchaev
a48f5e1ffa - Mention mismatching numbers in MSR vs. ACPI _PSS count warning: seeing
  actual numbers would help debugging (also, `MSR' and `ACPI' are standard
  abbreviations and thus should be properly capitalized)
- Rephrase unsupported AMD CPUs message and wrap an overly long line:
  `sorry' 1) is wrongly capitalized after a period (starts with a small
  letter) and 2) carries an emotional "tinge" that is unnecessary and even
  bogus in a debug message; `implemented' is not the best word, as
  `supported' suits better in this context
- Improve readability when reporting the resulting P-state transition (debug)

Approved by:	jhb
2016-12-01 14:31:05 +00:00
Konstantin Belousov
83a288f434 Fix automatic eventtimer hardware selection when ARAT
(APIC-Timer-always-running) is not implemented.

If the machine has ncpus >= 8 and non-FSB interrupt routing from HPET,
the default HPET eventtimer quality of 450 is reduced by 100, i.e. it is
350. On the other hand, the LAPIC default quality is 600 and it is reduced
by 200 if ARAT is not reported. We end up with HPET quality 350 <
LAPIC quality 400, despite ARAT not being set.  Then, since deep Cx
states are active by default, the eventtimer fails.

E.g., on a Nehalem Core i7 CPU with the X58 chipset, LAPIC only works in
C0/C1/C1E and HPET does not implement FSB mode, so otherwise a manual
switch to HPET is required to get a working system.

Set LAPIC eventtimer quality to 100 if no ARAT.
While there, do not ignore TSC-deadline mode for the LAPIC timer if ARAT
is not implemented.  If the user manually selected the LAPIC eventtimer on
such a CPU, there is no reason not to use deadline mode if it is available
and not disabled administratively.
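The quality arithmetic above, written out as a sketch; the function names
are hypothetical and the numbers are the defaults quoted in this message.

```c
#include <stdbool.h>

/* HPET: 450, minus 100 with >= 8 CPUs and non-FSB interrupt routing. */
static int
hpet_quality(int ncpus, bool fsb_routing)
{
	int q = 450;

	if (ncpus >= 8 && !fsb_routing)
		q -= 100;
	return (q);
}

/* Before the change: LAPIC 600, minus 200 without ARAT. */
static int
lapic_quality_old(bool arat)
{
	return (arat ? 600 : 400);
}

/* After the change: LAPIC drops to 100 without ARAT. */
static int
lapic_quality_new(bool arat)
{
	return (arat ? 600 : 100);
}
```

With 8 CPUs, no FSB routing and no ARAT, the old rule picks LAPIC
(400 > 350) while the new rule picks HPET (100 < 350).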

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-26 10:33:53 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
Adrian Chadd
7cfecbb95b Add a witness check to enforce that no non-sleeping locks are held when
they shouldn't be.

I used this during driver bring-up to find that the Linux driver holds a
whole lot of locks whilst doing its equivalent of busdma operations.

If this works out well, it should be added to the other architecture busdma
implementations to aid in similar debugging.

Tested:

* bounce buffer and dmar busdma, Lenovo X230 laptop, all the internal
  hardware
* ath(4) too

Discussed with: jhb
2016-11-03 23:11:33 +00:00
Roger Pau Monné
0f4d7d9fd7 xen/intr: add reference counts to event channels
Add a reference count to xenisrc. This is required for implementation of
unmap-notifications in the grant table userspace device (gntdev). We need to
hold a reference to the event channel port, in case the user deallocates the
port before we send the notification.
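Below is a minimal sketch of the reference-counting pattern, written with C11 atomics in userspace. The struct and function names are hypothetical. In the kernel one would expect refcount(9) or similar, together with the event channel's own locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for xenisrc: only the fields the sketch needs. */
struct evtchn_src {
	atomic_uint	refcount;
	bool		freed;
};

static void
evtchn_init(struct evtchn_src *isrc)
{
	atomic_init(&isrc->refcount, 1);	/* the creator's reference */
	isrc->freed = false;
}

/* Take an extra reference, e.g. for a pending unmap notification. */
static void
evtchn_acquire(struct evtchn_src *isrc)
{
	unsigned int old = atomic_fetch_add(&isrc->refcount, 1);

	assert(old > 0);	/* must not resurrect a released port */
}

/* Drop a reference; returns true if this call released the last one. */
static bool
evtchn_release(struct evtchn_src *isrc)
{
	unsigned int old = atomic_fetch_sub(&isrc->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		isrc->freed = true;	/* stand-in for unbinding the port */
		return (true);
	}
	return (false);
}
```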

Submitted by:		jaggi
Reviewed by:		royger
Differential review:	https://reviews.freebsd.org/D7429
2016-10-31 13:00:53 +00:00
Konstantin Belousov
1d6dfd1230 Use correct cpu id in the banner. Fix style.
Noted by:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	9 days
2016-10-28 12:27:05 +00:00
John Baldwin
7b64a80b55 Add powerd(8) support for several families of AMD CPUs.
Use the same logic to calculate the nominal CPU frequency from the P-state
MSRs on family 0x12, 0x15, and 0x16 CPUs as is used for family 0x10.
Family 0x14 was included in the original patch in the PR but I left that
out as the BIOS writer's guide for family 0x14 CPUs shows a different layout
for the relevant MSR and includes a different formula for calculating the
frequency.
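For reference, here is a sketch of the family 0x10 calculation. It assumes the P-state MSR layout from AMD's family 10h BIOS and Kernel Developer's Guide:

- CpuFid is in bits 5:0.
- CpuDid is in bits 8:6.
- The frequency is 100 * (CpuFid + 0x10) / 2^CpuDid MHz.

The helper names are hypothetical. As the 0x14 case above shows, check each family's guide before reusing the formula.

```c
#include <assert.h>
#include <stdint.h>

/* Family 10h P-state MSR fields, per AMD's BKDG. */
#define	PSTATE_CPUFID(msr)	((unsigned)((msr) & 0x3f))		/* bits 5:0 */
#define	PSTATE_CPUDID(msr)	((unsigned)(((msr) >> 6) & 0x7))	/* bits 8:6 */

/* Core frequency in MHz: 100 * (CpuFid + 0x10) / 2^CpuDid. */
static unsigned
pstate_freq_mhz(uint64_t msr)
{
	unsigned fid = PSTATE_CPUFID(msr);
	unsigned did = PSTATE_CPUDID(msr);

	assert(did <= 4);	/* this sketch only handles divisors 1..16 */
	return (100 * (fid + 0x10) / (1u << did));
}
```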

While here, simplify a few expressions and print out the family of
unsupported CPUs in hex rather than decimal.

PR:		212020
Submitted by:	Anthony Jenkins <Scoobi_doo@yahoo.com>
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7587
2016-10-27 21:31:56 +00:00
John Baldwin
16dcd7734f MFamd64: Add bounds checks on addresses used with /dev/mem.
Reject attempts to read from or memory map offsets in /dev/mem that are
beyond the maximum-supported physical address of the current CPU.
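Below is a sketch of such a bounds check. It assumes the limit is 2^N, where N is the physical address width reported by CPUID leaf 0x80000008 (EAX bits 7:0). The function name is hypothetical, and the length check is this sketch's own addition.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical check: reject an access of len bytes at off if any byte
 * lies at or beyond 2^phys_bits.
 */
static int
devmem_check(uint64_t off, uint64_t len, unsigned phys_bits)
{
	uint64_t limit;

	assert(phys_bits > 0 && phys_bits < 64);
	limit = (uint64_t)1 << phys_bits;
	/* Written as limit - off to avoid overflow in off + len. */
	if (off >= limit || len > limit - off)
		return (EINVAL);
	return (0);
}
```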

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7408
2016-10-27 21:23:14 +00:00
Konstantin Belousov
295f4b6cfe Follow-up to r307866:
- Make !KDB config buildable.
- Simplify the interface to nmi_handle_intr() by evaluating panic_on_nmi
  in one place, namely nmi_call_kdb().  This allows removing the do_panic
  argument from the functions, and removing the i386/amd64 duplication of
  the variable and sysctl definitions.  Note that NMI now causes
  panic(9) instead of trap_fatal() reporting followed by panic(9),
  consistently for NMIs delivered while the CPU operated in ring 0 or 3.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-24 20:47:46 +00:00
Konstantin Belousov
a57d70325e Fix typo.
Submitted by:	alc
MFC after:	3 days
2016-10-24 17:37:21 +00:00
Konstantin Belousov
835c2787be Handle broadcast NMIs.
On several Intel chipsets, diagnostic NMIs sent from the BMC, or NMIs
reporting hardware errors, are broadcast to all CPUs.

When the kernel is configured to enter kdb on NMI, the outcome is
problematic, because each CPU tries to enter kdb.  All CPUs are
executing NMI handlers, which set the latches disabling nested NMI
delivery; this means that stop_cpus_hard(), used by kdb_enter() to
stop other CPUs by broadcasting the IPI_STOP_HARD NMI, cannot work.  One
indication of this is the harmless but annoying diagnostic "timeout
stopping cpus".

Much more harmful is that all CPUs try to enter kdb: if ddb is used as
the debugger, all CPUs issue a prompt on the console and race for the
input, not to mention the simultaneous use of the ddb shared state.

Try to fix this by introducing a pseudo-lock for simultaneous attempts
to handle NMIs.  If one core happens to enter the NMI trap handler, other
cores see it and simulate reception of IPI_STOP_HARD.  Moreover,
generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting
for the acknowledgement, relying on the NMI handler on the other cores
to suspend and then restart the CPU.
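The pseudo-lock can be sketched as a compare-and-swap on an owner word. This is an illustration under assumptions: the variable and function names are hypothetical, and the kernel code may use a different primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define	NMI_NOOWNER	(-1)

/* CPU id of the core handling the NMI, or NMI_NOOWNER. */
static atomic_int nmi_owner = NMI_NOOWNER;

/*
 * Returns true if cpuid won the race and should enter the debugger.
 * Returns false if another core got there first; the caller should then
 * act as if it had received IPI_STOP_HARD and suspend until restarted.
 */
static bool
nmi_try_claim(int cpuid)
{
	int expected = NMI_NOOWNER;

	assert(cpuid >= 0);
	return (atomic_compare_exchange_strong(&nmi_owner, &expected, cpuid));
}

/* Only the owner may release the pseudo-lock. */
static void
nmi_release(int cpuid)
{
	int expected = cpuid;
	bool ok = atomic_compare_exchange_strong(&nmi_owner, &expected,
	    NMI_NOOWNER);

	assert(ok);
	(void)ok;
}
```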

Since it is impossible to detect at runtime whether some stray NMI is
broadcast or unicast, add a knob for administrator (really developer)
to configure debugging NMI handling mode.

The updated patch was debugged with the help from Andrey Gapon (avg)
and discussed with him.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D8249
2016-10-24 16:40:27 +00:00
Mateusz Guzik
53dc58f2dc Mark a bunch of mpsafe sysctls as such.
This gives me a sysctl Giant-free buildworld.
2016-10-19 19:42:01 +00:00
John Baldwin
4fae28a084 Reprogram I/O APIC interrupt pins when registering an I/O APIC.
All I/O APIC pins are masked when an I/O APIC is first probed.  The
APIC enumerator (MP Table or MADT) then parses its associated tables to
configure individual pins to set custom delivery modes or alternate
routing (e.g. routing IRQ 0 to intpin 2).  Regular interrupt pins are
left masked until the first interrupt is assigned.  However,
pins with unusual settings (e.g. NMI or SMI) are never assigned an
interrupt and thus never re-programmed.  The I/O APIC code used to
reprogram all interrupt pins during registration but this was lost in
r151979.

In theory, this is mostly a no-op as the ACPI APIC table does not
include a way to enumerate NMI or SMI pins for the I/O APIC, so only
systems using an MP Table would be affected.

Reported by:	avg
MFC after:	1 month
2016-10-14 21:51:50 +00:00
Jung-uk Kim
493deb390b Merge ACPICA 20160930.
2016-10-04 20:27:15 +00:00
Konstantin Belousov
83c001d3c2 Re-apply r306516 (by cem):
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags

Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.
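Below is a userspace sketch of the per-CPU completion flag idea. Each target CPU writes only its own cache line, and the initiator polls those flags; in the old scheme, all CPUs incremented one shared counter instead. The names, NCPU and the 64-byte line size are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define	NCPU		4
#define	CACHE_LINE	64

/* One completion flag per CPU, each on its own cache line. */
struct tlb_done {
	_Alignas(CACHE_LINE) atomic_uint gen;
};

static struct tlb_done done[NCPU];
static unsigned int shootdown_gen;	/* written only by the initiator */

/* Initiator: start a new shootdown round and return its generation. */
static unsigned int
shootdown_begin(void)
{
	return (++shootdown_gen);
}

/* Target CPU: after invalidating, publish completion in its own slot. */
static void
shootdown_ack(int cpu, unsigned int gen)
{
	assert(cpu >= 0 && cpu < NCPU);
	atomic_store_explicit(&done[cpu].gen, gen, memory_order_release);
}

/* Initiator: true once every CPU in mask has acknowledged gen. */
static bool
shootdown_complete(unsigned int mask, unsigned int gen)
{
	for (int cpu = 0; cpu < NCPU; cpu++) {
		if ((mask & (1u << cpu)) == 0)
			continue;
		if (atomic_load_explicit(&done[cpu].gen,
		    memory_order_acquire) != gen)
			return (false);
	}
	return (true);
}
```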

On a Westmere system (2 sockets x 4 cores x 1 thread), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-10-04 17:01:24 +00:00
Conrad Meyer
31f575777c Revert r306516 for now, it is incomplete on i386
Noted by:	kib
2016-09-30 18:58:50 +00:00
Conrad Meyer
2965d505f6 Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags
Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.

On a Westmere system (2 sockets x 4 cores x 1 thread), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version), kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-09-30 18:12:16 +00:00
Sepherosa Ziehau
37e0abf2ef x86/ioapic: Fix destination cpu for Hyper-V
On Hyper-V:
- Stick to the first cpu for all I/O APIC pins.
- And don't allow destination cpu changes.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D7949
2016-09-30 06:08:21 +00:00
Konstantin Belousov
36596c2a29 Detect x2APIC mode on boot and obey it.
If the BIOS hands off to the OS with the BSP LAPIC in x2APIC mode, the
system usually consumes this configuration without notice, since the OS
turns on x2APIC when possible anyway (a nop).  But if the BIOS
simultaneously asks the OS not to use x2APIC, the code's assumption
that xAPIC is active breaks.

In my opinion, we cannot safely turn off x2APIC if control is passed
in this mode.  Make madt.c ignore user or BIOS requests to turn x2APIC
off, and do not check the x2APIC black list.  Just trust the config
and try to continue, giving a warning in dmesg.
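Below is a sketch of how the handed-off mode can be read from the IA32_APIC_BASE MSR. It uses the bit assignments from the Intel SDM: bit 11 enables the local APIC and bit 10 selects x2APIC. The function names and the policy helper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the IA32_APIC_BASE MSR (0x1b), per the Intel SDM. */
#define	APICBASE_BSP		(1ULL << 8)
#define	APICBASE_X2APIC		(1ULL << 10)	/* EXTD */
#define	APICBASE_ENABLED	(1ULL << 11)	/* EN */

enum lapic_mode { LAPIC_DISABLED, LAPIC_XAPIC, LAPIC_X2APIC, LAPIC_INVALID };

/* Classify the mode the firmware left the local APIC in. */
static enum lapic_mode
lapic_mode_from_apicbase(uint64_t apicbase)
{
	int en = (apicbase & APICBASE_ENABLED) != 0;
	int extd = (apicbase & APICBASE_X2APIC) != 0;

	if (!en)
		return (extd ? LAPIC_INVALID : LAPIC_DISABLED);
	return (extd ? LAPIC_X2APIC : LAPIC_XAPIC);
}

/* Policy from the message: keep x2APIC if it was handed off that way. */
static int
use_x2apic(enum lapic_mode boot_mode, int requested)
{
	assert(boot_mode != LAPIC_INVALID);
	if (boot_mode == LAPIC_X2APIC)
		return (1);
	return (requested);
}
```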

Reported and tested by:	Slawa Olhovchenkov <slw@zxy.spb.ru> (previous version)
Diagnosed by and discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-09-19 15:58:45 +00:00
Bruce Evans
5904b5a6f2 Fix decoding of tf_rsp on amd64, and move TF_HAS_STACKREGS() to the
i386-only section, and fix a comment about the amd64 kernel trapframe
not having stackregs.

tf_rsp doesn't need decoding on amd64, but had an old clone of i386
code to do this in 1 place, and since the amd64 kernel trapframe does
have stackregs, the result was an off-by-16 error for %rsp in an error
message.
2016-09-16 07:09:35 +00:00
John Baldwin
38605d7312 Remove 'cpu' and 'cpu_class' on amd64.
The 'cpu' and 'cpu_class' variables were always set to the same value
on amd64 and are legacy holdovers from i386.  Remove them entirely on
amd64.

Reviewed by:	imp, kib (older version)
Differential Revision:	https://reviews.freebsd.org/D7888
2016-09-15 17:05:54 +00:00
Bruce Evans
701ac88055 Use the MI macro TRAPF_USERMODE() instead of open-coded checks for
SEL_UPL and sometimes PSL_VM.  This is just a style change on amd64,
but on i386 it fixes 1 unimportant place where the PSL_VM check was
missing and starts fixing 1 important place where the PSL_VM check
had a logic error.
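Below is a simplified sketch of the i386 check. It uses the x86 architectural values behind SEL_UPL, ISPL() and PSL_VM: the CPL lives in the low two selector bits, and EFLAGS.VM is bit 17. The struct is a stand-in for the real trapframe, and the vm86 bioscall case (PCB_VM86CALL) is deliberately left out. The sketch shows why a %cs test alone is not enough: vm86 code runs with a real-mode segment such as 0xc000, whose low bits are 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SEL_UPL		3		/* user privilege level */
#define	ISPL(s)		((s) & 3)	/* privilege level of a selector */
#define	PSL_VM		0x00020000	/* EFLAGS.VM: virtual 8086 mode */

/* Minimal stand-in for the i386 trapframe. */
struct trapframe_sketch {
	uint32_t	tf_cs;
	uint32_t	tf_eflags;
};

/* User mode means CPL 3, or vm86 mode (which has a real-mode %cs). */
static bool
trapf_usermode(const struct trapframe_sketch *tf)
{
	assert(tf != NULL);
	return (ISPL(tf->tf_cs) == SEL_UPL || (tf->tf_eflags & PSL_VM) != 0);
}
```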

Fix logic errors in treating vm86 bioscall mode as kernel mode.  The
main place checked all the necessary flags, but put the necessary
parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong
place.  The broken case is only reached if a vm86 bioscall uses a
%cs which is nonzero mod 4, but that is unusual -- most bios calls
start with %cs = 0xc000 or 0xf000 and rarely change it.  Another
place was missing the check for PCB_VM86CALL, but was only reachable
if there are bugs virtualizing PSL_I.

Add a macro TF_HAS_STACKREGS() and use this instead of converting
open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only
care about whether the frame has stack registers.  This fixes 3
places in my recent fix for register variables in vm86 mode where I
messed up the PSL_VM check and cleans up other places.
2016-09-14 12:57:40 +00:00
Konstantin Belousov
1a9ded46bd Fix typo in comment.
MFC after:	3 days
2016-09-12 16:44:21 +00:00
Sepherosa Ziehau
b9f62e3a74 x86: Use sx lock for interrupt sources.
- Certain pic_assign_cpu implementations, e.g. msi_assign_cpu, can have
  quite a long call chain.  For msi_assign_cpu, a mutex makes complex PCI
  bridge drivers more tricky (e.g. they cannot sleep), and it would be
  particularly tricky for the upcoming Hyper-V PCI bridge driver for PCI
  pass-through.
- It is not used on any hot code path or in any non-sleepable context, so
  an sx lock should have the same effect as a mutex.

The PIC list is still protected by a mutex to keep suspend/resume working.

Discussed with: jhb
Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D7784
2016-09-12 04:57:58 +00:00
John Baldwin
db4b3cdad8 Remove remnants of PERFMON and I586_PMC_GUPROF from amd64.
These options were never fully ported over from i386.
2016-09-06 19:25:32 +00:00
John Baldwin
a47632d45b Fix build for !SMP kernels after the Xen MSIX workaround.
Move msix_disable_migration under #ifdef SMP since it doesn't make sense
for !SMP kernels.

PR:		212014
Reported by:	Glyn Grinstead <glyn@grinstead.org>
MFC after:	3 days
2016-08-22 21:23:17 +00:00
Konstantin Belousov
1680854946 Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC.  For older machines, like Core2, where RDTSC is not C2/C3
invariant and which fall back to HPET hardware, this means that the call
pays both the penalty of the syscall and that of the uncached hw access
behind the QPI or PCIe connection to the south bridge.  Nothing can be
done about the access latency, but the syscall overhead can be removed.
The system already provides mappable /dev/hpetX devices, which give
direct access to the HPET register page.

Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET.  For HPET, the index of the hpet device
to mmap is passed from the kernel to userspace; the index might change,
and libc invalidates its mapping as needed.
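Below is a rough sketch of what userspace does with the mapped page. The IA-PC HPET specification places the counter period, in femtoseconds, in bits 63:32 of the capabilities register at offset 0x000, and the main counter at offset 0x0f0. The helpers are hypothetical, and the real vdso code scales through timehands rather than converting directly.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets within the mapped HPET page. */
#define	HPET_GCAP_ID		0x000	/* bits 63:32: counter period in fs */
#define	HPET_MAIN_COUNTER	0x0f0

/* Extract the counter tick period, in femtoseconds, from GCAP_ID. */
static uint32_t
hpet_period_fs(uint64_t gcap_id)
{
	return ((uint32_t)(gcap_id >> 32));
}

/*
 * Convert a main-counter delta to nanoseconds (1 ns = 10^6 fs).
 * Assumes delta * period_fs fits in 64 bits, i.e. short intervals.
 */
static uint64_t
hpet_delta_to_ns(uint64_t delta, uint32_t period_fs)
{
	assert(period_fs != 0);
	return (delta * period_fs / 1000000);
}
```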

Remove the cpu_fill_vdso_timehands() KPI; instead, require that
timecounters which can be used from userspace provide
tc_fill_vdso_timehands{,32}() methods.  Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location.  The __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.

Measurements show that RDTSC, even with the syscall overhead, is faster
than userspace HPET access.  Still, userspace HPET is three to four
times faster than syscall HPET on several Core2 and SandyBridge
machines.

Tested by:	Howard Su <howard0su@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
Pedro F. Giffuni
a061aa46fe sys: replace comma with semicolon when pertinent.
Uses of commas instead of semicolons can easily go undetected. The comma
can serve as a statement separator but this shouldn't be abused when
statements are meant to be standalone.
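Such a replacement needs care because a comma and a semicolon behave the same only when the statements are not the body of an unbraced if, for or while. A small illustration, with hypothetical function names:

```c
#include <assert.h>

/* With a comma, both assignments form one statement under the if. */
static int
with_comma(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1, b = 2;
	return (a + b);
}

/* With semicolons, only "a = 1" is guarded; "b = 2" always runs. */
static int
with_semicolon(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 2;	/* not guarded by the if */
	return (a + b);
}

/* Both variants agree when cond is true but differ when it is false. */
static void
comma_demo(void)
{
	assert(with_comma(1) == 3 && with_semicolon(1) == 3);
	assert(with_comma(0) == 0 && with_semicolon(0) == 2);
}
```

A mechanical comma-to-semicolon conversion under an unbraced if therefore also needs braces to keep the original behaviour.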

Detected with devel/coccinelle following a hint from DragonFlyBSD.

MFC after:	1 month
2016-08-09 19:42:20 +00:00
John Baldwin
264cd10809 Add additional constants.
- Add constants for the fields in the root-entry table address register,
  namely the root table type (RTT) and root table address (RTA) mask.
- Add macros for the bitmask of the domain ID field in the second word
  of context table entries as well as a helper macro (DMAR_CTX2_GET_DID)
  to extract the domain ID from a context table entry.
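Below is a sketch of the kind of helpers described. It is written against the Intel VT-d layout of that era as I understand it:

- In the root-entry table address register, RTT is bit 11 and the table address occupies bits 63:12.
- In a context entry, the domain ID occupies bits 23:8 of the second 64-bit word.

The SK_ names are hypothetical and are not the macros added by the commit.

```c
#include <assert.h>
#include <stdint.h>

#define	SK_RTADDR_RTT		(1ULL << 11)
#define	SK_RTADDR_RTA_MASK	0xfffffffffffff000ULL
#define	SK_CTX2_DID_SHIFT	8
#define	SK_CTX2_DID_MASK	(0xffffULL << SK_CTX2_DID_SHIFT)
#define	SK_CTX2_GET_DID(ctx2)	\
	((uint16_t)(((ctx2) & SK_CTX2_DID_MASK) >> SK_CTX2_DID_SHIFT))

/* Store a domain ID into the second word of a context entry. */
static uint64_t
sk_ctx2_set_did(uint64_t ctx2, unsigned did)
{
	assert(did <= 0xffff);
	return ((ctx2 & ~SK_CTX2_DID_MASK) |
	    ((uint64_t)did << SK_CTX2_DID_SHIFT));
}
```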

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	Chelsio Communications
2016-08-09 19:02:14 +00:00
John Baldwin
f454e7ebf5 Add __printflike() to bus_describe_intr() to enable -Wformat checks.
Fix a few places that were passing a raw string as the format to use
a "%s" format string instead.
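Below is a sketch of what the annotation buys. On GCC and Clang, __printflike(3, 4) expands to a format attribute like the one below; the describe() function is hypothetical. With the attribute in place, GCC's -Wformat-security flags a call such as describe(buf, sizeof(buf), name), whose format is not a string literal. The fix is to pass "%s" as the format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int describe(char *buf, size_t len, const char *fmt, ...)
    __attribute__((__format__(__printf__, 3, 4)));

/* Hypothetical printf-like describe function. */
static int
describe(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len > 0);
	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return (n);
}
```

Passing a name such as "q%d" through "%s" prints it literally instead of interpreting its %d.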

MFC after:	2 months
2016-08-04 18:29:16 +00:00
Konstantin Belousov
fa03524a9f Merge i386 and amd64 variants of mp_watchdog.c into x86/, since there is no
difference between the files.
For pc98, put x86/mp_x86.c into the same place as used by i386 file list.
Fix typo in comment.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 13:51:53 +00:00
Roger Pau Monné
23006680c7 Revert r291022: x86/intr: allow mutex recursion in intr_remove_handler
This was only needed for Xen, and a better way to deal with this issue has
been found, so this commit can be reverted.

Sponsored by:		Citrix Systems R&D
MFC after:		5 days
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D7363
2016-07-29 16:35:58 +00:00
Roger Pau Monné
35fdb32d86 xen-intr: fix removal of event channels during resume
Event channel handlers cannot be removed during resume because there might
be an interrupt thread running on a CPU currently blocked in the
cpususpend_handler, which prevents the call to intr_remove_handler from
finishing and completely freezes the system during resume. r291022 tried to
fix this by allowing recursion in intr_remove_handler, but that's clearly
not enough.

Instead don't remove the handlers at the interrupt resume phase, and let
each driver remove the handler by itself during resume. In order to do this,
change the opaque event channel handler cookie to use the global interrupt
vector instead of the event channel port. The event channel port cannot be
used because after resume all event channels are reset, and the port numbers
can change.

Sponsored by:		Citrix Systems R&D
MFC after:		5 days
2016-07-29 16:34:54 +00:00
Maxim Sobolev
e0cd4b7f6f Don't print the same value twice, once in decimal and once in hex. This makes
output more cryptic than it needs to be and wastes cpu cycles and
console bandwidth.
2016-07-18 03:59:03 +00:00
Mark Johnston
f4d0e9c95f Allow ACPI wakeup code and page tables to be stored in non-contiguous pages.
Since these pages are allocated from a narrow range of memory, this makes
the allocation more likely to succeed.

Suggested by:	kib
Reviewed by:	jkim, kib
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D7154
2016-07-14 00:38:04 +00:00
Eric Badger
fdb6320d45 Add explicit detection of KVM hypervisor
Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use
vm_guest in conditionals testing for KVM.

Also, fix a conditional checking if we're running in a VM which caught only
the generic VM case, but not more specific VMs (KVM, VMWare, etc.).  (Spotted
by: vangyzen).
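Below is a sketch of the underlying detection. It assumes the usual CPUID convention: KVM returns the signature "KVMKVMKVM\0\0\0" in EBX, ECX and EDX of leaf 0x40000000. The helper is hypothetical; the commit itself records the result in vm_guest as VM_GUEST_KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint32_t) == 4, "CPUID registers are 32 bits");

/* Reassemble the 12-byte vendor signature and compare it with KVM's. */
static bool
is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[12];

	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	return (memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0);
}
```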

Differential revision:	https://reviews.freebsd.org/D7172
Sponsored by:	Dell Inc.
Approved by:	kib (mentor), vangyzen (mentor)
Reviewed by:	alc
MFC after:	4 weeks
2016-07-13 19:19:18 +00:00
Roger Pau Monné
302244700f xen: automatically disable MSI-X interrupt migration
Do so if the hypervisor version is older than 4.6.0. Xen commits 74fd00 and
70a3cb are required on the hypervisor side for this to be fixed, and those
are only included in 4.6.0, so stay on the safe side and disable MSI-X
interrupt migration on anything older than 4.6.0.
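The version test can be sketched as follows, assuming Xen's XENVER_version encoding of major << 16 | minor. The helper names are hypothetical. This encoding carries no patch level, so 4.6.0 and 4.6.1 compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode a version the way XENVER_version reports it. */
static uint32_t
xen_version(unsigned major, unsigned minor)
{
	assert(major <= 0xffff && minor <= 0xffff);
	return ((uint32_t)major << 16 | minor);
}

/* Disable MSI-X migration on anything older than 4.6. */
static bool
xen_needs_msix_migration_disabled(uint32_t version)
{
	return (version < xen_version(4, 6));
}
```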

It should not cause major performance degradation unless a lot of MSI-X
interrupts are allocated.

Sponsored by:		Citrix Systems R&D
MFC after:		3 days
Reviewed by:		jhb
Differential revision:	https://reviews.freebsd.org/D7148
2016-07-12 08:43:09 +00:00
John Baldwin
be0319fd19 Add a tunable to disable migration of MSI-X interrupts.
The new 'machdep.disable_msix_migration' tunable can be set to 1 to
disable migration of MSI-X interrupts.

Xen versions prior to 4.6.0 do not properly handle updates to MSI-X
table entries after the initial write.  In particular, the operation
to unmask a table entry after updating it during migration is not
propagated to the "real" table for passthrough devices causing the
interrupt to remain masked.  At least some systems in EC2 are
affected by this bug when using SRIOV.  The tunable can be set in
loader.conf as a workaround.

Submitted by:	Jeremiah Lott <jlott@averesystems.com> (original patch)
Approved by:	re (marius)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6947
2016-06-24 22:49:32 +00:00
Mark Johnston
c722a89a63 Use M_NOWAIT when allocating memory for the ACPI wakeup handler.
If the allocation attempt fails, we may otherwise VM_WAIT after a failed
attempt to reclaim contiguous memory in the requested range. After r297466,
this results in the thread going to sleep, causing a hang during boot.

Reviewed by:	jkim, kib
Approved by:	re (gjb)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D6945
2016-06-23 19:24:38 +00:00
Konstantin Belousov
0bf716e988 Trim some spaces to record correct commit message for the r301278.
Reduce the number of iterations used for calibrating the ICR read loop.  The
new number of iterations still gives the same ICR latency as before,
tested on Intel SandyBridge and Haswell machines, and on AMD.  But it
significantly reduces the unneeded pause on boot in some VMs, from ~10
secs to less than 1 sec.  It was reported to occur in bhyve on an AMD
host.

Reported and tested by:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-06-03 18:23:45 +00:00
Konstantin Belousov
fcc1d8c9eb diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c
index d8bda77..bb15df0 100644
--- a/sys/x86/x86/local_apic.c
+++ b/sys/x86/x86/local_apic.c
@@ -511,7 +511,7 @@ native_lapic_init(vm_paddr_t addr)
 	}

 #ifdef SMP
-#define	LOOPS	1000000
+#define	LOOPS	100000
 	/*
 	 * Calibrate the busy loop waiting for IPI ack in xAPIC mode.
 	 * lapic_ipi_wait_mult contains the number of iterations which
2016-06-03 18:05:18 +00:00
Ed Schouten
3a45c3d643 Implement _ALIGN() using internal integer types.
The existing version depends on register_t and uintptr_t, which are only
available when including headers such as <sys/types.h>. As this macro is
used by <sys/socket.h>, for example, it should be written in such a way
that it doesn't depend on those types.
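Below is a sketch of an _ALIGN()-style macro that needs no header at all. It assumes a GCC or Clang compiler, which predefine __UINTPTR_TYPE__; that type plays the role the kernel's internal __uintptr_t plays. sizeof(long) stands in for sizeof(register_t) on x86, and the SK_ names are hypothetical.

```c
#include <assert.h>

#define	SK_ALIGNBYTES	(sizeof(long) - 1)
#define	SK_ALIGN(p)	(((__UINTPTR_TYPE__)(p) + SK_ALIGNBYTES) & \
			    ~(__UINTPTR_TYPE__)SK_ALIGNBYTES)

/* The mask trick requires SK_ALIGNBYTES + 1 to be a power of two. */
static_assert((SK_ALIGNBYTES & (SK_ALIGNBYTES + 1)) == 0,
    "alignment must be a power of two");
```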
2016-05-31 13:31:19 +00:00
Ed Schouten
78fe75bc28 Add missing dependency on <machine/_limits.h>.
In r227474, this header file was changed to define SIG_ATOMIC_{MIN,MAX}
in terms of LONG_{MIN,MAX}. Unlike all of the definitions in this header
file, LONG_{MIN,MAX} is provided by <limits.h>. Remove the dependency on
<limits.h> by using __LONG_{MIN,MAX} instead and including
<machine/_limits.h>.

This change is needed to make SIG_ATOMIC_{MIN,MAX} work without
including any other header files.
2016-05-31 08:38:24 +00:00
Ed Schouten
46f38226d7 Add missing dependency on <machine/_limits.h>.
This header uses __INT_MIN and __INT_MAX, which are provided by
<machine/_limits.h>. This is needed to make <stdint.h>'s WCHAR_MIN and
WCHAR_MAX work without including other headers as well.
2016-05-31 08:36:39 +00:00
Sepherosa Ziehau
98a68947d4 hyperv/vmbus: Rename ISR functions
MFC after:	1 week
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6601
2016-05-31 04:47:53 +00:00
Konstantin Belousov
f159d7d6f0 Only calibrate ICR read loop when not in x2APIC mode. Run-time
switching between LAPIC modes is not supported, and there is no need
to wait for IPI ack in x2APIC mode.  So the calibrated delay is only
needed for !x2APIC.

This saves around a second of boot time on the real hardware for
x2APIC.

Sponsored by:	The FreeBSD Foundation
2016-05-26 09:09:11 +00:00
John Baldwin
10544b0951 Implement support for RF_UNMAPPED and bus_map/unmap_resource on x86.
Add implementations of bus_map/unmap_resource to the x86 nexus driver.
Change bus_activate/deactivate_resource to honor RF_UNMAPPED and to
use bus_map/unmap_resource to create/destroy the implicit mapping when
RF_UNMAPPED is not set.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D5237
2016-05-20 18:00:10 +00:00
John Baldwin
fdce57a042 Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.
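The arithmetic behind the exhaustion is easy to check. An x86 CPU has only 256 IDT vectors, and fewer than that are available for device interrupts. A handful of drivers whose allocations scale with the CPU count therefore quickly overflows the boot CPU. Below is a sketch with hypothetical driver counts.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Vectors taken from the boot CPU by ndrivers multiqueue drivers,
 * each allocating n interrupts per CPU.
 */
static unsigned
boot_cpu_vectors(unsigned ndrivers, unsigned n, unsigned ncpu,
    bool early_ap_startup)
{
	assert(ncpu > 0);
	/* Old model: everything lands on the boot CPU until SI_SUB_SMP. */
	if (!early_ap_startup)
		return (ndrivers * n * ncpu);
	/* EARLY_AP_STARTUP: the boot CPU only takes its own share. */
	return (ndrivers * n);
}
```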

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
Bjoern A. Zeeb
d68b7cfac5 Remove the extra _RD as _RDTUN already includes it.
Submitted by:	emaste
MFC after:	2 weeks
2016-05-13 15:29:40 +00:00
Bjoern A. Zeeb
2474dccf1a We already turn the AMD erratum383 workaround on for certain VM_GUEST_VM
if specific CPU features are not present.
Some simulation environments, e.g. gem5, have been found to require more
TLB management from the kernel in certain setups. It is currently unclear why.
Turning on the workaround_erratum383 seems to help and make problems (panics)
go away.
Given this is a fairly uncommon environment so far, allowing the workaround
to be manually enabled from the loader, in order to make debugging and comparing
traces easier but also to allow gem5 to run FreeBSD in X86 timing mode, seems
to be the least intrusive option for now until the issue is fully understood.

Sponsored by:	DARPA/AFRL
Reviewed by:	kib, alc (earlier)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6206
2016-05-13 15:11:17 +00:00
Bjoern A. Zeeb
c850971baf Allow orm(4) to be disabled from probing/attaching by a hints entry:
hint.orm.0.disabled=1

Suggested by:	jhb
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6307
2016-05-10 22:28:06 +00:00
Edward Tomasz Napierala
084d207584 Remove misc NULL checks after M_WAITOK allocations.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-10 10:26:07 +00:00
John Baldwin
8d791e5af1 Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two values are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from
the x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also AND
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
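The AND step can be sketched with plain 64-bit masks standing in for cpuset_t, where bit i set means CPU i is in the set. The function name is hypothetical. The fallback to the global set when the intersection is empty is this sketch's own assumption, motivated by the goal that INTR_CPUS always return a valid set.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_sketch;	/* bit i set: CPU i is in the set */

/* Local INTR_CPUS: global INTR_CPUS AND the _PXM domain's CPUs. */
static cpumask_sketch
local_intr_cpus(cpumask_sketch global_intr, cpumask_sketch domain_cpus)
{
	cpumask_sketch local = global_intr & domain_cpus;

	assert(global_intr != 0);
	return (local != 0 ? local : global_intr);
}
```

For example, with 8 CPUs in SMT pairs, a global INTR_CPUS of 0x55 (one thread per core) and a domain of CPUs 4-7 (0xf0) give a local set of 0x50.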

Compared to r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
2016-05-09 20:50:21 +00:00
Eric van Gyzen
2db0699d88 Work around (ignore) broken SRAT tables
Instead of panicking when parsing an invalid ACPI SRAT table,
just ignore it, effectively disabling NUMA.

https://lists.freebsd.org/pipermail/freebsd-current/2016-May/060984.html

Reported and tested by:	 Bill O'Hanlon (bill.ohanlon at gmail.com)
Reviewed by:	jhb
MFC after:	1 week
Relnotes:	If dmesg shows "SRAT: Duplicate local APIC ID",
                try updating your BIOS to fix NUMA support.
Sponsored by:	Dell Inc.
2016-05-03 20:14:04 +00:00
John Baldwin
8a08b7d36b Revert bus_get_cpus() for now.
I really thought I had run this through the tinderbox before committing,
but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.
2016-05-03 01:17:40 +00:00
John Baldwin
bc153c692f Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two values are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from
the x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also AND
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.

Reviewed by:	wblock (manpage)
Differential Revision:	https://reviews.freebsd.org/D5519
2016-05-02 18:00:38 +00:00
Roger Pau Monné
f65466eb3a atrtc: export function to set RTC
This is going to be used by the Xen clock on Dom0 in order to set the RTC of
the host. The current logic in atrtc_settime is moved to atrtc_set and the
unused device_t parameter is removed from the atrtc_set function call so it
can be safely used by other callers.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib, jhb
Differential revision:	https://reviews.freebsd.org/D6067
2016-05-02 16:14:55 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
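For reference, below is a sketch of the macros with the power-of-two definitions commonly used in <sys/param.h>, guarded in case the header is already included. It also shows the kind of open-coded expression they replace. The page_align_up() helper is hypothetical.

```c
#include <assert.h>

/* y must be a power of two. */
#ifndef roundup2
#define	roundup2(x, y)		(((x) + ((y) - 1)) & (~((y) - 1)))
#endif
#ifndef rounddown2
#define	rounddown2(x, y)	((x) & (~((y) - 1)))
#endif

static unsigned long
page_align_up(unsigned long addr, unsigned long pgsz)
{
	assert(pgsz != 0 && (pgsz & (pgsz - 1)) == 0);
	/* was: (addr + pgsz - 1) & ~(pgsz - 1) */
	return (roundup2(addr, pgsz));
}
```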
2016-04-21 19:57:40 +00:00
Conrad Meyer
3765b80993 SRAT: Don't overflow domain_pxm table
If we reached MAXMEMDOM, we would previously try to insert an additional
element and only detect overflow after causing (probably trivial) memory
overflow.  Instead, detect the ndomain > MAXMEMDOM case before we write past
the end.
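The fix amounts to checking the bound before the store rather than after it. Below is a sketch with a hypothetical table size and error value.

```c
#include <assert.h>
#include <errno.h>

#define	SK_MAXMEMDOM	8	/* hypothetical table size */

static int domain_pxm[SK_MAXMEMDOM];
static int ndomain;

/* Record a new proximity domain, refusing before writing past the end. */
static int
add_domain(int pxm)
{
	if (ndomain >= SK_MAXMEMDOM)
		return (ENOSPC);
	domain_pxm[ndomain++] = pxm;
	assert(ndomain <= SK_MAXMEMDOM);
	return (0);
}
```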

Reported by:	Coverity
CID:		1354783
Sponsored by:	EMC / Isilon Storage Division
2016-04-20 01:10:07 +00:00
Pedro F. Giffuni
ea24b0561f X86: use our nitems() macro when it is available through param.h.
No functional change; only trivial cases are done in this sweep.
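For reference, below is a sketch of nitems() as it is commonly defined in <sys/param.h>, alongside the open-coded form it replaces. It only works on true arrays, not on pointers. The table is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

static const int irq_table[] = { 3, 1, 4, 1, 5 };

static_assert(nitems(irq_table) == 5, "table size changed");

static int
sum_irq_table(void)
{
	int sum = 0;

	/* was: i < sizeof(irq_table) / sizeof(irq_table[0]) */
	for (size_t i = 0; i < nitems(irq_table); i++)
		sum += irq_table[i];
	return (sum);
}
```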

Discussed in:	freebsd-current
2016-04-19 23:41:46 +00:00
Konstantin Belousov
e164cafc69 Add hw.dmar.batch_coalesce tunable/sysctl, which specifies the rate at
which the queued invalidation completion interrupt is requested relative
to the queued invalidation requests.  In other words, setting
the value of the knob to N requests a completion interrupt after N items
are processed.  The previous behaviour is restored by setting
hw.dmar.batch_coalesce=1.
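Below is a sketch of the coalescing rule as described, with hypothetical names: the interrupt is requested on every Nth queued item. It ignores the tail of a partial batch, which a real implementation has to complete somehow.

```c
#include <assert.h>
#include <stdbool.h>

struct qi_queue_sketch {
	unsigned int	batch_coalesce;	/* N; 1 restores per-item interrupts */
	unsigned int	pending;	/* items since the last interrupt request */
};

/* Queue one invalidation; true if it should request a completion interrupt. */
static bool
qi_enqueue(struct qi_queue_sketch *q)
{
	assert(q->batch_coalesce >= 1);
	if (++q->pending < q->batch_coalesce)
		return (false);
	q->pending = 0;
	return (true);
}
```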

The knob significantly decreases the DMAR qi interrupt rate at the
cost of slightly delayed recycling of DMAR map entries.
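
A hypothetical model of the coalescing rule: only every N-th queued request asks for a completion interrupt. The function names are invented for illustration, and a real implementation also needs some way to complete a trailing partial batch, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Should the seq-th queued invalidation request (counting from 1) ask
 * for a completion interrupt?  batch_coalesce == 1 gives one interrupt
 * per request, the previous behaviour.
 */
static bool
want_completion_intr(unsigned long seq, unsigned int batch_coalesce)
{
	assert(batch_coalesce >= 1);
	return (seq % batch_coalesce == 0);
}

/* Count the interrupts requested for 'nreq' queued requests. */
static unsigned long
count_completion_intrs(unsigned long nreq, unsigned int batch_coalesce)
{
	unsigned long n = 0;

	for (unsigned long seq = 1; seq <= nreq; seq++)
		if (want_completion_intr(seq, batch_coalesce))
			n++;
	return (n);
}
```

With N = 10, 100 requests generate 10 interrupts instead of 100, which is where the lower interrupt rate and the slightly later recycling of entries both come from.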

Sponsored by:	The FreeBSD Foundation
2016-04-17 10:56:56 +00:00
Konstantin Belousov
c5c20928d3 Add x86 CPU features definitions published in the Intel SDM rev. 58.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-04-16 06:07:13 +00:00
Konstantin Belousov
9e297f96d4 Always calculate the divisor for the counter mode of the LAPIC timer.
Even if the timer is initially configured in TSC deadline mode, the
eventtimer subsystem can switch it to periodic mode, and then the DCR
register is loaded with an uninitialized value.

Reset the LAPIC eventtimer frequency and min/max periods when changing
between deadline and counted periodic modes.
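
To make concrete what "a divisor" means for the DCR, a sketch of encoding a divisor into the Divide Configuration Register layout from the Intel SDM (bits 0, 1 and 3 select the divisor; bit 2 is reserved). The function name is hypothetical and this is not the FreeBSD code; it only illustrates that every counted mode needs a valid encoding computed up front.

```c
#include <assert.h>

/* Returns the DCR encoding for 'divisor', or -1 if unsupported. */
static int
lapic_dcr_encode(unsigned int divisor)
{
	/* Index k holds divide-by-2^(k+1); k is the 3-bit SDM code. */
	static const unsigned int divs[] = { 2, 4, 8, 16, 32, 64, 128 };

	if (divisor == 1)
		return (0xb);		/* code 111: divide by 1 */
	for (unsigned int k = 0; k < sizeof(divs) / sizeof(divs[0]); k++) {
		if (divs[k] == divisor)
			/* Spread the 3-bit code over register bits 3, 1, 0. */
			return ((int)(((k & 0x4) << 1) | (k & 0x3)));
	}
	return (-1);
}
```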

Reported and tested by:	Vladimir Zakharov <zakharov.vv@gmail.com>
Sponsored by:	The FreeBSD Foundation
2016-04-15 14:36:38 +00:00
Roger Pau Monné
9b44287ce5 busdma/bounce: revert r292255
Revert r292255 because it can create bounced regions without contiguous
page offsets, which is needed for USB devices.

Another solution would be to force bouncing the full buffer always (even
when only one page requires bouncing), but this seems overly complicated and
unnecessary, and it will probably involve using more bounce pages than the
current code.

Reported by: phk
2016-04-15 09:21:50 +00:00
Pedro F. Giffuni
a3269b0863 x86: for pointers replace 0 with NULL.
These are mostly cosmetic; no functional change.

Found with devel/coccinelle.
2016-04-14 17:04:06 +00:00
Warner Losh
bd3bce41db Deprecate using hints.acpi.0.rsdp to communicate the RSDP to the
system. This uses the hints mechanism. This mostly works today
because when there's no static hints (the default), this value can be
fetched from the hint. When there is a static hints file, the hint
passed from the boot loader to the kernel is ignored, but for the BIOS
case we're able to find it anyway. However, with UEFI, the fallback
doesn't work, so we get a panic instead.

Switch to acpi.rsdp and use TUNABLE_ULONG_FETCH instead. Continue to
generate the old values to allow for transitions. In addition, fall
back to the old method if the new method isn't present.

Add comments about all this.
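
A sketch of the lookup order described above: try the new variable first, then fall back to the deprecated hint. The kernel uses TUNABLE_ULONG_FETCH; this userland sketch substitutes strtoul() and a hypothetical lookup callback, and the demonstration lookups and addresses are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical environment lookup: value string or NULL if unset. */
typedef const char *(*env_lookup_fn)(const char *name);

/* Prefer "acpi.rsdp", fall back to the deprecated "hints.acpi.0.rsdp". */
static bool
fetch_rsdp(env_lookup_fn lookup, unsigned long *rsdp)
{
	static const char *const names[] = { "acpi.rsdp", "hints.acpi.0.rsdp" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		const char *val = lookup(names[i]);
		char *end;

		if (val == NULL)
			continue;
		*rsdp = strtoul(val, &end, 0);
		if (end != val && *end == '\0')
			return (true);
	}
	return (false);
}

/* Demonstration lookups with made-up addresses. */
static const char *
legacy_only_env(const char *name)
{
	return (strcmp(name, "hints.acpi.0.rsdp") == 0 ? "0xf0490" : NULL);
}

static const char *
both_env(const char *name)
{
	if (strcmp(name, "acpi.rsdp") == 0)
		return ("0x7ffe0014");
	if (strcmp(name, "hints.acpi.0.rsdp") == 0)
		return ("0xf0490");
	return (NULL);
}

static const char *
empty_env(const char *name)
{
	(void)name;
	return (NULL);
}
```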

Differential Revision: https://reviews.freebsd.org/D5866
2016-04-14 04:59:51 +00:00
Andriy Gapon
0d63fc3ed8 re-enable AMD Topology extension on certain models if disabled by BIOS
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors.  We re-enable the extension, so that we can properly discover
core and cache topology.  Linux seems to do the same.

Reported by:	Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by:	jhb, kib
Tested by:	Johannes Dieterich <dieterich.joh@gmail.com>
		(earlier version)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D5883
2016-04-12 13:30:39 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
John Baldwin
62d70a8174 Add more fine-grained kernel options for NUMA support.
VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in
the virtual memory system.  DEVICE_NUMA is used to enable affinity
reporting for devices such as bus_get_domain().

MAXMEMDOM must still be set to a value greater than 1 for any NUMA support
to be effective.  Note that 'cpuset -gd' always works if MAXMEMDOM is
enabled and the system supports NUMA.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D5782
2016-04-09 13:58:04 +00:00
Sepherosa Ziehau
19605ff758 xen: Set ipi_{alloc,free} even for UP
This keeps XEN apic_ops aligned w/ x86's.

Suggested by:	kib, jhb
Reviewed by:	jhb, royger
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5871
2016-04-07 07:00:00 +00:00
Sepherosa Ziehau
8b0986c27f x86: Allow interrupt vector allocation/free even on UP
It is needed by the hypervisor FreeBSD guest to allocate/free private
interrupt vectors.

Reviewed by:	kib, jhb, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5849
2016-04-07 06:36:03 +00:00
Andriy Gapon
c77702de74 x86 topo: add some comments, descriptions and references to documentation
Plus a minor cosmetic change.

MFC after:	1 month
2016-04-05 10:36:40 +00:00
Andriy Gapon
4725e6bff3 new x86 smp topology detection code
Previously, the code determined a topology of processing units
(hardware threads, cores, packages) and then deduced a cache topology
using certain assumptions.  The new code builds a topology that
includes both processing units and caches using the information
provided by the hardware.

At the moment, the discovered full topology is used only to create
a scheduling topology for SCHED_ULE.
There is no KPI for other kernel uses.

Summary:
- based on APIC ID derivation rules for Intel and AMD CPUs
- can handle non-uniform topologies
- requires homogeneous APIC ID assignment (same bit widths for ID
  components)
- topology for dual-node AMD CPUs may not be optimal
- topology for latest AMD CPU models may not be optimal as the code is
  several years old
- supports only thread/package/core/cache nodes
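
The "homogeneous APIC ID assignment" requirement can be illustrated by splitting an APIC ID into fields with fixed bit widths. This is a sketch under that assumption: the widths are plain parameters here (on real hardware they are derived from CPUID), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct apic_id_fields {
	uint32_t thread;
	uint32_t core;
	uint32_t package;
};

/*
 * Low 'smt_bits' bits select the hardware thread, the next 'core_bits'
 * bits the core, and the remaining high bits the package.
 */
static struct apic_id_fields
apic_id_decode(uint32_t apic_id, unsigned int smt_bits, unsigned int core_bits)
{
	struct apic_id_fields f;

	assert(smt_bits + core_bits < 32);
	f.thread = apic_id & ((1u << smt_bits) - 1);
	f.core = (apic_id >> smt_bits) & ((1u << core_bits) - 1);
	f.package = apic_id >> (smt_bits + core_bits);
	return (f);
}
```

For example, with 1 SMT bit and 3 core bits, APIC ID 23 (binary 1 011 1) decodes to package 1, core 3, thread 1. If different packages used different widths, no single decoding like this would be valid, hence the homogeneity requirement.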

Todo:
  - AMD dual-node processors
  - latest AMD processors
  - NUMA nodes
  - checking for homogeneity of the APIC ID assignment across packages
  - more flexible cache placement within topology
  - expose topology to userland, e.g., via sysctl nodes

Long term todo:
  - KPI for CPU sharing and affinity with respect to various resources
    (e.g., two logical processors may share the same FPU, etc)

Reviewed by:	mav
Tested by:	mav
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D2728
2016-04-04 16:09:29 +00:00
John Baldwin
2b1e924b69 Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386. 2016-04-03 23:03:54 +00:00
Konstantin Belousov
5c8e0b3bcb Style(9), use tabs for the #define LOOPS line.
Print unsigned values with %u.
Make code slightly more compact by inlining loop limit.

Noted by:	bde
Sponsored by:	The FreeBSD Foundation
2016-04-01 08:47:23 +00:00
Konstantin Belousov
0df87548b9 Type of the interrupt handlers on x86 cannot be expressed in C.
Simplify and unify placeholder type definitions.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5771
2016-03-29 19:56:48 +00:00
Konstantin Belousov
d317106ce2 Fix several bugs in r297374:
- fix UP build [1]
- do not obliterate initial reading of rdtsc by the loop counter [2]
- restore the meaning of the argument -1 to native_lapic_ipi_wait()
  as wait until LAPIC acknowledge without timeout
- correct formula for calculating loop iteration count for 1us, it was
  inverted, and ensure that even on unlikely slow CPUs at least one
  check for ack is performed.
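
The corrected formula and the "at least one check" clamp can be sketched as plain arithmetic. The calibration inputs and function name are hypothetical; the point is that the count is loops per microsecond (the inverted ratio would be microseconds per loop) and never drops to zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 'loops' iterations took 'elapsed_ticks' TSC ticks on a TSC running at
 * 'tsc_hz'.  Iterations per microsecond:
 *     loops / elapsed_us = loops * tsc_hz / (elapsed_ticks * 1000000)
 * Clamped to 1 so even a very slow CPU polls for the ack once.
 */
static uint64_t
ipi_wait_loops_per_us(uint64_t loops, uint64_t elapsed_ticks, uint64_t tsc_hz)
{
	uint64_t n;

	assert(elapsed_ticks > 0);
	n = loops * tsc_hz / (elapsed_ticks * 1000000);
	return (n > 0 ? n : 1);
}
```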

Reported by:	Michael Butler <imb@protected-networks.net> [1], rpokala[2],
	jhb[3]
Tested by:	Michael Butler
Pointy hat to:	kib
Sponsored by:	The FreeBSD Foundation
2016-03-29 19:54:13 +00:00
Konstantin Belousov
998e1ef11f Calibrate the frequency of the native_lapic_ipi_wait() loop,
and avoid a delay while waiting for IPI delivery acknowledgement in
xAPIC mode.  This makes the loop exit immediately after the delivery
bit in APIC_ICR register is set, instead of waiting for some
microseconds.

We only need to ensure that some amount of time is allowed for the
LAPIC to react to the command, and that the wait time is finite and
reasonable.  For that reason, it does not matter if CPU frequency
scaling or throttling makes the loop, calibrated at full CPU speed at
boot time, execute somewhat slower.

Discussed with:	bde, jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-03-29 08:44:56 +00:00
Konstantin Belousov
d58c003a8a Use ANSI function definition.
Sponsored by:	The FreeBSD Foundation
2016-03-29 08:31:34 +00:00
Konstantin Belousov
841d5e0151 Do not load LAPIC_DCR_TIMER with an undefined value. If we are in the
deadline mode the divide configuration is not used and
lapic_timer_divisor is not set.

Reported by:	dhw, mav
Tested by:	mav
Sponsored by:	The FreeBSD Foundation
2016-03-28 15:05:00 +00:00
Konstantin Belousov
ecabd74728 Use TSC deadline mode for LAPIC timer, when available. The mode fires
LAPIC timer interrupt when TSC reaches the value written to the
IA32_TSC_DEADLINE MSR.  To arm or reset the timer in deadline mode, a
single non-serializing MSR write is enough.  This is an advance from
the one-shot mode of LAPIC, where timer operated with the FSB
frequency and required two (serialized in case of xAPIC) writes to the
APIC registers.

The LVT_TIMER register value is cached to avoid unneeded writes in the
deadline mode.  Unused arguments to specify period (which is passed in
struct lapic as la_timer_period) and interrupt enable (which is always
enabled) are removed from lapic_timer_{oneshot,periodic,deadline}
functions.  Instead, special lapic_timer_oneshot_nointr() function for
interrupt-less one-shot calibration is added.
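
Caching the LVT_TIMER value amounts to remembering the last value written and skipping writes that would not change it. In this sketch hw_write() is a hypothetical stand-in for the real register access and only counts calls; the cache struct is likewise invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int hw_writes;

/* Hypothetical stand-in for the real LAPIC register write. */
static void
hw_write(uint32_t val)
{
	(void)val;
	hw_writes++;
}

struct lvt_cache {
	uint32_t value;
	int valid;
};

static void
lvt_timer_write(struct lvt_cache *c, uint32_t val)
{
	if (c->valid && c->value == val)
		return;			/* unchanged: skip the costly write */
	hw_write(val);
	c->value = val;
	c->valid = 1;
}
```

Rearming in deadline mode then typically touches only the deadline MSR, since the LVT value stays the same between arms.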

Reviewed by:	mav (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5738
2016-03-28 09:52:44 +00:00
Konstantin Belousov
7c4e76935e Add defines for the LAPIC TSC deadline timer mode. The LVT timer mode
field is two-bit, extend the mask.
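
Why the mask had to grow: in the Intel SDM the LVT Timer mode occupies bits 17 and 18 (00 one-shot, 01 periodic, 10 TSC deadline), so a mask covering only bit 17 cannot clear a deadline setting. The macro names below are illustrative, not necessarily those used in the FreeBSD headers.

```c
#include <assert.h>
#include <stdint.h>

#define LVTT_TM_MASK		(UINT32_C(3) << 17)
#define LVTT_TM_ONE_SHOT	(UINT32_C(0) << 17)
#define LVTT_TM_PERIODIC	(UINT32_C(1) << 17)
#define LVTT_TM_TSC_DEADLINE	(UINT32_C(2) << 17)

/* Replace the timer mode in an LVT Timer value, keeping other fields. */
static uint32_t
lvtt_set_mode(uint32_t lvtt, uint32_t mode)
{
	return ((lvtt & ~LVTT_TM_MASK) | mode);
}
```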

Also add comments about all MSRs writes to which are not serializing.

Sponsored by:	The FreeBSD Foundation
2016-03-28 09:43:40 +00:00
John Baldwin
7a2c1d8c60 Enable interrupts on the BSP once all PICs are initialized.
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code).  This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5710
2016-03-24 00:24:07 +00:00
Justin Hibbits
f8fd3fb518 Fix the resource_list_print_type() calls to use uintmax_t.
Missed a bunch from r297000.
2016-03-22 22:25:08 +00:00
John Baldwin
4a5202f9c4 Check IPI status more frequently when waiting.
An IPI cannot be sent via the local APIC if a previous IPI is still
being delivered.  Attempts to send an IPI will wait for a pending IPI
to clear.  Prior to r278325 these checks used a spin loop with a
hardcoded maximum count which broke AP startup on some systems.
However, r278325 also enforced a minimum latency of 5 microseconds if an
IPI was still pending which resulted in a measurable performance hit.
This change reduces that minimum latency to 1 microsecond.

Tested by:	stas
MFC after:	3 days
2016-03-18 19:48:49 +00:00
Justin Hibbits
da1b038af9 Use uintmax_t (typedef'd to rman_res_t type) for rman ranges.
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit.  This extends rman's resources to uintmax_t.  With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).

Why uintmax_t and not something machine dependent, or uint64_t?  Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures.  64-bit architectures should have plenty of RAM to absorb
the increase in resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead.  That being said, uintmax_t was chosen for source
clarity.  If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros.  Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros.  Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.
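
The printf() argument above can be seen side by side: with a uintmax_t-based type, "%jx" needs no casts, while a uint64_t-based type would need the PRIx64 macros. The typedef and both helpers are local stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uintmax_t rman_res_t;	/* local stand-in for the kernel type */

static int
format_range(char *buf, size_t len, rman_res_t start, rman_res_t end)
{
	return (snprintf(buf, len, "0x%jx-0x%jx", start, end));
}

/* The alternative the commit avoids: uint64_t plus PRI*64 macros. */
static int
format_range_u64(char *buf, size_t len, uint64_t start, uint64_t end)
{
	return (snprintf(buf, len, "0x%" PRIx64 "-0x%" PRIx64, start, end));
}
```

Both produce the same text for a 36-bit range such as 0xf00000000-0xf0000ffff; the first is simply easier to read.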

Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)

Tested PAE and devinfo on virtualbox (live CD)

Special thanks to bz for his testing on ARM.

Reviewed By: bz, jhb (previous)
Relnotes:	Yes
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544
2016-03-18 01:28:41 +00:00
Justin Hibbits
534ccd7bbf Replace all resource occurrences of '0UL/~0UL' with '0/~0'.
Summary:
The idea behind this is '~0ul' is well-defined, and casting to uintmax_t, on a
32-bit platform, will leave the upper 32 bits as 0.  The maximum range of a
resource is 0xFFF.... (all bits of the full type set).  By dropping the 'ul'
suffix, C type promotion rules apply, and the sign extension of ~0 on 32 bit
platforms gets it to a type-independent 'unsigned max'.
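
The conversion rule can be checked directly. ~0 is an int with value -1, and converting -1 to any unsigned type yields that type's maximum. An already-unsigned 32-bit constant is zero-extended instead. The sketch uses ~0U to reproduce the 32-bit ~0UL case portably.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* -1 converted to uintmax_t: all bits set, whatever the width. */
static uintmax_t
widened_int_ones(void)
{
	return ((uintmax_t)~0);
}

/* 32-bit unsigned all-ones, zero-extended: upper bits stay 0. */
static uintmax_t
widened_uint_ones(void)
{
	return ((uintmax_t)~0U);
}
```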

Reviewed By: cem
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5255
2016-03-03 05:07:35 +00:00
John Baldwin
cbc4d2db75 Remove taskqueue_enqueue_fast().
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167.  It has been a compat shim ever
since.  It's time for the compat shim to go.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	sephe
Differential Revision:	https://reviews.freebsd.org/D5131
2016-03-01 17:47:32 +00:00
Justin Hibbits
e665eafb25 Correct the memory rman ranges to be to BUS_SPACE_MAXADDR
Summary:
As part of the migration of rman_res_t to be typed to uintmax_t, memory ranges
must be clamped appropriately for the bus, to prevent completely bogus addresses
from being used.

This is extracted from D4544.

Reviewed By: cem
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5134
2016-03-01 02:59:06 +00:00
Jung-uk Kim
0eda5b3f23 Silence PVS-Studio warning (V595). It can never be NULL here. 2016-02-23 23:57:24 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Konstantin Belousov
2fe1339ea2 Some BIOSes' ACPI bytecode needs to take the (sleepable) ACPI mutex for
acpi_GetInteger() execution.  Intel DMAR interrupt remapping code
needs to know UID of the HPET to properly route the FSB interrupts
from the HPET, even when interrupt remapping is disabled, and the code
is executed under some non-sleepable mutexes.

Cache HPET UIDs in the device softc at the attach time and provide
lock-less method to get UID, use the method from the dmar hpet
handling code instead of calling GetInteger().

Reported and tested by:	Larry Rosenman <ler@lerctr.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-20 13:37:04 +00:00
Justin Hibbits
7915adb560 Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it.
This simplifies checking for default resource range for bus_alloc_resource(),
and improves readability.

This is part of, and related to, the migration of rman_res_t from u_long to
uintmax_t.

Discussed with:	jhb
Suggested by:	marcel
2016-02-20 01:32:58 +00:00
Konstantin Belousov
90edf67ecf POSIX states that #include <signal.h> shall make both mcontext_t and
ucontext_t available.  Our code even has XXX comment about this.

Add a bit of compliance by moving struct __ucontext definition into
sys/_ucontext.h and including it into signal.h and sys/ucontext.h.

Several machine/ucontext.h headers were changed to use namespace-safe
types (like uint64_t->__uint64_t) to not depend on sys/types.h.
struct __stack_t from sys/signal.h is made always visible in private
namespace to satisfy sys/_ucontext.h requirements.

Apparently mips _types.h pollutes global namespace with f_register_t
type definition.  This commit does not try to fix the issue.

PR:	207079
Reported and tested by:	Ting-Wei Lan <lantw44@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-12 07:38:19 +00:00
Justin Hibbits
2dd1bdf183 Convert rman to use rman_res_t instead of u_long
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources.  For
now, this is still compatible with u_long.

This is step one in migrating rman to use uintmax_t for resources instead of
u_long.

Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.

This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.

Reviewed By: jhb
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
2016-01-27 02:23:54 +00:00
Sepherosa Ziehau
69a53a7a3a hyperv: use x86 generic code to do the hypervisor detection
This is first step to move the generic part of HV code into kernel instead
of module, so that it is possible to use hypercall to implement some other
paravirtualization code in the kernel.

Submitted by:		Howard Su <howard0su@gmail.com>
Reviewed by:		royger, delphij, adrian
Approved by:		adrian (mentor)
Sponsored by:		Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D3072
2016-01-14 02:50:13 +00:00
Ed Maste
0e42ee5dd8 Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
Ian Lepore
69dcb7e771 Make the 'env' directive described in config(5) work on all architectures,
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.

Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly.  Most startup code wasn't
doing so.  Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.

Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.

The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values.  Now if the size is zero, the provided buffer
full of existing env data is installed.  A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.
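
The decision made in init_static_kenv() can be modeled as a small function. This is a hypothetical sketch of the rules stated above, not the kernel routine: the compiled-in env wins when envmode selects it, a NULL buffer means no loader data, a non-zero size installs an empty buffer for kern_setenv(), and a zero size installs preformatted data as read-only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum kenv_choice {
	KENV_STATIC,		/* compiled-in static env */
	KENV_NONE,		/* nothing from the boot loader */
	KENV_MD_EMPTY,		/* empty buffer, filled via kern_setenv() */
	KENV_MD_PREFILLED	/* preformatted data, treated as read-only */
};

static enum kenv_choice
choose_kenv(bool envmode_static, const char *buf, size_t len)
{
	if (envmode_static)
		return (KENV_STATIC);
	if (buf == NULL)
		return (KENV_NONE);
	return (len != 0 ? KENV_MD_EMPTY : KENV_MD_PREFILLED);
}
```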

Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp.  A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data.  Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
2016-01-02 02:53:48 +00:00
Konstantin Belousov
6b247f858e Add standard extended feature bit 6 from the Intel SDM rev. 57, which
indicates that data-pointer in the saved x87 FPU state is only updated
on FPU exceptions.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-29 22:14:21 +00:00
John Baldwin
9e8d8b4b0c Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D4670
2015-12-23 21:41:42 +00:00
Enji Cooper
b59f7a7ad8 Remove redundant declarations in sys/x86/xen which are now handled in other sys/x86
headers

Differential Revision: https://reviews.freebsd.org/D4685
X-MFC with: r291949
Sponsored by: EMC / Isilon Storage Division
2015-12-23 17:43:55 +00:00
Conrad Meyer
986fd63b46 x86: Add CPUID_STDEXT_* macros for CPU feature bits
A follow-up to r292478 and r292488.

Sponsored by:	EMC / Isilon Storage Division
2015-12-21 04:42:58 +00:00
Conrad Meyer
ce43b54ab2 x86: Detect feature flags "AVX512DQ", "AVX512IFMA", "AVX512BW", "AVX512VBMI"
Documented in Intel Architecture Set Extensions Programming Reference
(319433-023).

Sponsored by:	EMC / Isilon Storage Division
2015-12-20 03:34:30 +00:00
Conrad Meyer
f750a7edaa x86: Detect feature flags "CLWB" and "PCOMMIT"
"The availability of CLWB instruction is indicated by the presence of
the CPUID feature flag CLWB (bit 24 of the EBX register)."

CLWB is similar to CLFLUSHOPT, except that it is not required to discard
cacheline contents.

"On processors that supports PCOMMIT, PCOMMIT is enumerated through
CPUID (CPUID.7.0.EBX[22]) only when the feature is enabled by BIOS."

PCOMMIT is used to cause store-to-memory operations to become persistent
(protected from power failure).
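
Testing the two bits quoted above against the EBX value of CPUID leaf 7, subleaf 0, is a simple mask check. The macro and function names are illustrative; FreeBSD's own CPUID_STDEXT_* spellings may differ, and obtaining EBX from the CPU is left out of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STDEXT_PCOMMIT	(UINT32_C(1) << 22)	/* CPUID.7.0.EBX[22] */
#define STDEXT_CLWB	(UINT32_C(1) << 24)	/* CPUID.7.0.EBX[24] */

static bool
has_feature(uint32_t leaf7_ebx, uint32_t mask)
{
	return ((leaf7_ebx & mask) != 0);
}
```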

Sponsored by:	EMC / Isilon Storage Division
2015-12-19 20:47:15 +00:00
Roger Pau Monné
a7285da666 x86/bounce: try to always completely fill bounce pages
Current code doesn't try to make use of the full page when bouncing because
the size is only expanded to be a multiple of the alignment. Instead try to
always create segments of PAGE_SIZE when using bounce pages.

This allows us to remove the specific casing done for
BUS_DMA_KEEP_PG_OFFSET, since the requirement is to make sure the offsets
into contiguous segments are aligned, and now this is done by default.

Sponsored by:		Citrix Systems R&D
Reviewed by:		hps, kib
Differential revision:	https://reviews.freebsd.org/D4119
2015-12-15 10:07:03 +00:00
Konstantin Belousov
7c958a41fe Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include/x86_var.h and x86/include/x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
Konstantin Belousov
9a5d210cb4 It seems that at least some KVM versions advertise support for EOI
suppression but the version of the IOAPIC reported is 0x11 and neither
IOAPIC EOIR nor the Linux trick of temporarily reprogramming the pin
to edge-trigger mode to issue EOI works.

Disable EOI suppression if KVM is detected.  The mode can still be
forced with the tunable.

Reported and tested by:	Roman Mamontov <mr.xanto@gmail.com>
Sponsored by:	The FreeBSD Foundation
2015-12-05 08:52:37 +00:00
Konstantin Belousov
27691a24ab For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), since it has
not been specific to PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
Konstantin Belousov
906430e4f0 In the SandyBridge x2APIC workaround detection code, only fetch the
environment variable when SandyBridge CPU is detected.  Reduce code
duplication.

Sponsored by:	The FreeBSD Foundation
2015-12-03 10:59:10 +00:00
Konstantin Belousov
2a8a46b161 Correct the number of DTLB entries reported for the CPUID Leaf 2
descriptor 0x6c.

Confirmed by:	Intel
MFC after:	3 days
2015-11-24 19:55:11 +00:00
Svatopluk Kraus
eae22c4430 Revert r291142.
The re(4) interface relies on the not quite consistent logic for bounce
page allocation and can now hang.

Approved by:	kib (mentor)
2015-11-23 11:19:00 +00:00
Svatopluk Kraus
6fa7734d6f Fix BUS_DMA_MIN_ALLOC_COMP flag logic. When bus_dmamap_t map is being
created for bus_dma_tag_t tag, bounce pages should be allocated
only if needed.

Before the fix, they were allocated always if BUS_DMA_COULD_BOUNCE flag
was set but BUS_DMA_MIN_ALLOC_COMP not. As bounce pages are never freed,
it could cause memory exhaustion when a lot of such tags together with
their maps were created.

Note that, by the current design, one tag can have several maps.
However, BUS_DMA_MIN_ALLOC_COMP is a per-tag flag, and it is set after
bounce pages are allocated.  Thus, they are allocated only for the
first of the tag's maps that needs them.
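
A hypothetical model of the fixed logic: bounce pages are reserved only when the tag may bounce and the map actually needs them, and the per-tag flag ensures only the first such map reserves them. The flag values, struct and function are invented for illustration; the real BUS_DMA_* constants and busdma code differ.

```c
#include <assert.h>
#include <stdbool.h>

#define COULD_BOUNCE	0x1	/* illustrative values only */
#define MIN_ALLOC_COMP	0x2

struct tag {
	int flags;
	int bounce_pages;	/* pages reserved so far (never freed) */
};

static void
map_create(struct tag *t, bool map_needs_bounce, int pages)
{
	if ((t->flags & COULD_BOUNCE) == 0 || !map_needs_bounce)
		return;
	if ((t->flags & MIN_ALLOC_COMP) == 0) {
		t->bounce_pages += pages;
		t->flags |= MIN_ALLOC_COMP;	/* later maps reuse these */
	}
}
```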

Approved by:	kib (mentor)
2015-11-21 19:55:01 +00:00
Marius Strobl
ec2fbee752 Avoid a NULL pointer dereference in bounce_bus_dmamap_unload() when
the map has been created via bounce_bus_dmamem_alloc(). In that case
bus_dmamap_unload(9) typically isn't called during normal operation
but still should be during detach, cleanup from failed attach etc.

Submitted by:	yongari
MFC after:	3 days
2015-11-21 02:08:47 +00:00
Marius Strobl
8fd47ac11c Avoid a NULL pointer dereference in bounce_bus_dmamap_sync() when the
map has been created via bounce_bus_dmamem_alloc(). Even for coherent
DMA (which bus_dmamem_alloc(9) is typically used for), calling
bus_dmamap_sync(9) isn't optional.

PR:		188899 (non-original problem)
MFC after:	3 days
2015-11-20 02:23:35 +00:00
Roger Pau Monné
1522652230 xen: fix dropping bitmap IPIs during resume
Current Xen resume code clears all pending bitmap IPIs on resume, which is
not correct. Instead re-inject bitmap IPI vectors on resume to all CPUs in
order to acknowledge any pending bitmap IPIs.

Sponsored by:		Citrix Systems R&D
MFC after:		2 weeks
2015-11-18 18:11:19 +00:00
Roger Pau Monné
ea64b86f94 xen/intr: properly dispose event channels on resume
All event channels are torn down when performing a migration on Xen; make
sure all handlers are also removed and the event channel structure is
properly disposed of so it can be reused.

Sponsored by:		Citrix Systems R&D
MFC after:		2 weeks
2015-11-18 18:10:28 +00:00
Roger Pau Monné
531cfe55e2 x86/intr: allow mutex recursion in intr_remove_handler
This is needed so interrupt handlers can be removed while the PIC is
resuming; it was previously not possible due to intr_resume holding the
intr_table_lock and intr_remove_handler recursing on it.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib (previous version)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D4114
2015-11-18 18:09:49 +00:00
Roger Pau Monné
5c4133b1b5 x86/dma_bounce: rework _bus_dmamap_load_ma implementation
The implementation of bus_dmamap_load_ma_triv currently calls
_bus_dmamap_load_phys on each page that is part of the passed in buffer.
Since each page is treated as an individual buffer, the resulting behaviour
is different from the behaviour of _bus_dmamap_load_buffer. This breaks
certain drivers, like Xen blkfront.

If an unmapped buffer of size 4096 that starts at offset 13 into the first
page is passed to the current _bus_dmamap_load_ma implementation (so the ma
array contains two pages), the result is that two segments are created, one
with a size of 4083 and the other with size 13 (because two independent
calls to _bus_dmamap_load_phys are performed, one for each physical page).
If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer
the result is that only one segment is created, with a size of 4096.
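
The arithmetic in the example, as a sketch: per-page loading produces one segment per page touched, with the first segment limited to the rest of the first page. The helpers and the page-size constant are hypothetical, and the sketch assumes the pages are physically contiguous and that no maximum segment size applies, which is what lets the mapped path emit a single 4096-byte segment.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096

/* Segments produced when each page is loaded as a separate buffer. */
static size_t
segments_per_page(size_t offset, size_t len)
{
	return ((offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Length of the first per-page segment. */
static size_t
first_segment_len(size_t offset, size_t len)
{
	size_t room = SKETCH_PAGE_SIZE - offset;

	return (len < room ? len : room);
}
```

For offset 13 and length 4096 this gives 2 segments of 4083 and 13 bytes, matching the description.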

This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce
buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements
_bus_dmamap_load_ma so that its behaviour is the same as the mapped version
(_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer
code, other arches are left untouched.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib, jah (previous version)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D888
2015-11-09 12:19:58 +00:00
Tijl Coosemans
27f38a8d69 Since r289279 bufinit() uses mp_ncpus, but some architectures set this
variable during mp_start() which is too late.  Move this to mp_setmaxid()
where other architectures set it and move x86 assertions to MI code.

Reviewed by:	kib (x86 part)
2015-11-08 14:26:50 +00:00
Roger Pau Monné
f186ed526a xen/intr: fix the event channel enabled per-cpu mask
Fix two issues with the current event channel code, first ENABLED_SETSIZE is
not correctly defined and then using a BITSET to store the per-cpu masks is
not portable to other arches, since on arm32 the event channel arrays shared
with the hypervisor are of type uint64_t and not long. Partially restore the
previous code but switch the bit operations to use the recently introduced
xen_{set/clear/test}_bit versions.

Reviewed by:		Julien Grall <julien.grall@citrix.com>
Sponsored by:		Citrix Systems R&D
Differential Revision:	https://reviews.freebsd.org/D4080
2015-11-05 14:33:46 +00:00
Ian Lepore
53f93ed3ff Fix an alignment check that is wrong in half the busdma implementations.
This will enable the elimination of a workaround in the USB driver that
artifically allocates buffers twice as big as they need to be (which
actually saves memory for very small buffers on the buggy platforms).

When deciding how to allocate a dma buffer, armv4, armv6, mips, and
x86/iommu all correctly check for the tag alignment <= maxsize as enabling
simple uma/malloc based allocation.  Powerpc, sparc64, x86/bounce, and
arm64/bounce were all checking for alignment < maxsize; on those platforms
when alignment was equal to the max size it would fall back to page-based
allocators even for very small buffers.

This change makes all platforms use the <= check.  It should be noted that
on all platforms other than arm[v6] and mips, this check is relying on
undocumented behavior in malloc(9) that if you allocate a block of a given
size it will be aligned to the next larger power-of-2 boundary.  There is
nothing in the malloc(9) man page that makes that explicit promise (but the
busdma code has been relying on this behavior all along so I guess it works).
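
Only the alignment part of the allocation decision is modeled here, and the function names are hypothetical. If malloc(9) aligns a block of 'maxsize' bytes to the next power of 2 at or above 'maxsize', then any power-of-2 alignment up to and including 'maxsize' is satisfied, so the strict '<' check needlessly rejected alignment == maxsize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed check: alignment equal to maxsize is still safe for malloc. */
static bool
use_malloc_fixed(size_t alignment, size_t maxsize)
{
	return (alignment <= maxsize);
}

/* Old check on the affected platforms. */
static bool
use_malloc_old(size_t alignment, size_t maxsize)
{
	return (alignment < maxsize);
}
```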

Arm and mips code uses the allocator in kern/subr_busdma_buffalloc.c, which
does explicitly implement this promise about size and alignment.  Other
platforms probably should switch to the aligned allocator.
2015-11-02 23:37:19 +00:00
Roger Pau Monné
f4576dd975 x86/dma_bounce: revert r289834 and r289836
The new load_ma implementation can cause dereferences when used with
certain drivers, back it out until the reason is found:

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 03
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808a2d22
stack pointer           = 0x28:0xfffffe07cc737710
frame pointer           = 0x28:0xfffffe07cc737790
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (g_down)
trap number             = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80641647 at kdb_backtrace+0x67
#1 0xffffffff80606762 at vpanic+0x182
#2 0xffffffff806067e3 at panic+0x43
#3 0xffffffff8084eef1 at trap_fatal+0x351
#4 0xffffffff8084f0e4 at trap_pfault+0x1e4
#5 0xffffffff8084e82f at trap+0x4bf
#6 0xffffffff80830d57 at calltrap+0x8
#7 0xffffffff8063beab at _bus_dmamap_load_ccb+0x1fb
#8 0xffffffff8063bc51 at bus_dmamap_load_ccb+0x91
#9 0xffffffff8042dcad at ata_dmaload+0x11d
#10 0xffffffff8042df7e at ata_begin_transaction+0x7e
#11 0xffffffff8042c18e at ataaction+0x9ce
#12 0xffffffff802a220f at xpt_run_devq+0x5bf
#13 0xffffffff802a17ad at xpt_action_default+0x94d
#14 0xffffffff802c0024 at adastart+0x8b4
#15 0xffffffff802a2e93 at xpt_run_allocq+0x193
#16 0xffffffff802c0735 at adastrategy+0xf5
#17 0xffffffff80554206 at g_disk_start+0x426
Uptime: 2m29s
2015-10-26 14:50:35 +00:00
Conrad Meyer
ce7543042c xen: Add missing semi-colon for BITSET_DEFINE()
Broken when it was removed from the macro in r289867.

Pointy-hat:	markj
Sponsored by:	EMC / Isilon Storage Division
2015-10-24 19:04:55 +00:00
Roger Pau Monné
59cd0f10b3 x86/dma_bounce: rework _bus_dmamap_load_ma implementation
The implementation of bus_dmamap_load_ma_triv currently calls
_bus_dmamap_load_phys on each page that is part of the passed-in buffer.
Since each page is treated as an individual buffer, the resulting behaviour
is different from the behaviour of _bus_dmamap_load_buffer. This breaks
certain drivers, like Xen blkfront.

If an unmapped buffer of size 4096 that starts at offset 13 into the first
page is passed to the current _bus_dmamap_load_ma implementation (so the ma
array contains two pages), the result is that two segments are created, one
with a size of 4083 and the other with a size of 13 (because two independent
calls to _bus_dmamap_load_phys are performed, one for each physical page).
If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer
the result is that only one segment is created, with a size of 4096.
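The difference between the two results can be reproduced with a small model. This sketch is hypothetical: the function and struct names are not busdma's. It assumes the two pages are physically contiguous and ignores tag limits such as maxsegsz and boundary, which the real loaders also enforce. `split_per_page()` treats each page as its own buffer. `split_coalesced()` merges a piece into the previous segment when it starts exactly where that segment ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

struct seg_sketch {
    uint64_t addr; /* physical start address */
    size_t len;    /* length in bytes */
};

/* One segment per page piece, like independent per-page loads. */
int split_per_page(const uint64_t *pages, size_t offset, size_t len,
    struct seg_sketch *segs, int maxsegs)
{
    int n = 0;

    assert(offset < SKETCH_PAGE_SIZE);
    for (size_t i = 0; len > 0 && n < maxsegs; i++) {
        size_t chunk = SKETCH_PAGE_SIZE - offset;
        if (chunk > len)
            chunk = len;
        segs[n].addr = pages[i] + offset;
        segs[n].len = chunk;
        n++;
        len -= chunk;
        offset = 0; /* only the first page starts mid-page */
    }
    return n;
}

/* Same walk, but physically adjacent pieces are merged into one segment. */
int split_coalesced(const uint64_t *pages, size_t offset, size_t len,
    struct seg_sketch *segs, int maxsegs)
{
    int n = 0;

    assert(offset < SKETCH_PAGE_SIZE);
    for (size_t i = 0; len > 0; i++) {
        size_t chunk = SKETCH_PAGE_SIZE - offset;
        if (chunk > len)
            chunk = len;
        uint64_t addr = pages[i] + offset;
        if (n > 0 && segs[n - 1].addr + segs[n - 1].len == addr) {
            segs[n - 1].len += chunk;
        } else {
            if (n == maxsegs)
                break;
            segs[n].addr = addr;
            segs[n].len = chunk;
            n++;
        }
        len -= chunk;
        offset = 0;
    }
    return n;
}
```

With pages at 0x10000 and 0x11000, an offset of 13, and a length of 4096, the per-page split gives segments of 4083 and 13 bytes. The coalescing walk gives a single 4096-byte segment.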

This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce
buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements
_bus_dmamap_load_ma so that its behaviour is the same as the mapped version
(_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer
code; other arches are left untouched.

Reviewed by:		kib, jah
Differential Revision:	https://reviews.freebsd.org/D888
Sponsored by:		Citrix Systems R&D
2015-10-23 15:39:59 +00:00
Jason A. Harmening
a50730587b Remove unclear comment about address truncation in busdma. Add (hopefully much clearer) comment at declaration of PHYS_TO_VM_PAGE().
Noted by:	avg
2015-10-23 12:03:25 +00:00
Konstantin Belousov
c0db387d25 Decode new values for CPUID leaf 2 cache and TLB descriptors, from the
Intel SDM revision 56.
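The descriptors decoded here are the bytes that CPUID leaf 2 returns in EAX, EBX, ECX, and EDX. The following is a minimal sketch of how those bytes can be extracted, following the SDM's description of the leaf: the low byte of EAX is not a descriptor, a register with bit 31 set carries no valid descriptors, and 0x00 is the null descriptor. The function name is hypothetical. Mapping each byte to a cache or TLB description, which is the part this commit extends, is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Collect the non-null descriptor bytes from CPUID leaf 2 output.
 * regs[] holds EAX, EBX, ECX, EDX in that order.  Returns the count.
 */
int cpuid2_descriptors(const uint32_t regs[4], uint8_t out[16])
{
    int n = 0;

    for (int r = 0; r < 4; r++) {
        /* Bit 31 set: this register holds no valid descriptors. */
        if (regs[r] & 0x80000000u)
            continue;
        for (int b = 0; b < 4; b++) {
            /* The low byte of EAX is an iteration count, not a descriptor. */
            if (r == 0 && b == 0)
                continue;
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0x00) /* 0x00 is the null descriptor */
                out[n++] = d;
        }
    }
    assert(n <= 15);
    return n;
}
```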

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-23 11:43:56 +00:00
Roger Pau Monné
2f9ec994bc xen: Code cleanup and small bug fixes
xen/hypervisor.h:
 - Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain,
   is_running_on_xen
 - Remove unused define CONFIG_X86_PAE
 - Remove unused variable xen_start_info: note that it's used in pcifront,
   which is not built at all
 - Remove forward declaration of HYPERVISOR_crash

xen/xen-os.h:
 - Remove unused define CONFIG_X86_PAE
 - Drop unused helpers: test_and_clear_bit, clear_bit,
   force_evtchn_callback
 - Implement a generic version (based on ofed/include/linux/bitops.h) of
   set_bit and test_bit and prefix them with xen_ to avoid any use by code
   other than Xen. Note that it would be worth investigating a generic
   implementation in FreeBSD.
 - Replace barrier() by __compiler_membar()
 - Replace cpu_relax() by cpu_spinwait(): they are exactly the same
   (rep; nop, i.e. pause)
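The xen_set_bit()/xen_test_bit() helpers are described as a generic, Linux-style implementation, which locates a bit by a word index plus a bit mask. The following is a minimal sketch of that technique. The `sketch_` names and the use of C11 `<stdatomic.h>` are assumptions. Kernel code would use FreeBSD's atomic(9) primitives instead.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BIT_WORD(nr)  ((nr) / SKETCH_BITS_PER_LONG)
#define SKETCH_BIT_MASK(nr)  (1UL << ((nr) % SKETCH_BITS_PER_LONG))

/* Atomically set bit nr in a bitmap made of unsigned longs. */
static inline void sketch_xen_set_bit(unsigned long nr,
    _Atomic unsigned long *bitmap)
{
    atomic_fetch_or(&bitmap[SKETCH_BIT_WORD(nr)], SKETCH_BIT_MASK(nr));
}

/* Return whether bit nr is set. */
static inline bool sketch_xen_test_bit(unsigned long nr,
    _Atomic unsigned long *bitmap)
{
    return (atomic_load(&bitmap[SKETCH_BIT_WORD(nr)]) &
        SKETCH_BIT_MASK(nr)) != 0;
}
```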

xen/xen_intr.h:
 - Move the prototype of xen_intr_handle_upcall into it: it is used by all
   platforms

x86/xen/xen_intr.c:
 - Use BITSET* for enabledbits to avoid custom helpers
 - test_bit/set_bit have been renamed to xen_test_bit/xen_set_bit
 - Don't export the variable xen_intr_pcpu

dev/xen/blkback/blkback.c:
 - Fix the format string used when XBB_DEBUG is enabled: host_addr is of
   type uint64_t
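The following sketch shows one portable way to print a uint64_t such as host_addr. The helper name is hypothetical, and this is not necessarily the form the commit used. A conversion like "%lx" is wrong on platforms where long is 32 bits. PRIx64 expands to the correct conversion for uint64_t.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit address the same way on 32- and 64-bit platforms. */
int format_host_addr(char *buf, size_t len, uint64_t host_addr)
{
    return snprintf(buf, len, "host_addr=%#" PRIx64, host_addr);
}
```

FreeBSD kernel code often gets the same effect with "%jx" and a (uintmax_t) cast.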

dev/xen/balloon/balloon.c:
 - Remove a set-but-unused variable
 - Use the correct type for frame_list: xen_pfn_t represents the frame
   number on any architecture

dev/xen/control/control.c:
 - Return BUS_PROBE_WILDCARD in xs_probe: returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
   first driver, every new device whose driver is unset will use
   xenstore.

dev/xen/grant-table/grant_table.c:
 - Remove unused cmpxchg
 - Drop unused include opt_pmap.h: it doesn't exist on ARM64 and doesn't
   contain anything the x86 code requires

dev/xen/netfront/netfront.c:
 - Use the correct type for rx_pfn_array: xen_pfn_t represents the frame
   number on any architecture

dev/xen/netback/netback.c:
 - Use the correct type for gmfn: xen_pfn_t represents the frame number on
   any architecture

dev/xen/xenstore/xenstore.c:
 - Return BUS_PROBE_WILDCARD in xctrl_probe: returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
   first driver, every new device whose driver is unset will use xenstore.
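The following is a simplified, hypothetical model of why an unconditional `return 0` from a probe routine is risky. It assumes newbus-like rules: a non-positive probe result is a match, the highest result wins, and a device with no name set ("wildcard") is offered to every driver. The constants, structs, and functions below are made up for illustration. They do not reproduce the real BUS_PROBE_* values or the exact constant the commit uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical priorities: 0 is the strongest claim, more negative is weaker. */
#define SKETCH_PROBE_SPECIFIC     0
#define SKETCH_PROBE_DEFAULT    (-20)
#define SKETCH_PROBE_NAMED_ONLY (-1000) /* only devices named for this driver */

struct driver_sketch {
    const char *name;
    int (*probe)(void);
};

/* Hypothetical probe routines: before and after the fix, plus a bystander. */
int store_probe_old(void)   { return SKETCH_PROBE_SPECIFIC; }
int store_probe_fixed(void) { return SKETCH_PROBE_NAMED_ONLY; }
int other_probe(void)       { return SKETCH_PROBE_DEFAULT; }

/*
 * Pick a driver for a device; devname == NULL models a device whose
 * driver is unset.  Returns the winning index, or -1 if none matches.
 */
int choose_driver(const char *devname, const struct driver_sketch *drv,
    int ndrv)
{
    int best = -1, best_pri = 0;

    for (int i = 0; i < ndrv; i++) {
        if (devname != NULL && strcmp(devname, drv[i].name) != 0)
            continue; /* a named device is probed only by its own driver */
        int pri = drv[i].probe();
        if (pri > 0)
            continue; /* positive result: no match */
        if (devname == NULL && pri == SKETCH_PROBE_NAMED_ONLY)
            continue; /* this driver refuses unnamed devices */
        if (best == -1 || pri > best_pri) {
            best = i;
            best_pri = pri;
        }
    }
    return best;
}
```

With `store_probe_old` listed first, an unnamed device goes to the store driver. With `store_probe_fixed`, the same device goes to the other driver, while a device explicitly named "xenstore" still attaches to the store driver.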

Note that with these changes, x86/include/xen/xen-os.h no longer contains
arch-specific code. However, a new series will add some helpers that differ
between x86 and ARM64, so I've kept the headers for now.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3921
Sponsored by:		Citrix Systems R&D
2015-10-21 10:44:07 +00:00
Roger Pau Monné
6a306bff7f x86/xen: Consolidate xen-os.h in a single place
amd64 and i386 platform code contain very similar xen/xen-os.h headers.

The only differences are:
 - Functions/variables/types which were unused in i386/xen/xen-os.h:
    * xen_xchg
    * __xchg_dummy
    * __xg
    * __xchg
    * atomic_t
    * atomic_inc
    * rdtscll

Once the functions/variables/types unused in xen-os.h are dropped, there
are no more differences between amd64 and i386.

The new header is placed in x86/include/xen, and each platform will have
dummy headers that include x86/xen/*.h. This makes it possible to include
machine/xen/*.h in the PV drivers.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3880
Sponsored by:		Citrix Systems R&D
2015-10-21 10:04:35 +00:00
Jason A. Harmening
012cf46f07 Don't page-align the physical address when calling PHYS_TO_VM_PAGE().
M    busdma_bounce.c
2015-10-17 14:58:55 +00:00
Jason A. Harmening
dcaa560af0 Ensure the client regions for unmapped bounce buffers created through bus_dmamap_load_phys() do not span multiple pages.
This is already done for mapped buffers.
While here, stop casting bus_addr_t to vm_offset_t.
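The following is a minimal sketch of the per-page clamp. The helper name is hypothetical, and a 4 KB page is assumed. The address stays 64-bit throughout. Casting it to a narrower virtual-offset type could truncate a high physical address on platforms where bus addresses are wider than virtual addresses (for example, 32-bit x86 with PAE).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)

/* Shorten len so the region starting at addr stops at the next page boundary. */
uint64_t clamp_to_page(uint64_t addr, uint64_t len)
{
    uint64_t room = SKETCH_PAGE_SIZE - (addr & SKETCH_PAGE_MASK);

    assert(room >= 1 && room <= SKETCH_PAGE_SIZE);
    return len < room ? len : room;
}
```

A caller would loop, advancing addr by the returned length, until the whole buffer has been split into page-bounded pieces.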
2015-10-13 02:17:56 +00:00
Bjoern A. Zeeb
6b1ad46a3b dmar_ctx_dtr() has not existed since r284869. Remove the static function
declaration to avoid a compile-time warning.
2015-09-22 16:50:59 +00:00
Zbigniew Bodek
18c72666ce Add domain support to PCI bus allocation
When the system has more than a single PCI domain, the bus numbers
are not unique, so they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.

Reviewed by:   jhb
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
2015-09-16 23:34:51 +00:00
Adrian Chadd
a14bc739d5 Add ASUS Sandybridge laptops to the similar x2apic disable logic
that was recently added for Lenovo laptops.

This is a prime candidate for conversion into a table and also
checking other fields like "product".
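The suggested table-driven approach could look roughly like the following sketch. The struct, the function names, and the match strings are hypothetical placeholders, not the values the kernel checks. In practice the inputs would come from SMBIOS data, which this sketch does not read. A NULL field in an entry matches anything, so the same table can match on the maker alone or on maker plus product.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One quirk entry; a NULL field matches any value. */
struct x2apic_quirk_sketch {
    const char *maker;
    const char *product;
};

/* Placeholder entries for illustration only. */
static const struct x2apic_quirk_sketch x2apic_quirks[] = {
    { "EXAMPLE-VENDOR-A", NULL },
    { "EXAMPLE-VENDOR-B", "EXAMPLE-MODEL-1" },
};

static bool field_matches(const char *want, const char *have)
{
    return want == NULL || (have != NULL && strcmp(want, have) == 0);
}

/* True if x2APIC should be disabled for this maker/product pair. */
bool x2apic_quirk_match(const char *maker, const char *product)
{
    size_t n = sizeof(x2apic_quirks) / sizeof(x2apic_quirks[0]);

    for (size_t i = 0; i < n; i++) {
        if (field_matches(x2apic_quirks[i].maker, maker) &&
            field_matches(x2apic_quirks[i].product, product))
            return true;
    }
    return false;
}
```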

Tested:

* ASUS UX31E
2015-09-16 01:44:11 +00:00
Mark Johnston
610141cebb Add stack_save_td_running(), a function to trace the kernel stack of a
running thread.

It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.

This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.

Reviewed by:	bdrewery, jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3256
2015-09-11 03:54:37 +00:00
Mark Johnston
1e954a7c63 Remove the arg0 field from struct amd64_frame. Its existence was a bug,
since on amd64 the first argument to a function is generally not on the
stack.

Revert an old DTrace bug fix to some code that assumed that
sizeof(struct amd64_frame) == 16.
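Under the System V AMD64 calling convention, the first six integer or pointer arguments are passed in registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9). A frame record therefore holds only the saved frame pointer and the return address. The following sketch shows the resulting two-word structure and a simple frame-pointer walk over a synthetic chain. The return-address type is approximated with uintptr_t, and walk_frames() is hypothetical, not the kernel's unwinder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame record without arg0: saved frame pointer, then return address. */
struct amd64_frame {
    struct amd64_frame *f_frame; /* caller's saved %rbp */
    uintptr_t f_retaddr;         /* return address pushed by call */
};

/* Two words, i.e. 16 bytes on amd64, as the old DTrace code assumed. */
static_assert(sizeof(struct amd64_frame) == 2 * sizeof(void *),
    "frame record is exactly two words");

/* Follow the frame-pointer chain, collecting up to max return addresses. */
int walk_frames(const struct amd64_frame *fp, uintptr_t *pcs, int max)
{
    int n = 0;

    while (fp != NULL && n < max) {
        pcs[n++] = fp->f_retaddr;
        fp = fp->f_frame;
    }
    return n;
}
```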

Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:31:22 +00:00