Commit Graph

1997 Commits

Author SHA1 Message Date
Konstantin Belousov
2f73a6e9c4 Correct definition for PGEX_SGX.
At the moment it is only used for page fault error code textual
representation.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-06-08 20:26:04 +00:00
Konstantin Belousov
185f7e0a9d amd64 ef_rt_arch_call: Preserve %rflags around call into EFI RT service.
If service code faulted, we might end up unwinding with interrupts
disabled.  Top-level kernel code should have interrupts enabled, which
is enforced by checks.

Save %rflags before entering EFI, and restore to the known good value
on return.  This handles situation with disabled interrupts on fault
and perhaps other potential bugs, e.g. invalid value for PSL_D.

Reported and tested by:	Jan Martin Mikkelsen <janm@transactionware.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-03 15:32:42 +00:00
Konstantin Belousov
f3dbf2ca4a Add PG_PS_PDP_FRAME symbol.
Similar to PG_FRAME and PG_PS_FRAME, it denotes the mask of the
physical address component of 1G superpage PDP entry.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20386
2019-05-24 23:26:17 +00:00
Konstantin Belousov
4d3b28bcdc amd64 pmap: rework delayed invalidation, removing global mutex.
For machines having cmpxcgh16b instruction, i.e. everything but very
early Athlons, provide lockless implementation of delayed
invalidation.

The implementation maintains lock-less single-linked list with the
trick from the T.L. Harris article about volatile mark of the elements
being removed. Double-CAS is used to atomically update both link and
generation.  New thread starting DI appends itself to the end of the
queue, setting the generation to the generation of the last element
+1.  On DI finish, thread donates its generation to the previous
element.  The generation of the fake head of the list is the last
passed DI generation.  Basically, the implementation is a queued
spinlock but without spinlock.

Many thanks both to Peter Holm and Mark Johnson for keeping with me
while I produced intermediate versions of the patch.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
MFC note:	td_md.md_invl_gen should go to the end of struct thread
Differential revision:	https://reviews.freebsd.org/D19630
2019-05-16 13:28:48 +00:00
Konstantin Belousov
7355a02bdd Mitigations for Microarchitectural Data Sampling.
Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure.  An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).

Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security:	CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security:	FreeBSD-SA-19:07.mds
Reviewed by:	jhb
Tested by:	emaste, lwhsu
Approved by:	so (gtetlow)
2019-05-14 17:02:20 +00:00
Mateusz Guzik
a8c2fcb287 x86: store pending bitmapped IPIs in per-cpu areas
This gets rid of the global cpu_ipi_pending array.

While replace cmpset with fcmpset in the delivery code and opportunistically
check if given IPI is already pending.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:36:54 +00:00
Conrad Meyer
665919aaaf x86: Implement MWAIT support for stopping a CPU
IPI_STOP is used after panic or when ddb is entered manually.  MONITOR/
MWAIT allows CPUs that support the feature to sleep in a low power way
instead of spinning.  Something similar is already used at idle.

It is perhaps especially useful in oversubscribed VM environments, and is
safe to use even if the panic/ddb thread is not the BSP.  (Except in the
presence of MWAIT errata, which are detected automatically on platforms with
known wakeup problems.)

It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0
(off).  This commit also introduces the tunable
"machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has
known errata, but may be set to "1" in loader.conf to signal that mwait
wakeup is broken on CPUs FreeBSD does not yet know about.

Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't
help bhyve hypervisors running FreeBSD guests.

Submitted by:   Anton Rang <rang AT acm.org> (earlier version)
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 20:34:26 +00:00
Conrad Meyer
83dc49beaf x86: Define pc_monitorbuf as a logical structure
Rather than just accessing it via pointer cast.

No functional change intended.

Discussed with:	kib (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 17:35:13 +00:00
Rodney W. Grimes
a488c9c99a Add accessor function for vm->maxcpus
Replace most VM_MAXCPU constant useses with an accessor function to
vm->maxcpus which for now is initialized and kept at the value of
VM_MAXCPUS.

This is a rework of Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
work from D10070 to adjust it for the cpu topology changes that
occured in r332298

Submitted by:		Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
Reviewed by:		Patrick Mooney <patrick.mooney@joyent.com>
Approved by:		bde (mentor), jhb (maintainer)
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D18755
2019-04-25 22:51:36 +00:00
Konstantin Belousov
fd8d844f76 amd64 KPTI: add control from procctl(2).
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting.  PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.

Requested by:	jhb
Reviewed by:	jhb, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:44:33 +00:00
Konstantin Belousov
6f1fe3305a amd64: Add md process flags and first P_MD_PTI flag.
PTI mode for the process pmap on exec is activated iff P_MD_PTI is set.

On exec, the existing vmspace can be reused only if pti mode of the
pmap matches the P_MD_PTI flag of the process.  Add MD
cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can
vetoed reuse of the existing vmspace.

MFC note: md_flags change struct proc KBI.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19514
2019-03-16 11:31:01 +00:00
John Baldwin
2e43efd0bb Drop "All rights reserved" from my copyright statements.
Reviewed by:	rgrimes
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19485
2019-03-06 22:11:45 +00:00
Konstantin Belousov
e7a9df16e6 Add kernel support for Intel userspace protection keys feature on
Skylake Xeons.

See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the
RDPKRU and WRPKRU instructions.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:51:13 +00:00
Konstantin Belousov
87b1bf4f31 amd64: add defines and decode protection keys and SGX page faults reasons.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:46:44 +00:00
Konstantin Belousov
5ddeaf67c6 Provide convenience C wrappers for RDPKRU and WRPKRU instructions.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-19 19:17:20 +00:00
John Baldwin
7f7f6f85a1 Add a custom implementation of cpu_lock_delay() for x86.
Avoid using DELAY() since it can try to use spin locks on CPUs without
a P-state invariant TSC.  For cpu_lock_delay(), always use the TSC if
it exists (even if it is not P-state invariant) to delay for a
microsecond.  If the TSC does not exist, read from I/O port 0x84 to
delay instead.

PR:		228768
Reported by:	Roger Hammerstein <cheeky.m@live.com>
Reviewed by:	kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17851
2018-11-05 22:54:03 +00:00
John Baldwin
4cbbb74888 Add a KPI for the delay while spinning on a spin lock.
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI.  Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms.  However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() doesn't use locks.

Reviewed by:	kib
MFC after:	3 days
2018-11-05 21:34:17 +00:00
Konstantin Belousov
6bc6a54280 Add pci_early function to detect Intel stolen memory.
On some Intel devices BIOS does not properly reserve memory (called
"stolen memory") for the GPU.  If the stolen memory is claimed by the
OS, functions that depend on stolen memory (like frame buffer
compression) can't be used.

A function called pci_early_quirks that is called before the virtual
memory system is started was added. In Linux, this PCI early quirks
function iterates through all PCI slots to check for any device that
require quirks.  While this more generic solution is preferable I only
ported the Intel graphics specific parts because I think my
implementation would be too similar to Linux GPL'd solution after
looking at the Linux code too much.

The code regarding Intel graphics stolen memory was ported from
Linux. In the case of Intel graphics stolen memory this
pci_early_quirks will read the stolen memory base and size from north
bridge registers.  The values are stored in global variables that is
later read by linuxkpi_gplv2. Linuxkpi stores these values in a
Linux-specific structure that is read by the drm driver.

Relevant linuxkpi code is here:
https://github.com/FreeBSDDesktop/kms-drm/blob/drm-v4.16/linuxkpi/gplv2/src/linux_compat.c#L37

For now, only amd64 arch is suppor ted since that is the only arch
supported by the new drm drivers. I was told that Intel GPUs are
always located on 0:2:0 so these values are hard coded for now.

Note that the structure and early execution of the detection code is
not required in its current form, but we expect that the code will be
added shortly which fixes the potential BIOS bugs by reserving the
stolen range in phys_avail[].  This must be done as early as possible
to avoid conflicts with the potential usage of the memory in kernel.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Reviewed by:	bwidawsk, imp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16719
Differential revision:	https://reviews.freebsd.org/D17775
2018-10-31 23:17:00 +00:00
Konstantin Belousov
2dec2b4a34 amd64: flush L1 data cache on syscall return with an error.
The knob allows to select the flushing mode or turn it off/on.  The
idea, as well as the list of the ignored syscall errors, were taken
from https://www.openwall.com/lists/kernel-hardening/2018/10/11/10 .

I was not able to measure statistically significant difference between
flush enabled vs disabled using syscall_timing getuid.

Reviewed by:	bwidawsk
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17536
2018-10-20 23:17:24 +00:00
Mateusz Guzik
c88205e76e amd64: relax constraints in curthread and curpcb
This makes the compiler less likely to reload the content from %gs.

The 'P' modifier drops all synteax prefixes and 'n' constraint treats
input as a known at compilation time immediate integer.

Example reloading victim was spinlock_enter.

Stolen from:	OpenBSD

Reported by:	jtl
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17615
2018-10-20 17:00:18 +00:00
Konstantin Belousov
3ade944019 Do not flush cache for PCIe config window.
Apparently AMD machines cannot tolerate this. This was uncovered by
r339386, where cache flush started really flushing the requested range.

Introduce pmap_mapdev_pciecfg(), which simply does not flush cache
comparing with pmap_mapdev().  It assumes that the MCFG region was
never accessed through the cacheable mapping, which is most likely
true for machine to boot at all.

Note that i386 does not need the change, since the architecture
handles access per-page due to the KVA shortage, and page remapping
already does not flush the cache.

Reported and tested by:	mjg, Mike Tancsa <mike@sentex.net>
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D17612
2018-10-18 20:49:16 +00:00
Konstantin Belousov
2fd0c8e7ca Provide pmap_large_map() KPI on amd64.
The KPI allows to map very large contigous physical memory regions
into KVA, which are not covered by DMAP.

I see both with QEMU and with some real hardware started shipping, the
regions for NVDIMMs might be very far apart from the normal RAM, and
we expect that at least initial users of NVDIMM could install very
large amount of such memory.  IMO it is not reasonable to extend DMAP
to cover that far-away regions both because it could overflow existing
4T window for DMAP in KVA, and because it costs in page table pages
allocations, for gap and for possibly unused NV RAM.

Also, KPI provides some special functionality for fast cache flushing
based on the knowledge of the NVRAM mapping use.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17070
2018-10-16 17:28:10 +00:00
Konstantin Belousov
9d5d89b209 Add clwb().
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D17070
2018-10-16 17:00:42 +00:00
Mateusz Guzik
6816c88458 amd64: partially depessimize cpu_fetch_syscall_args and cpu_set_syscall_retval
Vast majority of syscalls take 6 or less arguments. Move handling of other
cases to a fallback function. Similarly, special casing for _syscall
and __syscall
magic syscalls is moved away.

Return is almost always 0. The change replaces 3 branches with 1 in the common
case. Also the 'frame' variable convinces clang not to reload it on each access.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17542
2018-10-13 21:18:31 +00:00
Mateusz Guzik
3f102f5881 Provide string functions for use before ifuncs get resolved.
The change is a no-op for architectures which don't ifunc memset,
memcpy nor memmove.

Convert places which need them. Xen bits by royger.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17487
2018-10-11 23:28:04 +00:00
John Baldwin
b843f9be5e Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits.
The VT-x VMCS only stores the base address of the GDTR and IDTR.  As a
result, VM exits use a fixed limit of 0xffff for the host GDTR and
IDTR losing the smaller limits set in when the initial GDT is loaded
on each CPU during boot.  Explicitly save and restore the full GDTR
and IDTR contents around VM entries and exits to restore the correct
limit.

Similarly, explicitly save and restore the LDT selector.  VM exits
always clear the host LDTR as if the LDT was loaded with a NULL
selector and a userspace hypervisor is probably using a NULL selector
anyway, but save and restore the LDT explicitly just to be safe.

PR:		230773
Reported by:	John Levon <levon@movementarian.org>
Reviewed by:	kib
Tested by:	araujo
Approved by:	re (rgrimes)
MFC after:	1 week
2018-10-11 18:27:19 +00:00
Andrew Turner
27d2645787 Handle a guest executing a vm instruction by trapping and raising an
undefined instruction exception. Previously we would exit the guest,
however an unprivileged user could execute these.

Found with:	syzkaller
Reviewed by:	araujo, tychon (previous version)
Approved by:	re (kib)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17192
2018-09-27 11:16:19 +00:00
Konstantin Belousov
d12c446550 Convert x86 cache invalidation functions to ifuncs.
This simplifies the runtime logic and reduces the number of
runtime-constant branches.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D16736
2018-09-19 19:35:02 +00:00
Konstantin Belousov
50cd0be78f Catch exceptions during EFI RT calls on amd64.
This appeared to be required to have EFI RT support and EFI RTC
enabled by default, because there are too many reports of faulting
calls on many different machines.  The knob is added to leave the
exceptions unhandled to allow to debug the actual bugs.

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 21:37:05 +00:00
Konstantin Belousov
1565fb29a7 Add amd64 mdthread fields needed for the upcoming EFI RT exception
handling.

This is split into a separate commit from the main change to make it
easier to handle possible revert after upcoming KBI freeze.

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 21:16:43 +00:00
Konstantin Belousov
d4be3789fe Normalize use of semicolon with EFI_TIME_LOCK macros.
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 19:48:41 +00:00
John Baldwin
a800b45c18 Merge amd64 and i386 <machine/intr_machdep.h> headers.
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16803
2018-08-20 12:31:39 +00:00
Konstantin Belousov
c1141fba00 Update L1TF workaround to sustain L1D pollution from NMI.
Current mitigation for L1TF in bhyve flushes L1D either by an explicit
WRMSR command, or by software reading enough uninteresting data to
fully populate all lines of L1D.  If NMI occurs after either of
methods is completed, but before VM entry, L1D becomes polluted with
the cache lines touched by NMI handlers.  There is no interesting data
which NMI accesses, but something sensitive might be co-located on the
same cache line, and then L1TF exposes that to a rogue guest.

Use VM entry MSR load list to ensure atomicity of L1D cache and VM
entry if updated microcode was loaded.  If only software flush method
is available, try to help the bhyve sw flusher by also flushing L1D on
NMI exit to kernel mode.

Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D16790
2018-08-19 18:47:16 +00:00
Konstantin Belousov
8fba5348fc amd64: ensure that curproc->p_vmspace pmap always matches PCPU
curpmap.

When performing context switch on a machine without PCID, if current
%cr3 equals to the new pmap %cr3, which is typical for kernel_pmap
vs. kernel process, I overlooked to update PCPU curpmap value.  Remove
check for %cr3 not equal to pm_cr3 for doing the update.  It is
believed that this case cannot happen at all, due to other changes in
this revision.

Also, do not set the very first curpmap to kernel_pmap, it should be
vmspace0 pmap instead to match curproc.

Move the common code to activate the initial pmap both on BSP and APs
into pmap_activate_boot() helper.

Reported by: eadler, ambrisko
Discussed with: kevans
Reviewed by:	alc, markj (previous version)
Tested by: ambrisko (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16618
2018-08-14 16:37:14 +00:00
Konstantin Belousov
b3a7db3b06 Use SMAP on amd64.
Ifuncs selectors dispatch copyin(9) family to the suitable variant, to
set rflags.AC around userspace access.  Rflags.AC bit is cleared in
all kernel entry points unconditionally even on machines not
supporting SMAP.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-07-29 20:47:00 +00:00
Warner Losh
67d33338c0 Rename VM_FREELIST_ISADMA to VM_FREELIST_LOWMEM.
There's no differene between VM_FREELIST_ISADMA and VM_FREELIST_LOWMEM
except for the default boundary (16MB on x86 and 256MB on MIPS, but
they are otherwise the same). We don't need both for any system we
support (there were some really old ARC systems that did have ISA/EISA
bus, but we never ran on them and they are too old to ever grow
support for).

Differential Review: https://reviews.freebsd.org/D16290
2018-07-27 18:34:20 +00:00
Konstantin Belousov
53dec71d39 Expand x86 struct pcpus to UMA_PCPU_ALLOC_SIZE AKA PAGE_SIZE.
This restores counters(9) operation.
Revert r336024. Improve assert of pcpu size on x86.

Reviewed by:	mmacy
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D16163
2018-07-06 19:50:44 +00:00
Konstantin Belousov
fb0a281196 Revert to recommit with the proper message. 2018-07-06 19:50:25 +00:00
Konstantin Belousov
1614716655 Save a call to pmap_remove() if entry cannot have any pages mapped.
Due to the way rtld creates mappings for the shared objects, each dso
causes unmap of at least three guard map entries.  For instance, in
the buildworld load, this change reduces the amount of pmap_remove()
calls by 1/5.

Profiled by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16148
2018-07-06 19:48:47 +00:00
Hans Petter Selasky
a7a7f5b472 Make sure kernel modules built by default are portable between UP and
SMP systems by extending defined(SMP) to include defined(KLD_MODULE).

This is a regression issue after r335873 .

Discussed with:		mmacy@
Sponsored by:		Mellanox Technologies
2018-07-06 10:13:42 +00:00
Matt Macy
428194fed2 counter(9): unbreak amd64 following r336020
Apply temporary fix to counter until daylight hours.
The fact that the assembly for counter_u64_add relied on the sizeof(struct pcpu) was
the basis for the otherwise arbitrary offset never came up in D15933.
critical_{enter,exit} is now inline so the only real added overhead is the
added (mostly false) conditional branch in exit.
2018-07-06 10:10:00 +00:00
Matt Macy
ab3059a8e7 Back pcpu zone with domain correct pages
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
  (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
  There are some misconceptions surrounding this field. It is the
  _VM_ NUMA domain and should only ever correspond to valid domain
  values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D15933
2018-07-06 02:06:03 +00:00
John Baldwin
79ba91952d Use 'e' instead of 'i' constraints with 64-bit atomic operations on amd64.
The ADD, AND, OR, and SUB instructions take at most a 32-bit
sign-extended immediate operand.  64-bit constants that do not fit into
that constraint need to be loaded into a register.  The 'i' constraint
tells the compiler it can pass any integer constant to the assembler,
whereas the 'e' constrain only permits constants that fit into a 32-bit
sign-extended value.  This fixes using
atomic_add/clear/set/subtract_long/64 with constants that do not fit into
a 32-bit sign-extended immediate.

Reported by:	several folks
Tested by:	Pete Wright <pete@nomadlogic.org>
MFC after:	2 weeks
2018-07-03 22:03:28 +00:00
Matt Macy
f4b3640475 inline atomics and allow tied modules to inline locks
- inline atomics in modules on i386 and amd64 (they were always
  inline on other arches)
- allow modules to opt in to inlining locks by specifying
  MODULE_TIED=1 in the makefile

Reviewed by: kib
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16079
2018-07-02 19:48:38 +00:00
Konstantin Belousov
7f12ebe583 Do not leave stray qword on top of stack for interrupts and exceptions
without error code.  Doing so it mis-aligned the stack.

Since the only consumer of the SSE instructions with the alignment
requirements is AES-NI module, and since the FPU context cannot be
accessed in interrupts, the only situation where the alignment matter
are the compat32 syscalls, as reported in the PR.

PR:	229222
Reported and tested by:	 dewayne@heuristicsystems.com.au
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-06-25 11:29:04 +00:00
Mark Johnston
f090f67503 Tell the compiler that rdtscp clobbers %ecx. 2018-06-09 18:31:19 +00:00
Matt Macy
155046394a cpufunc: add rdtscp for x86 2018-06-07 00:54:11 +00:00
Matt Macy
07d80fd8dc hwpmc: ABI fixes
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
  so that filter can identify which counter index an
  event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
  changes needed to be properly identified for filter
2018-06-04 02:05:48 +00:00
Bruce Evans
49c871278a Fix high resolution kernel profiling just enough to not crash at boot
time, especially for SMP.  If configured, it turns itself on at boot
time for calibration, so is fragile even if never otherwise used.

Both types of kernel profiling were supposed to use a global spinlock
in the SMP case.  If hi-res profiling is configured (but not necessarily
used), this was supposed to be optimized by only using it when
necessary, and slightly more efficiently, in asm.  But it was not done
at all for mcount entry where it is necessary.  This caused crashes
in the SMP case when either type of profiling was enabled.  For mcount
exit, it only caused wrong times.  The times were wrongest with an
i8254 timer since using that requires exclusive access to the hardware.
The i8254 timer was too slow to use here 20 years ago and is much less
usable now, but it is the default for the SMP case since TSCs weren't
invariant when SMP was new.  Do the locking in all hi-res SMP cases for
simplicity.

Calibration uses special asms, and the clobber lists in these were sort
of inverted.  They contained the arg and return registers which are not
clobbered, but on amd64 they didn't contain the residue of the call-used
registers which may be clobbered (%r10 and %r11).  This usually caused
hangs at boot time.  This usually affected even the UP case.
2018-06-02 05:48:44 +00:00
Matt Macy
e92a1350b5 hwpmc: remove unused pre-table driven bits for intel
Intel now provides comprehensive tables for all performance counters
and the various valid configuration permutations as text .json files.
Libpmc has been converted to use these and hwpmc_core has been greatly
simplified by moving to passthrough of the table values.

The one gotcha is that said tables don't support pentium pro and and pentium
IV. There's very few users of hwpmc on _amd64_ kernels on new hardware. It is
unlikely that anyone is doing low level optimization on 15 year old Intel
hardware. Nonetheless, if someone feels strongly enough to populate the
corresponding tables for p4 and ppro I will reinstate the files in to the
build.

Code for the K8 counters and !x86 architectures remains unchanged.
2018-05-31 22:41:07 +00:00