Commit Graph

7844 Commits

Author SHA1 Message Date
Konstantin Belousov
78a3652794 bhyve: emulate CLFLUSH and CLFLUSHOPT.
Apparently CLFLUSH on mmio can cause VM exit, as reported in the PR.
I do not see that anything useful can be done except emulating page
faults on invalid addresses.

Due to the instruction encoding pecularity, also emulate SFENCE.

PR:	232081
Reported by:	phk
Reviewed by:	araujo, avg, jhb (all: previous version)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17482
2018-10-12 15:30:15 +00:00
Mateusz Guzik
3c9a1d0493 amd64: make memmove and memcpy less slow with mov
The reasoning is the same as with the memset change, see r339205

Reviewed by:	kib (previous version)
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17441
2018-10-11 23:37:57 +00:00
Mateusz Guzik
3f102f5881 Provide string functions for use before ifuncs get resolved.
The change is a no-op for architectures which don't ifunc memset,
memcpy nor memmove.

Convert places which need them. Xen bits by royger.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17487
2018-10-11 23:28:04 +00:00
John Baldwin
b843f9be5e Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits.
The VT-x VMCS only stores the base address of the GDTR and IDTR.  As a
result, VM exits use a fixed limit of 0xffff for the host GDTR and
IDTR losing the smaller limits set in when the initial GDT is loaded
on each CPU during boot.  Explicitly save and restore the full GDTR
and IDTR contents around VM entries and exits to restore the correct
limit.

Similarly, explicitly save and restore the LDT selector.  VM exits
always clear the host LDTR as if the LDT was loaded with a NULL
selector and a userspace hypervisor is probably using a NULL selector
anyway, but save and restore the LDT explicitly just to be safe.

PR:		230773
Reported by:	John Levon <levon@movementarian.org>
Reviewed by:	kib
Tested by:	araujo
Approved by:	re (rgrimes)
MFC after:	1 week
2018-10-11 18:27:19 +00:00
Brooks Davis
c7d0908e1c Regenerated assorted syscall related files after:
- r327895: Implement 'domainset'...
 - r329876: Use linux types for linux-specific syscalls

Diff generated with:
	find . -name syscalls.conf | xargs dirname | \
	    xargs -n1 -I DIR make -C DIR sysent

Approved by:	re (kib)
Sponsored by:	DARPA, AFRL
2018-10-09 20:42:17 +00:00
Michael Tuexen
6b45121a6d Address the warning regarding duplicate option 'GEOM_PART_GPT' when
configuring kernels for i386, amd64, and arm64.
The 'GEOM_PART_GPT' option was added to the DEFAULTS configuration
in r337967.

Approved by:		re (kib@)
Reviewed by:		ler@
Differential Revision:	https://reviews.freebsd.org/D17458
Sponsored by:		Netflix, Inc.
2018-10-07 15:54:13 +00:00
Mateusz Guzik
97bb9a0818 amd64: make memset less slow with mov
rep stos has a high startup time even on modern microarchitectures like
Skylake. Intel optimization manuals discuss how for small sizes it is
beneficial to go for streaming stores. Since those cannot be used without
extra penalty in the kernel I investigated performance impact of just
regular movs.

The patch below implements a very simple scheme: a 32-byte loop followed
by filling in the remainder of at most 31 bytes. It has a 256 breaking
point on which it falls back to rep stos. It provides a significant win
over the current primitive on several machines I tested (both Intel and
AMD). A 64-byte loop did not provide any benefit even for multiple of 64
sizes.

See the review for benchmark data.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17398
2018-10-05 19:25:09 +00:00
Mateusz Guzik
9657b80ce7 amd64: hide non-erms jump label under non-erms copyin/copyout
This change is a no-op in terms of semantics, but has a side effect
of removing a perfectly useless nop sled for CPUs with ERMS.

Approved by:	re (gjb)
Sponsored by:   The FreeBSD Foundation
2018-10-04 20:01:48 +00:00
Mark Johnston
cb4961abc1 Apply r339046 to i386.
Belatedly add a comment to the amd64 pmap explaining why we initialize
the kernel pmap's resident page count.

Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17377
2018-10-01 18:48:33 +00:00
Mark Johnston
c6c770d041 Count bootstrap data as resident in the kernel pmap.
Such data may later be unmapped.  This occurs, for example, when a
loader-provided microcode update file is discarded.

Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17340
2018-10-01 14:47:49 +00:00
Konstantin Belousov
f76bec2a28 Revert part of the r338891 which reordered local invalidation and IPI.
For PCID case, there is a dependency between pm_gen zeroing and
reading pm_active for IPI target selection, to ensure that the
invalidation is not missed.

Reported and tested by:	mjg
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2018-09-28 14:08:20 +00:00
Mateusz Guzik
825eeb55f4 amd64: fix return value of copyinstr after r338970
The function stopped swapping rdi and rsi, but the error handling
code was not updated with the new register name.

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2018-09-27 20:48:07 +00:00
John Baldwin
83382d027f Don't clear DR6 for debug exceptions from userland.
This reverts part of r333368.  The attempt to clear DR6 was occuring
too soon as trapsignal() does not pause to let the debugger notice the
SIGTRAP and query DR6.  The signal exchange does not occur until much
later during ast().  As a result, GDB was no longer recognizing
hardware breakpoints and watchpoints on x86.

In addition, any userland programs that want to inspect DR6 in a
SIGTRAP handler don't have a way to do this if we clear DR6 in the
exception handler.

Instead of relying on the kernel to clear DR6, debuggers will have to
explicitly clear it after a trace trap (which they needed to do on
older kernels anyway).

Reviewed by:	kib
Approved by:	re (delphij)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17319
2018-09-27 17:33:59 +00:00
Mateusz Guzik
5910b87605 amd64: macroify and mostly depessimize copyinstr
See r338968 for details.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17288
2018-09-27 15:53:36 +00:00
Mateusz Guzik
3d95cc51bb amd64: mostly depessimize copystr
- remove a forward branch in the common case
- replace xchg + lodsb/stosb loop with simple movs

A simple test on Intel(R) Core(TM) i7-4600U CPU @ 2.10GH copying
/foo/bar/baz in a loop goes from 295715863 ops/s to 465807408.

Further changes are pending.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17281
2018-09-27 15:27:53 +00:00
Mateusz Guzik
0e59ecce47 amd64: clean up copyin/copyout
- move the PSL.AC comment to the fault handler
- stop testing for zero-sized ops. after several minutes of package
building there were no copyin calls with zero bytes and very few
copyout. the semantic of returning 0 in this case is preserved
- shorten exit paths by clearing %eax earlier
- replace xchg with 3 movs. this is what compilers do. a naive
benchmark on EPYC suggests about 1% increase in thoughput thanks to
this change.
- remove the useless movb %cl,%al from copyout. it looks like a
leftover from many years ago

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17286
2018-09-27 15:24:16 +00:00
Mateusz Guzik
a8e3f99ec1 amd64: implement memcmp in assembly
Both the in-kernel C variant and libc asm variant have very poor performance.
The former compiles to a single byte comparison loop, which breaks down even
for small sizes. The latter uses rep cmpsq/b which turn out to have very poor
throughput and are slower than a hand-coded 32-byte comparison loop.

Depending on size this is about 3-4 times faster than the current routines.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17328
2018-09-27 14:05:44 +00:00
Andrew Turner
27d2645787 Handle a guest executing a vm instruction by trapping and raising an
undefined instruction exception. Previously we would exit the guest,
however an unprivileged user could execute these.

Found with:	syzkaller
Reviewed by:	araujo, tychon (previous version)
Approved by:	re (kib)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17192
2018-09-27 11:16:19 +00:00
Konstantin Belousov
05e1cca97a Fix some uses of dmaplimit.
dmaplimit is the first byte after the end of DMAP.

Reported by:	"Johnson, Archna" <Archna.Johnson@netapp.com>
Reviewed by:	alc, markj
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17318
2018-09-25 20:07:58 +00:00
Konstantin Belousov
cbe100dfca Fix an issue in r338862.
For pmap_invalidate_all_pcid(), only reset pm_gen for non-kernel
pmaps, as it was done before the conversion to ifuncs.  The reset is
useless but innocent for kernel_pmap. Coverity reported that cpuid is
used uninitialized in this case.

Reported by:	cem
Reviewed by:	alc, cem, markj
CID:	 1395807
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17314
2018-09-25 18:24:25 +00:00
Konstantin Belousov
108ff63e8a Further reorganize pmap_invalidate TLB code.
Split calculation of mask for shootdown IPI and local
invalidation. Reorder IPI before local.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17277
2018-09-22 17:04:39 +00:00
Mark Johnston
6cdde6fda2 Use the GNU as-compatible .endm instead of .endmacro.
Approved by:	re (gjb)
2018-09-21 20:20:03 +00:00
Konstantin Belousov
d995b5b1ea Convert x86 TLB top-level invalidation functions to ifuncs.
Note that shootdown IPI handlers are already per-mode.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17184
2018-09-21 17:53:06 +00:00
Mateusz Guzik
68c7542edf amd64: even up copyin/copyout with memcpy + other cleanup
- _fault handlers for both primitives are identical, provide just one
- change the copying scheme to match memcpy (in particular jump
avoidance for the most common case of multiply of 8)
- stop re-reading pcb address on exit, just store it locally (in r9)

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17265
2018-09-21 15:00:46 +00:00
Mateusz Guzik
bb32268086 amd64: check for small size in memmove, memcpy and memset
If the size is 15 bytes or less avoid spinning up rep just to copy the 8
bytes. In my tests on EPYC and old Intel microarchs without ERMS (like
Westmere) it provided a nice win over the current version (e.g. for EPYC
memset with 15 bytes of size goes from 59712651 ops/s to 70600095) all
while almost not pessimizing the other cases.

Data collected during package building shows that < 16 sizes are pretty
common.

Verified with the glibc test suite.

Approved by:	re (kib)
2018-09-21 12:27:36 +00:00
Mateusz Guzik
5254f065ab amd64: macroify copyin/copyout and provide erms variants, follow up
Fix a fat-fingered typo with a "funny" side-effect: when doing copyin on a
cpu without ERMS and with size being a multiply of 8 a page fault would be
triggered resulting in EFAULT.

Pointy hat: mjg
Approved by:	re (implicit)
2018-09-20 20:32:08 +00:00
Mateusz Guzik
4bf0035fd1 amd64: macroify copyin/copyout and provide erms variants
Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17257
2018-09-20 18:30:17 +00:00
Mateusz Guzik
a286a3099c amd64: move fusufault after all users
A lot of function have the following check:
        cmpq    %rax,%rdi                       /* verify address is valid */
        ja      fusufault

The label is present earlier in kernel .text, which means this is a jump
backwards. Absent any information in branch predictor, the cpu predicts it
as taken. Since it is almost never taken in practice, this results in a
completely avoidable misprediction.

Move it past all consumers, so that it is predicted as not taken.

Approved by:	re (kib)
2018-09-20 13:29:43 +00:00
Konstantin Belousov
d12c446550 Convert x86 cache invalidation functions to ifuncs.
This simplifies the runtime logic and reduces the number of
runtime-constant branches.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D16736
2018-09-19 19:35:02 +00:00
Konstantin Belousov
215aa93033 amd64 pmap: remove tautological assert.
pm_pcid is unsigned.

Reviewed by:	cem, markj
CID:	1395727
Noted by:	cem
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D17235
2018-09-19 15:39:16 +00:00
Konstantin Belousov
3c022be2ca Use ifunc to resolve context switching mode on amd64.
Patch removes all checks for pti/pcid/invpcid from the context switch
path. I verified this by looking at the generated code, compiling with
the in-tree clang.  The invpcid_works1 trick required inline attribute
for pmap_activate_sw_pcid_pti() to work.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 15:52:19 +00:00
Mateusz Guzik
d6943c5804 amd64: tidy up kernel memmove, take 2
There is no need to use %rax for temporary values and avoiding doing
so shortens the func.
Handle the explicit 'check for tail' depessimisization for backwards copying.

This reduces the diff against userspace.

Tested with the glibc test suite.

Approved by:	re (kib)
2018-09-17 15:51:49 +00:00
Konstantin Belousov
09a6ada991 Calculate PTI, PCID and INVPCID modes earlier, before ifuncs are resolved.
This will be used in following conversion of pmap_activate_sw().

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 15:34:19 +00:00
Konstantin Belousov
76ed0c542f Make the PTI violation check to follow style of the SMAP check.
No functional changes.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 14:59:05 +00:00
Mateusz Guzik
9d1b868da0 Revert amd64: tidy up kernel memmove
There is a braino in the non-erms variant which breaks the
functionality.

Will be fixed at a later time with a different patch.

Reported by:	Manfred Antar
Approved by:	re (implicit)
2018-09-16 21:46:27 +00:00
Mateusz Guzik
17f67f63b9 amd64: tidy up kernel memmove
There is no need to use %rax for temporary values and avoiding doing
so shortens the func.
Handle the explicit 'check for tail' depessimisization for backwards copying.

This reduces the diff against userspace.

Approved by:	re (kib)
2018-09-16 19:28:27 +00:00
Konstantin Belousov
bd6c14afa7 Remove unneeded new line from the panic string.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-16 18:36:42 +00:00
Mateusz Guzik
c51b7ab9e3 amd64: implement pagezero_erms
Intel docs claim such a memset (rep stosb + 4096 bytes) is
special-cased by microarchs. They also switched Linux to use
it for this purpose.

Approved by:	re (gjb)
2018-09-14 15:29:35 +00:00
Mateusz Guzik
13ea074dc3 amd64: implement ERMS-based memmove, memcpy and memset
Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17124
2018-09-13 14:53:51 +00:00
Mateusz Guzik
e382dd47aa amd64: enable options NUMA in GENERIC and MINIMAL
Reviewed by:	gallatin, cem, scottl
Approved by:	re (kib)
Relnotes:	yes
Sponsored by:	Dell EMC Isilon, Netflix
Differential Revision:	https://reviews.freebsd.org/D17059
2018-09-11 23:54:31 +00:00
Mateusz Guzik
12360b3079 amd64: depessimize copyinstr_smap
The stac/clac combo around each byte copy is causing a measurable
slowdown in benchmarks. Do it only before and after all data is
copied. While here reorder the code to avoid a forward branch in
the common case.

Note the copying loop (originating from copyinstr) is avoidably slow
and will be fixed later.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17063
2018-09-06 19:42:40 +00:00
Konstantin Belousov
20df4f456d amd64: Properly re-merge r334537 into SMAP-ified copyin(9) and copyout(9).
Also this fixes the eflags.ac leak from copyin_smap() when the copied
data length is multiple of eight bytes.

Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2018-09-04 19:27:53 +00:00
Konstantin Belousov
e21c5abc2a amd64: For non-PTI mode, do not initialize PCPU kcr3 to KPML4phys.
Non-PTI mode does not switch kcr3, which means that kcr3 is almost
always stale.  This is important for the NMI handler, which reloads
%cr3 with PCPU(kcr3) if the value is different from PMAP_NO_CR3.

The end result is that curpmap in NMI handler does not match the page
table loaded into hardware.  The manifestation was copyin(9) looping
forever when a usermode access page fault cannot be resolved by
vm_fault() updating a different page table.

Reported by:	mmacy
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Approved by:	re (gjb)
2018-09-04 19:26:54 +00:00
Konstantin Belousov
50cd0be78f Catch exceptions during EFI RT calls on amd64.
This appeared to be required to have EFI RT support and EFI RTC
enabled by default, because there are too many reports of faulting
calls on many different machines.  The knob is added to leave the
exceptions unhandled to allow to debug the actual bugs.

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 21:37:05 +00:00
Konstantin Belousov
1565fb29a7 Add amd64 mdthread fields needed for the upcoming EFI RT exception
handling.

This is split into a separate commit from the main change to make it
easier to handle possible revert after upcoming KBI freeze.

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 21:16:43 +00:00
Konstantin Belousov
9eb958988a Swap order of dererencing PCPU curpmap and checking for usermode in
trap_pfault() KPTI violation check.

EFI RT may set curpmap to NULL for the duration of the call for some
machines (PCID but no INVPCID).  Since apparently EFI RT code must be
ready for exceptions from the calls, avoid dereferencing curpmap until
we know that this call does not come from usermode.

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 20:07:36 +00:00
Konstantin Belousov
d4be3789fe Normalize use of semicolon with EFI_TIME_LOCK macros.
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 19:48:41 +00:00
Konstantin Belousov
f0165b1ca6 Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.
Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.

Based on the submission by:	royger
Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Approved by:	re (marius)
Differential revision:	https://reviews.freebsd.org/D16881
2018-08-29 12:24:19 +00:00
Konstantin Belousov
d367236183 Several bug fixes and robustness improvements for the AP boot page
table allocation.

At the time that mp_bootaddress() is called, phys_avail[] array does
not reflect some memory reservations already done, like kernel
placement.  Recent changes to DMAP protection which make kernel text
read-only in DMAP revealed this, where on some machines AP boot page
tables selection appears to intersect with the kernel itself.

Fix this by checking the addresses selected using the same algorithm
as bootaddr_rwx().  Also, try to chomp pages for the page table not
only at the start of the contiguous range, but also at the end.  This
should improve robustness when the only suitable range is already
consumed by the kernel.

Reported and tested by: Michael Gmelin <freebsd@grem.de>
Reviewed by:    jhb
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Approved by:    re (gjb)
Differential revision:  https://reviews.freebsd.org/D16907
2018-08-28 18:47:02 +00:00
Alan Cox
49bfa624ac Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00