Commit Graph

7817 Commits

Author SHA1 Message Date
Mateusz Guzik
a286a3099c amd64: move fusufault after all users
A lot of functions have the following check:
        cmpq    %rax,%rdi                       /* verify address is valid */
        ja      fusufault

The label is present earlier in kernel .text, which means this is a jump
backwards. Absent any information in the branch predictor, the CPU predicts it
as taken. Since it is almost never taken in practice, this results in a
completely avoidable misprediction.

Move it past all consumers, so that it is predicted as not taken.
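
A minimal sketch of the static rule the message relies on, assuming the classic "backward taken, forward not taken" heuristic for a cold predictor. Real CPUs consult dynamic predictor state first, and the function name and addresses are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of static branch prediction with no predictor history:
 * a target below the branch (a backward jump, usually a loop) is
 * predicted taken; a forward target is predicted not taken.
 */
bool static_predict_taken(uintptr_t branch_addr, uintptr_t target_addr)
{
	return target_addr < branch_addr;
}
```

With fusufault placed before its users, the "ja fusufault" is a backward jump and the model predicts it taken. After the label is moved past all consumers, the same check becomes a forward jump and is predicted not taken.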

Approved by:	re (kib)
2018-09-20 13:29:43 +00:00
Konstantin Belousov
d12c446550 Convert x86 cache invalidation functions to ifuncs.
This simplifies the runtime logic and reduces the number of
runtime-constant branches.
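
The kernel's ifuncs are resolved by the linker at boot. The portable sketch below only mimics their effect with a function pointer chosen once, so that the hot path carries no feature check. It is not the FreeBSD ifunc machinery; all names, the feature flag and the 64-byte line size are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned (*cache_flush_fn)(size_t len);

/* Hypothetical variant: flush line by line (64-byte lines assumed). */
static unsigned flush_by_line(size_t len)
{
	return (unsigned)((len + 63) / 64);	/* number of line flushes */
}

/* Hypothetical variant: one whole-cache writeback/invalidate. */
static unsigned flush_whole_cache(size_t len)
{
	(void)len;
	return 1;
}

/* "Resolver": runs once and picks an implementation from a CPU feature. */
static cache_flush_fn resolve_cache_flush(bool cpu_has_line_flush)
{
	return cpu_has_line_flush ? flush_by_line : flush_whole_cache;
}

static cache_flush_fn cache_flush;	/* set once at init, then only read */

void cache_flush_init(bool cpu_has_line_flush)
{
	cache_flush = resolve_cache_flush(cpu_has_line_flush);
}

unsigned invalidate_cache_range(size_t len)
{
	/* No runtime-constant branch here: the choice was made at init. */
	return cache_flush(len);
}
```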

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D16736
2018-09-19 19:35:02 +00:00
Konstantin Belousov
215aa93033 amd64 pmap: remove tautological assert.
pm_pcid is unsigned.

Reviewed by:	cem, markj
CID:	1395727
Noted by:	cem
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D17235
2018-09-19 15:39:16 +00:00
Konstantin Belousov
3c022be2ca Use ifunc to resolve context switching mode on amd64.
The patch removes all checks for pti/pcid/invpcid from the context switch
path. I verified this by looking at the generated code, compiling with
the in-tree clang.  The invpcid_works1 trick required the inline attribute
for pmap_activate_sw_pcid_pti() to work.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 15:52:19 +00:00
Mateusz Guzik
d6943c5804 amd64: tidy up kernel memmove, take 2
There is no need to use %rax for temporary values, and avoiding doing
so shortens the function.
Handle the explicit 'check for tail' depessimization for backwards copying.

This reduces the diff against userspace.

Tested with the glibc test suite.

Approved by:	re (kib)
2018-09-17 15:51:49 +00:00
Konstantin Belousov
09a6ada991 Calculate PTI, PCID and INVPCID modes earlier, before ifuncs are resolved.
This will be used in following conversion of pmap_activate_sw().

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 15:34:19 +00:00
Konstantin Belousov
76ed0c542f Make the PTI violation check follow the style of the SMAP check.
No functional changes.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 14:59:05 +00:00
Mateusz Guzik
9d1b868da0 Revert amd64: tidy up kernel memmove
There is a braino in the non-erms variant which breaks the
functionality.

Will be fixed at a later time with a different patch.

Reported by:	Manfred Antar
Approved by:	re (implicit)
2018-09-16 21:46:27 +00:00
Mateusz Guzik
17f67f63b9 amd64: tidy up kernel memmove
There is no need to use %rax for temporary values, and avoiding doing
so shortens the function.
Handle the explicit 'check for tail' depessimization for backwards copying.

This reduces the diff against userspace.

Approved by:	re (kib)
2018-09-16 19:28:27 +00:00
Konstantin Belousov
bd6c14afa7 Remove unneeded new line from the panic string.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-16 18:36:42 +00:00
Mateusz Guzik
c51b7ab9e3 amd64: implement pagezero_erms
Intel docs claim such a memset (rep stosb over 4096 bytes) is
special-cased by the microarchitectures. Intel also switched Linux to use
it for this purpose.
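
A userspace sketch of the idea, not the kernel's pagezero_erms: one "rep stosb" with %rdi = destination, %rcx = 4096 and %al = 0. It assumes GCC/clang inline asm on x86-64 and 4 KiB pages, and falls back to memset() elsewhere; pagezero_sketch is a hypothetical name. The instruction works on any x86-64 CPU; ERMS only affects how fast it is.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed base page size */

void pagezero_sketch(void *page)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	void *dst = page;
	size_t count = SKETCH_PAGE_SIZE;

	/* rep stosb: store %al to (%rdi), %rcx times (DF is clear per the ABI). */
	__asm__ volatile("rep stosb"
	    : "+D"(dst), "+c"(count)
	    : "a"(0)
	    : "memory");
#else
	memset(page, 0, SKETCH_PAGE_SIZE);
#endif
}
```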

Approved by:	re (gjb)
2018-09-14 15:29:35 +00:00
Mateusz Guzik
13ea074dc3 amd64: implement ERMS-based memmove, memcpy and memset
Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17124
2018-09-13 14:53:51 +00:00
Mateusz Guzik
e382dd47aa amd64: enable options NUMA in GENERIC and MINIMAL
Reviewed by:	gallatin, cem, scottl
Approved by:	re (kib)
Relnotes:	yes
Sponsored by:	Dell EMC Isilon, Netflix
Differential Revision:	https://reviews.freebsd.org/D17059
2018-09-11 23:54:31 +00:00
Mateusz Guzik
12360b3079 amd64: depessimize copyinstr_smap
The stac/clac combo around each byte copy is causing a measurable
slowdown in benchmarks. Do it only before and after all data is
copied. While here reorder the code to avoid a forward branch in
the common case.

Note the copying loop (originating from copyinstr) is avoidably slow
and will be fixed later.
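
The userspace model below illustrates the change: stac()/clac() bracket the whole copy loop once instead of every byte. Here stac() and clac() are hypothetical counters standing in for the instructions. The loop follows the copyinstr(9) contract (copy at most len bytes including the NUL, report the count in *done, fail with ENAMETOOLONG if no NUL fits); EFAULT handling is omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for stac/clac: they only count EFLAGS.AC flips. */
static unsigned ac_toggles;
static void stac(void) { ac_toggles++; }
static void clac(void) { ac_toggles++; }

int copyinstr_model(const char *uaddr, char *kaddr, size_t len, size_t *done)
{
	size_t i;
	int error = ENAMETOOLONG;

	stac();				/* open user access once */
	for (i = 0; i < len; i++) {
		kaddr[i] = uaddr[i];
		if (uaddr[i] == '\0') {
			i++;		/* count the terminating NUL */
			error = 0;
			break;
		}
	}
	clac();				/* close it once, after all data */

	if (done != NULL)
		*done = i;
	return error;
}
```

Whatever the string length, a call now costs two AC flips rather than two per byte.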

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17063
2018-09-06 19:42:40 +00:00
Konstantin Belousov
20df4f456d amd64: Properly re-merge r334537 into SMAP-ified copyin(9) and copyout(9).
Also this fixes the eflags.ac leak from copyin_smap() when the copied
data length is a multiple of eight bytes.

Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2018-09-04 19:27:53 +00:00
Konstantin Belousov
e21c5abc2a amd64: For non-PTI mode, do not initialize PCPU kcr3 to KPML4phys.
Non-PTI mode does not switch kcr3, which means that kcr3 is almost
always stale.  This is important for the NMI handler, which reloads
%cr3 with PCPU(kcr3) if the value is different from PMAP_NO_CR3.

The end result is that curpmap in the NMI handler does not match the page
table loaded into hardware.  The manifestation was copyin(9) looping
forever when a usermode access page fault could not be resolved, because
vm_fault() was updating a different page table.
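
A reduced model of the NMI entry rule quoted above. It assumes PMAP_NO_CR3 is an all-ones sentinel and that, after this change, non-PTI mode leaves kcr3 at that sentinel; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed all-ones sentinel meaning "no %cr3 to load". */
#define PMAP_NO_CR3 (~(uint64_t)0)

/* Hypothetical, reduced stand-in for the per-CPU data. */
struct pcpu_model {
	uint64_t kcr3;
};

/*
 * On NMI entry, %cr3 is reloaded from PCPU(kcr3) unless it holds the
 * sentinel.  Returns the page-table root the handler runs on.
 */
uint64_t nmi_entry_cr3(const struct pcpu_model *pc, uint64_t live_cr3)
{
	if (pc->kcr3 != PMAP_NO_CR3)
		return pc->kcr3;
	return live_cr3;
}
```

With a stale kcr3 (never switched in non-PTI mode), the handler runs on a table other than the live one, which is the mismatch with curpmap described above.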

Reported by:	mmacy
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Approved by:	re (gjb)
2018-09-04 19:26:54 +00:00
Konstantin Belousov
50cd0be78f Catch exceptions during EFI RT calls on amd64.
This appeared to be required to have EFI RT support and EFI RTC
enabled by default, because there are too many reports of faulting
calls on many different machines.  A knob is added to leave the
exceptions unhandled, to allow debugging the actual bugs.

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 21:37:05 +00:00
Konstantin Belousov
1565fb29a7 Add amd64 mdthread fields needed for the upcoming EFI RT exception
handling.

This is split into a separate commit from the main change to make it
easier to handle possible revert after upcoming KBI freeze.

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 21:16:43 +00:00
Konstantin Belousov
9eb958988a Swap the order of dereferencing PCPU curpmap and checking for usermode
in the trap_pfault() KPTI violation check.

EFI RT may set curpmap to NULL for the duration of the call for some
machines (PCID but no INVPCID).  Since apparently EFI RT code must be
ready for exceptions from the calls, avoid dereferencing curpmap until
we know that this call does not come from usermode.
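
A reduced sketch of why the order matters, assuming the check dereferences curpmap only for faults taken from user mode. In C, && evaluates left to right and stops at the first false operand, so a kernel-mode fault, such as one raised inside an EFI RT call while curpmap is NULL, never reaches the dereference. The struct and field are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the real pmap. */
struct pmap_model {
	bool pti_active;
};

bool pti_violation(bool usermode, const struct pmap_model *curpmap)
{
	/* usermode is tested first; curpmap is only read when it is true. */
	return usermode && curpmap->pti_active;
}
```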

Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 20:07:36 +00:00
Konstantin Belousov
d4be3789fe Normalize use of semicolon with EFI_TIME_LOCK macros.
Reviewed by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:    re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D16972
2018-09-02 19:48:41 +00:00
Konstantin Belousov
f0165b1ca6 Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.
Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.

Based on the submission by:	royger
Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Approved by:	re (marius)
Differential revision:	https://reviews.freebsd.org/D16881
2018-08-29 12:24:19 +00:00
Konstantin Belousov
d367236183 Several bug fixes and robustness improvements for the AP boot page
table allocation.

At the time that mp_bootaddress() is called, phys_avail[] array does
not reflect some memory reservations already done, like kernel
placement.  Recent changes to DMAP protection, which make kernel text
read-only in DMAP, revealed this: on some machines, the pages selected
for the AP boot page tables appear to intersect with the kernel itself.

Fix this by checking the addresses selected using the same algorithm
as bootaddr_rwx().  Also, try to chomp pages for the page table not
only at the start of the contiguous range, but also at the end.  This
should improve robustness when the only suitable range is already
consumed by the kernel.

Reported and tested by: Michael Gmelin <freebsd@grem.de>
Reviewed by:    jhb
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Approved by:    re (gjb)
Differential revision:  https://reviews.freebsd.org/D16907
2018-08-28 18:47:02 +00:00
Alan Cox
49bfa624ac Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.
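
A sketch of the idea that a per-page flag, rather than a caller-supplied arena, decides where memory is returned. The flag's bit value, the page struct and the arena enum are all hypothetical; the real vm_page layout is not shown here.

```c
#include <stdint.h>

#define VPO_KMEM_EXEC 0x01u	/* hypothetical bit value */

enum kmem_arena_model { ARENA_RW, ARENA_RWX };

struct vm_page_model {
	uint8_t oflags;		/* per-page VPO_* style flags */
};

/* At allocation: record whether the page was mapped with execute permission. */
void kmem_mark_page(struct vm_page_model *m, int exec)
{
	if (exec)
		m->oflags |= VPO_KMEM_EXEC;
	else
		m->oflags &= (uint8_t)~VPO_KMEM_EXEC;
}

/* At free: the flag selects the arena, so callers no longer pass one. */
enum kmem_arena_model kmem_free_arena(const struct vm_page_model *m)
{
	return (m->oflags & VPO_KMEM_EXEC) ? ARENA_RWX : ARENA_RW;
}
```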

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00
Konstantin Belousov
60b7423434 Unify amd64 and i386 vmspace0 pmap activation.
Add pmap_activate_boot() for i386, move the invocation on APs from MD
init_secondary() to x86 init_secondary_tail().

Suggested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16893
2018-08-25 15:21:28 +00:00
Warner Losh
592ffb2175 Revert drm2 removal.
Revert r338177, r338176, r338175, r338174, r338172

After long consultations with re@, core members and mmacy, revert
these changes. Followup changes will be made to mark them as
deprecated and print a message about where to find the up-to-date
driver.  Followup commits will be made to make this clear in the
installer. Further followup commits will reduce POLA violations in ways
we're still exploring.

It's anticipated that after the freeze, this will be removed in
13-current (with the residual of the drm2 code copied to
sys/arm/dev/drm2 for the TEGRA port's use w/o the intel or
radeon drivers).

Due to the impending freeze, there was no formal core vote for
this. I've been talking to different core members all day, as well as
Matt Macy and Glen Barber. Nobody is completely happy; all are
grudgingly going along with this. Work is in progress to mitigate
the negative effects as much as possible.

Requested by: re@ (gjb, rgrimes)
2018-08-24 00:02:00 +00:00
Mark Johnston
36716fe2e6 Prepare the kernel linker to handle PC-relative ifunc relocations.
The boot-time ifunc resolver assumes that it only needs to apply
IRELATIVE relocations to PLT entries.  With an upcoming optimization,
this assumption no longer holds, so add the support required to handle
PC-relative relocations targeting GNU_IFUNC symbols.
- Provide a custom symbol lookup routine that can be used in early boot.
  The default lookup routine uses kobj, which is not functional at that
  point.
- Apply all existing relocations during boot rather than filtering
  IRELATIVE relocations.
- Ensure that we continue to apply ifunc relocations in a second pass
  when loading a kernel module.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16749
2018-08-22 20:44:30 +00:00
Konstantin Belousov
614a9ce31a Skip PMAP_PCID_KERN + 1 PCPU pcid_next value on APs as well.
r337838 did it for BSP.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-08-22 14:58:52 +00:00
Matt Macy
d157fbd5b4 Remove legacy drm and drm2 from tree
As discussed on the MLs, drm2 conflicts with the ports' version and there
is no upstream for most if not all of drm. Both have been merged into
a single port.

Users on powerpc, 32-bit hardware, or with GPUs predating Radeon
and i915 will need to install graphics/drm-legacy-kmod. All
other users should be able to use one of the LinuxKPI-based ports:
graphics/drm-stable-kmod, graphics/drm-next-kmod, graphics/drm-devel-kmod.

MFC: never
Approved by: core@
2018-08-22 01:50:12 +00:00
Alan Cox
83a90bffd8 Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by:	kib, markj
Discussed with:	jeff, re@
Differential Revision:	https://reviews.freebsd.org/D16825
2018-08-21 16:43:46 +00:00
Konstantin Belousov
a997bcc015 Update comment about ABI of flush_l1s_sw to match the reality.
CPUID instruction clobbers %rbx and %rdx.

Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2018-08-20 19:09:39 +00:00
Konstantin Belousov
b0568ddbec Always initialize PCPU kcr3 for vmspace0 pmap.
If an exception or NMI occurs before the CPU switches to a pmap different
from vmspace0, PCPU kcr3 is left zero in the PTI configuration, which causes
a triple fault in the handler.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-08-20 19:07:57 +00:00
John Baldwin
a800b45c18 Merge amd64 and i386 <machine/intr_machdep.h> headers.
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16803
2018-08-20 12:31:39 +00:00
Konstantin Belousov
c1141fba00 Update L1TF workaround to sustain L1D pollution from NMI.
Current mitigation for L1TF in bhyve flushes L1D either by an explicit
WRMSR command, or by software reading enough uninteresting data to
fully populate all lines of L1D.  If an NMI occurs after either
method has completed, but before VM entry, L1D becomes polluted with
the cache lines touched by NMI handlers.  NMI handlers access no
interesting data, but something sensitive might be co-located on the
same cache line, and then L1TF exposes that to a rogue guest.
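
A sketch of the general software-flush idea, not bhyve's actual routine: read one byte from each cache line of a buffer larger than L1D so that every line is displaced. The 32 KiB L1D, the 64-byte lines and the 2x buffer factor are assumptions; real geometry comes from CPUID, and replacement policies may need a more careful access pattern.

```c
#include <stddef.h>
#include <stdint.h>

#define L1D_SIZE  (32u * 1024u)	/* assumed L1D size */
#define LINE_SIZE 64u		/* assumed cache-line size */

static uint8_t flush_buf[2 * L1D_SIZE];

/* Touch every line of the buffer; returns the number of lines read. */
size_t l1d_flush_sw_sketch(void)
{
	volatile const uint8_t *p = flush_buf;
	size_t lines = 0;

	for (size_t off = 0; off < sizeof(flush_buf); off += LINE_SIZE) {
		(void)p[off];	/* volatile read, so it is not optimized away */
		lines++;
	}
	return lines;
}
```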

Use the VM-entry MSR-load list to make the L1D flush atomic with VM
entry if updated microcode was loaded.  If only the software flush method
is available, try to help the bhyve sw flusher by also flushing L1D on
NMI exit to kernel mode.

Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D16790
2018-08-19 18:47:16 +00:00
John Baldwin
8cd385fda0 Make 'device crypto' lines more consistent.
- In configurations with a pseudo devices section, move 'device crypto'
  into that section.
- Use a consistent comment.  Note that other things common in kernel
  configs such as GELI also require 'device crypto', not just IPSEC.

Reviewed by:	rgrimes, cem, imp
Differential Revision:	https://reviews.freebsd.org/D16775
2018-08-18 20:32:08 +00:00
Warner Losh
62ee5bbd73 GPT is standard in x86 and arm64 land. Add it to DEFAULTS with the
others.

Differential Revision: https://reviews.freebsd.org/D16740
2018-08-17 14:47:21 +00:00
Konstantin Belousov
54564eda77 Fix early EFIRT on PCID machines after r337773.
Ensure that the valid PCID state is created for proc0 pmap, since it
might be used by efirt enter() before the first context switch on the BSP.

Sponsored by:	The FreeBSD Foundation
MFC after:	6 days
2018-08-15 12:48:49 +00:00
Konstantin Belousov
c30578feeb Provide part of the mitigation for L1TF-VMM.
On guest entry in bhyve, flush the L1 data cache, using either the L1D
flush command MSR, if available, or by reading enough uninteresting
data to fill the whole cache.

Flush is automatically enabled on CPUs which do not report RDCL_NO,
and can be disabled with the hw.vmm.l1d_flush tunable/kenv.
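
A sketch of the selection logic described above, under stated assumptions. RDCL_NO is taken as bit 0 of IA32_ARCH_CAPABILITIES, and L1D flush command support as CPUID.(EAX=7,ECX=0):EDX bit 28, per Intel's documentation. The hw.vmm.l1d_flush tunable is modeled as an int that defaults to nonzero and disables flushing when 0. The function and enum names are hypothetical.

```c
#include <stdint.h>

#define ARCH_CAP_RDCL_NO     (1ull << 0)	/* IA32_ARCH_CAPABILITIES[0] */
#define CPUID7_EDX_L1D_FLUSH (1u << 28)		/* CPUID.(7,0):EDX[28] */

enum l1d_flush_mode { L1D_FLUSH_NONE, L1D_FLUSH_MSR, L1D_FLUSH_SW };

/*
 * arch_caps:  IA32_ARCH_CAPABILITIES value (0 if the MSR is absent)
 * cpuid7_edx: EDX of CPUID leaf 7, subleaf 0
 * tunable:    modeled hw.vmm.l1d_flush, 0 disables the flush
 */
enum l1d_flush_mode choose_l1d_flush(uint64_t arch_caps, uint32_t cpuid7_edx,
    int tunable)
{
	if (tunable == 0 || (arch_caps & ARCH_CAP_RDCL_NO) != 0)
		return L1D_FLUSH_NONE;	/* disabled, or CPU not vulnerable */
	if (cpuid7_edx & CPUID7_EDX_L1D_FLUSH)
		return L1D_FLUSH_MSR;	/* flush command MSR available */
	return L1D_FLUSH_SW;		/* fall back to reading a buffer */
}
```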

Security:	CVE-2018-3646
Reviewed by:	emaste, jhb, Tony Luck <tony.luck@intel.com>
Sponsored by:	The FreeBSD Foundation
2018-08-14 17:29:41 +00:00
Konstantin Belousov
9840c7373c Reserve page at the physical address zero on amd64.
We always zero the invalidated PTE/PDE for superpages, which means that
the L1TF CPU vulnerability (CVE-2018-3620) can only be used to read
from the page at zero.

Note that both i386 and amd64 exclude the page from the phys_avail[]
array, so this change is redundant, but I think that phys_avail[] on
UEFI boot is not required to exclude it.  Eventually the blacklisting
should be made conditional on CPUs which report that they are not
vulnerable to L1TF.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
2018-08-14 17:14:33 +00:00
Konstantin Belousov
8fba5348fc amd64: ensure that curproc->p_vmspace pmap always matches PCPU
curpmap.

When performing a context switch on a machine without PCID, if the current
%cr3 equals the new pmap's %cr3, which is typical for kernel_pmap
vs. kernel process, I neglected to update the PCPU curpmap value.  Remove
the check for %cr3 not equal to pm_cr3 before doing the update.  It is
believed that this case cannot happen at all, due to other changes in
this revision.

Also, do not set the very first curpmap to kernel_pmap; it should be
the vmspace0 pmap instead, to match curproc.

Move the common code to activate the initial pmap both on BSP and APs
into pmap_activate_boot() helper.

Reported by: eadler, ambrisko
Discussed with: kevans
Reviewed by:	alc, markj (previous version)
Tested by: ambrisko (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16618
2018-08-14 16:37:14 +00:00
Konstantin Belousov
ef52dc71eb Fix typo.
Noted by:	alc
MFC after:	3 days
2018-08-14 16:27:17 +00:00
Mark Johnston
97edfc1b45 Implement kernel support for early loading of Intel microcode updates.
Updates in the format described in section 9.11 of the Intel SDM can
now be applied as one of the first steps in booting the kernel.  Updates
that are loaded this way are automatically re-applied upon exit from
ACPI sleep states, in contrast with the existing cpucontrol(8)-based
method.  For the time being only Intel updates are supported.

Microcode update files are passed to the kernel via loader(8).  The
file type must be "cpu_microcode" in order for the file to be recognized
as a candidate microcode update.  Updates for multiple CPU types may be
concatenated together into a single file, in which case the kernel
will select and apply a matching update.  Memory used to store the
update file will be freed back to the system once the update is applied,
so this approach will not consume more memory than required.
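
A simplified sketch of walking a file of concatenated updates, based on our reading of the header layout in Intel SDM section 9.11: a 48-byte header, a Data Size of 0 meaning a 2048-byte update, and a dword checksum over the whole update summing to 0. It matches on the processor signature only; processor flags and extended signature tables are ignored, and the function names are hypothetical, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

struct ucode_header {
	uint32_t header_version;
	uint32_t update_revision;
	uint32_t date;
	uint32_t processor_signature;
	uint32_t checksum;
	uint32_t loader_revision;
	uint32_t processor_flags;
	uint32_t data_size;	/* 0 means 2000 bytes of data */
	uint32_t total_size;	/* meaningful only if data_size != 0 */
	uint32_t reserved[3];
};
_Static_assert(sizeof(struct ucode_header) == 48, "header is 48 bytes");

/* Size of one update, used to step to the next one in the file. */
uint32_t ucode_total_size(const struct ucode_header *h)
{
	return h->data_size == 0 ? 2048u : h->total_size;
}

/* All dwords of a valid update sum to zero (mod 2^32). */
int ucode_checksum_ok(const uint32_t *words, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += words[i];
	return sum == 0;
}

/* Find the first update whose signature matches; buf must be 4-byte aligned. */
const struct ucode_header *ucode_find(const uint8_t *buf, size_t len,
    uint32_t signature)
{
	size_t off = 0;

	while (len - off >= sizeof(struct ucode_header)) {
		const struct ucode_header *h = (const void *)(buf + off);
		uint32_t size = ucode_total_size(h);

		if (size < sizeof(*h) || size > len - off)
			break;		/* malformed or truncated: stop */
		if (h->processor_signature == signature)
			return h;
		off += size;
	}
	return NULL;
}
```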

Reviewed by:	kib
MFC after:	6 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16370
2018-08-13 17:13:09 +00:00
Konstantin Belousov
cb0eecdf92 Futex support functions in linux.ko and linux32.ko on amd64 should be
aware of SMAP.

Reported and tested by:	Johannes Lundberg <johalun0@gmail.com>, wulf
Sponsored by:	The FreeBSD Foundation
2018-08-07 18:29:10 +00:00
Kyle Evans
3395e43a04 efirt: Don't enter EFI context early, convert addrs to KVA instead
efi_enter here was needed because dereferencing efi_runtime causes a fault
outside of EFI context, due to the runtime table living in runtime services
space. This may cause problems early in boot, though, so instead access it
by converting the paddr to a KVA.

While here, remove the other direct PHYS_TO_DMAP calls and the explicit DMAP
requirement from efidev.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D16591
2018-08-04 21:41:10 +00:00
Konstantin Belousov
54c531cacd Add END()s for amd64 linux futex support routines.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-08-04 13:57:50 +00:00
Konstantin Belousov
35efb3b1de Fix typo in copyinstr_smap, resulting in mis-handling of too long strings.
Reported and tested by:	pho
PR:	230286
Sponsored by:	The FreeBSD Foundation
2018-08-03 15:35:29 +00:00
Konstantin Belousov
e45b89d23d Add pmap_is_valid_memattr(9).
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15583
2018-08-01 18:45:51 +00:00
Mark Johnston
8a5efe3601 Make sure that ENTRY() and END() refer to the same symbol.
X-MFC with:	r336876
2018-08-01 15:50:42 +00:00
Marcelo Araujo
be963beee6 - Add the ability to run bhyve(8) within a jail(8).
This patch adds a new sysctl(8) knob "security.jail.vmm_allowed",
which is disabled by default.

Submitted by:	Shawn Webb <shawn.webb____hardenedbsd.org>
Reviewed by:	jamie@ and myself.
Relnotes:	Yes.
Sponsored by:	HardenedBSD and G2, Inc.
Differential Revision:	https://reviews.freebsd.org/D16057
2018-08-01 00:39:21 +00:00
Mark Johnston
40fd44953c COMPAT_LINUX32 has not depended on COMPAT_43 in some time.
MFC after:	3 days
2018-07-31 21:40:13 +00:00
Kyle Evans
164138e7d8 amd64/GENERIC: Enable EFIRT by default
As noted in UPDATING, the new loader tunable efi.rt_disabled may be used to
disable EFIRT at runtime. It should have no effect if you are not booted via
UEFI.

MFC after:	6 weeks
2018-07-30 17:54:18 +00:00