Commit Graph

8302 Commits

Author SHA1 Message Date
araujo
15d1271a22 - Add the ability to run bhyve(8) within a jail(8).
This patch adds a new sysctl(8) knob "security.jail.vmm_allowed",
by default this option is disable.

Submitted by:	Shawn Webb <shawn.webb____hardenedbsd.org>
Reviewed by:	jamie@ and myself.
Relnotes:	Yes.
Sponsored by:	HardenedBSD and G2, Inc.
Differential Revision:	https://reviews.freebsd.org/D16057
2018-08-01 00:39:21 +00:00
markj
0037c1c1c0 COMPAT_LINUX32 has not depended on COMPAT_43 in some time.
MFC after:	3 days
2018-07-31 21:40:13 +00:00
kevans
5727137549 amd64/GENERIC: Enable EFIRT by default
As noted in UDPATING, the new loader tunable efi.rt_disabled may be used to
disable EFIRT at runtime. It should have no effect if you are not booted via
UEFI boot.

MFC after:	6 weeks
2018-07-30 17:54:18 +00:00
kib
893bfcbd5a Remove unneeded CLDs instructions in the SMAP-ed version of several
functions from support.S.

I believe they re-appeared due to me mis-merging my r327820 into the
topic branch.

Sponsored by:	The FreeBSD Foundation
2018-07-30 16:54:51 +00:00
kib
c1e9eab13f Use SMAP on amd64.
Ifuncs selectors dispatch copyin(9) family to the suitable variant, to
set rflags.AC around userspace access.  Rflags.AC bit is cleared in
all kernel entry points unconditionally even on machines not
supporting SMAP.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-07-29 20:47:00 +00:00
imp
11397021c9 Rename VM_FREELIST_ISADMA to VM_FREELIST_LOWMEM.
There's no differene between VM_FREELIST_ISADMA and VM_FREELIST_LOWMEM
except for the default boundary (16MB on x86 and 256MB on MIPS, but
they are otherwise the same). We don't need both for any system we
support (there were some really old ARC systems that did have ISA/EISA
bus, but we never ran on them and they are too old to ever grow
support for).

Differential Review: https://reviews.freebsd.org/D16290
2018-07-27 18:34:20 +00:00
markj
21c018b44b Fix handling of KVA in kmem_bootstrap_free().
Do not use vm_map_remove() to release KVA back to the system.  Because
kernel map entries do not have an associated VM object, with r336030
the vm_map_remove() call will not update the kernel page tables.  Avoid
relying on the vm_map layer and instead update the pmap and release KVA
to the kernel arena directly in kmem_bootstrap_free().

Because the pmap updates will generally result in superpage demotions,
modify pmap_init() to insert PTPs shadowed by superpage mappings into
the kernel pmap's radix tree.

While here, port r329171 to i386.

Reported by:	alc
Reviewed by:	alc, kib
X-MFC with:	r336505
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16426
2018-07-27 15:46:34 +00:00
kib
568f897094 On amd64, enable workarounds for several Ryzen erratas as described in
the AMD document 55449 'Revision Guide for AMD Family 17h Models
00h-0Fh Processors' rev 1.12.

The errata numbers are mentioned near each action.

It seems that newer BIOSes already include required chicken bits
settings, so the magic MSR updates are only needed when BIOS cannot be
updated.  On the other hand, MWAIT avoidance seems to be important.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-07-27 15:31:20 +00:00
kib
42ec259d74 Extend ranges of the critical sections to ensure that context switch
code never sees FPU pcb flags not consistent with the hardware state.

This is uncovered by the eager FPU switch mode.

Analyzed, reviewed and tested by:	gleb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-07-24 19:22:52 +00:00
markj
1810bcca4b Have preload_delete_name() free pages backing preloaded data.
On i386 and amd64, add a vm_phys segment for physical memory used to
store the kernel binary and other preloaded data.  This makes it
possible to free such memory back to the system once it is no longer
needed, e.g., when a preloaded kernel module is unloaded.  Previously,
it would have remained unused.

Reviewed by:	kib, royger
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16330
2018-07-19 20:00:28 +00:00
royger
4347b60bdb xen: implement early init helper for PVHv2
In order to setup an initial environment and jump into the generic
hammer_time initialization function. Some of the code is shared with
PVHv1, while other code is PVHv2 specific.

This allows booting FreeBSD as a PVHv2 DomU and Dom0.

Sponsored by:	Citrix Systems R&D
2018-07-19 08:44:52 +00:00
royger
616ce111c5 xen: add PVHv2 entry point
The PVHv2 entry point is fairly similar to the multiboot1 one. The
kernel is started in protected mode with paging disabled. More
information about the exact BSP state can be found in the pvh.markdown
document on the Xen tree.

This entry point is going to be joined with the native entry point at
hammer_time, and in order to do so the BSP needs to be bootstrapped
into long mode with the same set of page tables as used on bare metal.

Sponsored by:	Citrix Systems R&D
2018-07-19 07:39:35 +00:00
kib
d5fe3eb988 Expand x86 struct pcpus to UMA_PCPU_ALLOC_SIZE AKA PAGE_SIZE.
This restores counters(9) operation.
Revert r336024. Improve assert of pcpu size on x86.

Reviewed by:	mmacy
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D16163
2018-07-06 19:50:44 +00:00
kib
4bb78ec9b8 Revert to recommit with the proper message. 2018-07-06 19:50:25 +00:00
kib
49a6d02633 Save a call to pmap_remove() if entry cannot have any pages mapped.
Due to the way rtld creates mappings for the shared objects, each dso
causes unmap of at least three guard map entries.  For instance, in
the buildworld load, this change reduces the amount of pmap_remove()
calls by 1/5.

Profiled by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16148
2018-07-06 19:48:47 +00:00
hselasky
147a9bb331 Make sure kernel modules built by default are portable between UP and
SMP systems by extending defined(SMP) to include defined(KLD_MODULE).

This is a regression issue after r335873 .

Discussed with:		mmacy@
Sponsored by:		Mellanox Technologies
2018-07-06 10:13:42 +00:00
mmacy
c8354314f3 counter(9): unbreak amd64 following r336020
Apply temporary fix to counter until daylight hours.
The fact that the assembly for counter_u64_add relied on the sizeof(struct pcpu) was
the basis for the otherwise arbitrary offset never came up in D15933.
critical_{enter,exit} is now inline so the only real added overhead is the
added (mostly false) conditional branch in exit.
2018-07-06 10:10:00 +00:00
mmacy
ff20311f27 Back pcpu zone with domain correct pages
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
  (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
  There are some misconceptions surrounding this field. It is the
  _VM_ NUMA domain and should only ever correspond to valid domain
  values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D15933
2018-07-06 02:06:03 +00:00
kib
caf6998654 Extend r335969 to superpages.
It is possible that a fictitious unmanaged userspace mapping of
superpage is created on x86, e.g. by pmap_object_init_pt(), with the
physical address outside the vm_page_array[] coverage.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 17:28:06 +00:00
kib
288402cc53 Revert r335999 to re-commit with the correct error message. 2018-07-05 17:26:13 +00:00
kib
12d583b35c In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 16:38:54 +00:00
kib
a0b8819d1d In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 16:27:34 +00:00
alc
c5f98d1933 As of r335784, if pmap_enter() replaces a managed mapping by an unmanaged
mapping, then it leaks the unlinked PV entry.  This change eliminates that
leak, freeing the PV entry.

Reviewed by:	kib, markj
X-MFC with:	r335784
Differential Revision:	https://reviews.freebsd.org/D16130
2018-07-05 02:04:18 +00:00
kib
7c8694b186 In x86 pmap_extract_and_hold()s, handle the case of PHYS_TO_VM_PAGE()
returning NULL.

vm_fault_quick_hold_pages() can be legitimately called on userspace
mappings backed by fictitious pages created by unmanaged device and sg
pagers.

Note that other architectures pmap_extract_and_hold() might need
similar fix, but I postponed the examination.

Reported by:	bde
Discussed with:	alc
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-04 21:21:59 +00:00
jhb
27ec4ff5e8 Use 'e' instead of 'i' constraints with 64-bit atomic operations on amd64.
The ADD, AND, OR, and SUB instructions take at most a 32-bit
sign-extended immediate operand.  64-bit constants that do not fit into
that constraint need to be loaded into a register.  The 'i' constraint
tells the compiler it can pass any integer constant to the assembler,
whereas the 'e' constrain only permits constants that fit into a 32-bit
sign-extended value.  This fixes using
atomic_add/clear/set/subtract_long/64 with constants that do not fit into
a 32-bit sign-extended immediate.

Reported by:	several folks
Tested by:	Pete Wright <pete@nomadlogic.org>
MFC after:	2 weeks
2018-07-03 22:03:28 +00:00
mmacy
40fc34fd82 inline atomics and allow tied modules to inline locks
- inline atomics in modules on i386 and amd64 (they were always
  inline on other arches)
- allow modules to opt in to inlining locks by specifying
  MODULE_TIED=1 in the makefile

Reviewed by: kib
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16079
2018-07-02 19:48:38 +00:00
markj
17f29dfe16 Invalidate the mapping before updating its physical address.
Doing so ensures that all threads sharing the pmap have a consistent
view of the mapping.  This fixes the problem described in the commit
log messages for r329254 without the overhead of an extra fault in the
common case.  Once other pmap_enter() implementations are similarly
modified, the workaround added in r329254 can be removed, reducing the
overhead of CoW faults.

With this change we can reuse the PV entry from the old mapping,
potentially avoiding a call to reclaim_pv_chunk().  Otherwise, there is
nothing preventing the old PV entry from being reclaimed.  In rare
cases this could result in the PTE's page table page being freed,
leading to a use-after-free of the page when the updated PTE is written
following the allocation of the PV entry for the new mapping.

Reported and tested by:	pho
Reviewed by:	alc, kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D16005
2018-06-28 21:40:31 +00:00
kib
e513711c99 Do not leave stray qword on top of stack for interrupts and exceptions
without error code.  Doing so it mis-aligned the stack.

Since the only consumer of the SSE instructions with the alignment
requirements is AES-NI module, and since the FPU context cannot be
accessed in interrupts, the only situation where the alignment matter
are the compat32 syscalls, as reported in the PR.

PR:	229222
Reported and tested by:	 dewayne@heuristicsystems.com.au
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-06-25 11:29:04 +00:00
markj
c9e59b6914 Re-count available PV entries after reclaiming a PV chunk.
The call to reclaim_pv_chunk() in reserve_pv_entries() may free a
PV chunk with free entries belonging to the current pmap.  In this
case we must account for the free entries that were reclaimed, or
reserve_pv_entries() may return without having reserved the requested
number of entries.

Reviewed by:	alc, kib
Tested by:	pho (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15911
2018-06-23 10:41:52 +00:00
chuck
360d52c628 Fix the Linux kernel version number calculation
The Linux compatibility code was converting the version number (e.g.
2.6.32) in two different ways and then comparing the results.

The linux_map_osrel() function converted MAJOR.MINOR.PATCH similar to
what FreeBSD does natively. I.e. where major=v0, minor=v1, and patch=v2
    v = v0 * 1000000 + v1 * 1000 + v2;

The LINUX_KERNVER() macro, on the other hand, converted the value with
bit shifts. I.e. where major=a, minor=b, and patch=c
    v = (((a) << 16) + ((b) << 8) + (c))

The Linux kernel uses the later format via the KERNEL_VERSION() macro in
include/generated/uapi/linux/version.h

Fix is to use the LINUX_KERNVER() macro in linux_map_osrel() as well as
in the .trans_osrel functions.

PR: 229209
Reviewed by: emaste, cem, imp (mentor)
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15952
2018-06-22 00:02:03 +00:00
mmacy
8a6c208e18 remove ixl iwarp and ixlv from the build until they are in a working state 2018-06-19 02:48:53 +00:00
erj
01a93772b8 ixl(4): Update to use iflib
Update the driver to use iflib in order to bring performance,
maintainability, and (hopefully) stability benefits to the driver.

The driver currently isn't completely ported; features that are missing:

- VF driver (ixlv)
- SR-IOV host support
- RDMA support

The plan is to have these re-added to the driver before the next FreeBSD release.

Reviewed by:	gallatin@
Contributions by: gallatin@, mmacy@, krzysztof.galazka@intel.com
Tested by:	jeffrey.e.pieper@intel.com
MFC after:	1 month
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D15577
2018-06-18 20:12:54 +00:00
emaste
bc07d95d93 linuxulator: do not include legacy syscalls on arm64
Existing linuxulator platforms (i386, amd64) support legacy syscalls,
such as non-*at ones like open, but arm64 and other new platforms do
not.

Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
files.  We may need finer grained control in the future but this is
sufficient for now.

Reviewed by:	andrew
Sponsored by:	Turing Robotic Industries
Differential Revision:	https://reviews.freebsd.org/D15237
2018-06-15 14:41:51 +00:00
kib
d006eed188 linuxolator/amd64: Don't mangle %r10 on return from syscall for EJUSTRETURN.
This fixes the %r10 content for rt_sigreturn.

Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
MFC after:	1 week
2018-06-14 12:35:57 +00:00
kib
a3a3e0b0ab Reorganize code flow in fpudna()/npxdna() to highlight the critical
section scope.  Sprinkle __predict_false() for conditions known to
never occur or occur only on rare platforms.

Sponsored by:	The FreeBSD Foundation
2018-06-14 11:09:51 +00:00
kib
18574304a4 Remove printf() in #NM handler.
Give up and remove the almost useless informational message reporting
that device not available exception occured while our state tracking
indicates the current CPU has FPU context loaded for the current
thread.

It seems that this is recurring bug with some VM monitors.

Sponsored by:	The FreeBSD Foundation
2018-06-14 10:33:26 +00:00
kib
b00affdeb5 Enable eager FPU context switch by default on amd64.
With compilers making increasing use of vector instructions the
performance benefit of lazily switching FPU state is no longer a
desirable tradeoff.  Linux switched to eager FPU context switch some
time ago, and the idea was floated on the FreeBSD-current mailing list
some years ago[1].

Enable eager FPU context switch by default on amd64, with a tunable/sysctl
available to turn it back off.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055198.html

Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2018-06-13 17:55:09 +00:00
jtl
8222f5cb7c Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
araujo
7e47182377 Add SPDX tags to vmm(4).
MFC after:	4 weeks.
Sponsored by:	iXsystems Inc.
2018-06-13 07:02:58 +00:00
jkim
06750cc309 Fix number of auxargs entries to copy out for 32-bit Linuxulator.
PR:		228790
2018-06-12 22:54:48 +00:00
kib
d1806e5461 Fix braino in r334799. Maxmem is in pages.
Reported by:	ae, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-06-11 15:28:20 +00:00
bde
0ea71e2e9d Untangle configuration ifdefs a little. On x86, msi is optional on pci,
and also on apic in common and i386 files (except for xen it is optional
only on xenhvm), but it was not ifdefed except on apic in common and i386
files.

This is all that is left from an attempt to build a (sub-)minimal kernel
without any devices.  The isa "option" is still used without ifdefs in many
standard files even on amd64.  ISAPNP is not optional on at least i386.
ATPIC is not optional on i386 (it is used mainly for Xspuriousint).  But
pci is now supposed to be optional on x86.
2018-06-10 14:49:13 +00:00
markj
e402a8384b Tell the compiler that rdtscp clobbers %ecx. 2018-06-09 18:31:19 +00:00
tychon
fd32dfa062 Don't bother looking for non-executable pages when a process is
excluded from PTI.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15708
2018-06-08 20:35:58 +00:00
mmacy
33d22ed3f8 hwpmc: simplify calling convention for hwpmc interrupt handling
pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.

While facially a reasonable cleanup this change was motivated
by the need to workaround a compiler bug.

core2_intr(cpu, tf) ->
  pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
    pmc_add_sample(cpu, ring, pm, tf, inuserspace)

In the process of optimizing the tail call the tf pointer was getting
clobbered:

(kgdb) up
    at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709                                pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205                    error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,

resulting in a crash in pmc_save_kernel_callchain.
2018-06-08 04:58:03 +00:00
mjg
5e26c51e6d amd64: remove now unused bzero, bcmp and bcopy. move pagecopy higher up. 2018-06-08 04:18:42 +00:00
mjg
511cde02d7 amd64: fix a retarded bug in memset
memset fills the target buffer from a byte-sized value passed in as the
second argument.

The fully-sized (8 bytes) register containing it is named %rsi. Lower 4 bytes
can be referred to as %esi and finally the lowest byte is %sil.

Vast majority of all the callers just zero the target buffer and set it up by
doing xor %esi,%esi which has a side-effect of zeroing the upper parts of
the register as well. Some others do a word-sized move to %esi which has the
same result.

However, there are callers which only fill %sil. This does *not* clear up
the rest of the register.

The value of %rsi is multiplied by $0x0101010101010101 to create a 8-byte sized
pattern for 8-byte stores.

Prior to the patch, the func just blindly took %rsi assuming the unwanted bytes
are zeroed out. Since this is not the case for the callers which only play with
%sil (the rest of the register can have absolutely anything), the resulting
pattern can be garbage.

This has potential for funny bugs. One side effect (which was not amusing)
after enabling it instead of bzero was that the kernel was hanging on boot
as a xen domU.

Reported by:	Trond Endrestøl <Trond.Endrestol fagskolen.gjovik.no>
Pointy hat: me
2018-06-08 00:47:24 +00:00
kib
8d0f43dd62 Account for dmap limit when selecting the pages for the bootstrap
pagetables.

physmap[] can be inconsistent with the physical memory limit due to
buggy bios, or to the hw.physmem tunable. Since bootstrap pagetables
are initialized by accesses through the DMAP, we must ensure that DMAP
really cover the selected pages. This is only relevant when machine
has less than 4G RAM and buggy BIOS, which is the combination on Acer
Chromebook 720.

The call to mp_bootaddress() is moved later to have Maxmem initialized.

An alternative could be to always cover 4G for DMAP, but this change
seems to be simpler.

Reported and tested by:	grembo
Reviewed by:	royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15675
2018-06-07 17:04:34 +00:00
mmacy
a2a3facbc2 cpufunc: add rdtscp for x86 2018-06-07 00:54:11 +00:00
mmacy
3b8eb9c59a hwpmc: ABI fixes
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
  so that filter can identify which counter index an
  event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
  changes needed to be properly identified for filter
2018-06-04 02:05:48 +00:00