Commit Graph

8691 Commits

Author SHA1 Message Date
kevans
f813cffaf1 Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.

Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.

Reviewed by:	brooks
Discussed with:	sjg
Differential Revision:	https://reviews.freebsd.org/D23099
2020-01-10 18:24:17 +00:00
kaktus
b05c0a53bd sysctl: mark more nodes as MPSAFE
vm.kvm_size and vm.kvm_free are read only and marked as MPSAFE on i386
already. Mark them as that on amd64 and arm64 too to avoid locking Giant.

Reviewed by:	kib (mentor)
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23039
2020-01-06 10:52:13 +00:00
mjg
06b12c0bed locks: add default delay struct
Use it for all primitives. This makes everything fit in 8 bytes.
2020-01-05 12:48:19 +00:00
alc
0adae291f5 When a copy-on-write fault occurs, pmap_enter() is called on to replace the
mapping to the old read-only page with a mapping to the new read-write page.
To destroy the old mapping, pmap_enter() must destroy its page table and PV
entries and invalidate its TLB entry.  This change simply invalidates that
TLB entry a little earlier, specifically, on amd64 and arm64, before the PV
list lock is held.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23027
2020-01-04 19:50:25 +00:00
kib
3e003ff13f bhyve: terminate waiting loops if thread suspension is requested.
PR:	242724
Reviewed by:	markj
Reported and tested by:	Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
	 (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22881
2020-01-02 22:37:04 +00:00
trasz
f870efbd57 Add basic getcpu(2) support to linuxulator. The purpose of this
syscall is to query the CPU number and the NUMA domain the calling
thread is currently running on.  The third argument is ignored.
It doesn't do anything regarding scheduling - it's literally
just a way to query the current state, without any guarantees
you won't get rescheduled an opcode later.

This unbreaks Java from CentOS 8
(java-11-openjdk-11.0.5.10-0.el8_0.x86_64).

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22972
2019-12-31 22:01:08 +00:00
trasz
0426ee90ee Regen after r356229.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-31 16:01:37 +00:00
trasz
08fb04a272 Fix definitions for Linux getcpu(2).
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-31 15:57:29 +00:00
kaktus
121a4336ab linux(4): implement copy_file_range(2)
copy_file_range(2) is implemented natively since r350315, make it available
for Linux binaries too.

Reviewed by:	kib (mentor), trasz (previous version)
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D22959
2019-12-30 18:11:06 +00:00
trasz
4b3d6ef9bc Implement Linux syslog(2) syscall; just enough to make Linux dmesg(8)
utility work.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22465
2019-12-29 15:53:55 +00:00
bdragon
cde01c97b2 Unbreak build. It seems that mips and amd64 still pull in link_elf.c, so
we need to have elf_cpu_parse_dynamic() everywhere after all to avoid
an undefined symbol.
2019-12-24 16:52:10 +00:00
alc
ab7c715e84 Micro-optimize the control flow in _pmap_unwire_ptp(), and eliminate
unnecessary parentheses.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22893
2019-12-21 22:32:24 +00:00
alc
41b7f699dc Correct a mistakenly inverted condition in r355833.
Noticed by:	kib
X-MFC with:	r355833
2019-12-20 20:46:26 +00:00
alc
e5b1f5ca56 When pmap_enter_{l2,pde}() are called to create a kernel mapping, they are
incrementing (and decrementing) the ref_count on kernel page table pages.
They should not do this.  Kernel page table pages are expected to have a
fixed ref_count.  Address this problem by refactoring pmap_alloc{_l2,pde}()
and their callers.  This also eliminates some duplicated code from the
callers.

Correctly implement PMAP_ENTER_NOREPLACE in pmap_enter_{l2,pde}() on kernel
mappings.

Reduce code duplication by defining a function, pmap_abort_ptp(), for
handling a common error case.

Handle a possible page table page leak in pmap_copy().  Suppose that we are
determining whether to copy a superpage mapping.  If we abort because there
is already a mapping in the destination pmap at the current address, then
simply decrementing the page table page's ref_count is correct, because the
page table page must have a ref_count > 1.  However, if we abort because we
failed to allocate a PV entry, this might be a just allocated page table
page that has a ref_count = 1, so we should call pmap_abort_ptp().

Simplify error handling in pmap_enter_quick_locked().

Reviewed by:	kib, markj (an earlier)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22763
2019-12-18 18:21:39 +00:00
trasz
e41f6b35b7 Add compat.linux.emul_path, so it can be set to something other
than "/compat/linux".  Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.

Reviewed by:	bcr (manpages)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22574
2019-12-16 20:07:04 +00:00
trasz
820308e362 Add sync_file_range(2) implementation to linux(4); it's a thin wrapper
over the usual fsync(2).

This silences some warnings when running "apt-get upgrade".

Reviewed by:	brooks, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:37:17 +00:00
trasz
2d2dde30e5 Regen after r355752.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:32:37 +00:00
trasz
90c1a7bcc7 Fix definitions for linuxulator's sync_file_range(2).
Reviewed by:	brooks, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:30:43 +00:00
jhb
6a9a1b3dee Support software breakpoints in the debug server on Intel CPUs.
- Allow the userland hypervisor to intercept breakpoint exceptions
  (BP#) in the guest.  A new capability (VM_CAP_BPT_EXIT) is used to
  enable this feature.  These exceptions are reported to userland via
  a new VM_EXITCODE_BPT that includes the length of the original
  breakpoint instruction.  If userland wishes to pass the exception
  through to the guest, it must be explicitly re-injected via
  vm_inject_exception().

- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH
  pseudo-register.  Injecting a BP# on Intel requires setting this to
  the length of the breakpoint instruction.  AMD SVM currently ignores
  writes to this register (but reports success) and fails to read it.

- Rework the per-vCPU state tracked by the debug server.  Rather than
  a single 'stepping_vcpu' global, add a structure for each vCPU that
  tracks state about that vCPU ('stepping', 'stepped', and
  'hit_swbreak').  A global 'stopped_vcpu' tracks which vCPU is
  currently reporting an event.  Event handlers for MTRAP and
  breakpoint exits loop until the associated event is reported to the
  debugger.

  Breakpoint events are discarded if the breakpoint is not present
  when a vCPU resumes in the breakpoint handler to retry submitting
  the breakpoint event.

- Maintain a linked-list of active breakpoints in response to the GDB
  'Z0' and 'z0' packets.

Reviewed by:	markj (earlier version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D20309
2019-12-13 19:21:58 +00:00
markj
e0995c18f6 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
jhb
d254d2e38e Use 4 byte stack alignment instead of 8 byte.
This was an old bug prior to r355373 and mostly harmless as it would
waste at most a handful of bytes on the stack.
2019-12-09 19:18:05 +00:00
jhb
4459aedbdc Copy out aux args after the argument and environment vectors.
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector.  Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack.  It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by:	kib
Tested on:	amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22695
2019-12-09 19:17:28 +00:00
kib
a51c00ce20 amd64: properly set the start of the io permission bitmap for BSP
... after the initial common TSS is copied into its final location
during PCPU reallocation.

Reported by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-12-07 00:23:19 +00:00
brooks
dfa2e15cbe sysent: Reduce duplication and improve readability.
Use the power of variable to avoid spelling out source and generated
files too many times.  The previous Makefiles were hard to read, hard to
edit, and badly formatted.

Reviewed by:	kevans, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22714
2019-12-06 23:59:23 +00:00
scottl
ed396a5316 Move the mds, irbs, and ssb mitigation knobs into machdep.mitigations.
They're in both the old and new places in HEAD for the moment for
discussion and transition.  The old locations will be garbage collected
in 4 weeks.  MFCs to 12 an 11 will keep the old and new for transition
purposes.

Reviewed by:	kib
MFC after:	4 weeks
Sponsored by:	Intel
Differential Revision:	https://reviews.freebsd.org/D22590
2019-12-06 02:43:05 +00:00
imp
a476ba06d5 Regularize my copyright notice
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
  All Rights Reserved on same line as other copyright holders (but not
  me). Other such holders are also listed last where it's clear.
2019-12-04 16:56:11 +00:00
jhb
0d8d23a6a3 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00
jeff
412097ad50 Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22611
2019-12-02 22:42:05 +00:00
anish
2589e5c32b bhyve amd: amdvi_dump_cmds() log the command for which the command completion failed. Completion is checked in poll mode although it can be done using interrupts.
No need to log all the commands in command ring but only the last one for which completion failed.

Reported by: np@freebsd.org
Reviewed by: np, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22566
2019-12-01 04:00:08 +00:00
scottl
e20ff19808 Remove the trm(4) driver
Differential Revision:	https://reviews.freebsd.org/D22575
2019-11-28 02:32:17 +00:00
kib
372acbf63e amd64: assert that EARLY_COUNTER does not corrupt memory.
Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22514
2019-11-24 19:02:13 +00:00
andrew
e95c204297 Add kcsan_md_unsupported from NetBSD.
It's used to ignore virtual addresses that may have a different physical
address depending on the CPU.

Sponsored by:	DARPA, AFRL
2019-11-21 13:22:23 +00:00
andrew
e0d8dc7f56 Fix for style(9): use parentheses around return statements.
Reported by:	kib
Sponsored by:	DARPA, AFRL
2019-11-21 12:29:20 +00:00
andrew
6e5970c8f4 Port the NetBSD KCSAN runtime to FreeBSD.
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.

This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22315
2019-11-21 11:22:08 +00:00
kib
edd82c43cd amd64: in double fault handler, do not rely on sane gsbase value.
Typical reasons for doublefault faults are either kernel stack
overflow or bugs in the code that manipulates protection CPU state.
The later code is the code which often has to set up gsbase for
kernel.  Switching to explicit load of GSBASE MSR in the fault handler
makes it more probable to output a useful information.

Now all IST handlers have nmi_pcpu structure on top of their stacks.

It would be even more useful to save gsbase value at the moment of the
fault.  I did not this because I do not want to modify PCB layout now.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-11-20 11:12:19 +00:00
kevans
60027726b9 Convert in-tree sysent targets to use new makesyscalls.lua
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D21894
2019-11-18 23:28:23 +00:00
jhb
81f62ee15e Check for errors from copyout() and suword*() in sv_copyout_args/strings.
Reviewed by:	brooks, kib
Tested on:	amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22401
2019-11-18 20:07:43 +00:00
markj
8d166fea2a Set MALLOC_DEBUG_MAXZONES=1 in GENERIC-NODEBUG configurations.
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory.  However, it
increases fragmentation and is disabled in release kernels.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-11-18 20:03:28 +00:00
kib
8e4ff8df03 amd64 copyout: remove irrelevant comment.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-11-17 14:41:47 +00:00
scottl
3a09fa0468 TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.

Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.

Control knobs are:
machdep.mitigations.taa.enable:
        0 - no software mitigation is enabled
        1 - attempt to disable TSX
        2 - use the VERW mitigation
        3 - automatically select the mitigation based on processor
	    features.

machdep.mitigations.taa.state:
        inactive        - no mitigation is active/enabled
        TSX disable     - TSX is disabled in the bare metal CPU as well as
                        - any virtualized CPUs
        VERW            - VERW instruction clears CPU buffers
	not vulnerable	- The CPU has identified itself as not being
			  vulnerable

Nothing in the base FreeBSD system uses TSX.  However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.

Reviewed by:	emaste, imp, kib, scottph
Sponsored by:	Intel
Differential Revision:	22374
2019-11-16 00:26:42 +00:00
jhb
ad77a7b3cc Use a sv_copyout_auxargs hook in the Linux ELF ABIs.
Reviewed by:	emaste
Tested on:	amd64 (linux64 only), i386
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22356
2019-11-15 23:01:43 +00:00
jhb
3f50cb7491 Add a sv_copyout_auxargs() hook in sysentvec.
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook.  In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland.  This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.

Reviewed by:	brooks, emaste
Tested on:	amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22355
2019-11-15 18:42:13 +00:00
jpaetzel
b0857bd293 Add the pvscsi driver to the tree.
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi.  The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.

Error handling in this driver has not been extensively tested
yet.

Submitted by:	vbhakta@vmware.com
Relnotes:	yes
Sponsored by:	VMware, Panzura
Differential Revision:	D18613
2019-11-14 23:31:20 +00:00
kib
2d220d40d8 amd64: only set PCB_FULL_IRET pcb flag when #gp or similar exception comes
from usermode.

If CPU supports RDFSBASE, the flag also means that userspace fsbase
and gsbase are already written into pcb, which might be not true when
we handle #gp from kernel.

The offender is rdmsr_safe(), and the visible result is corrupted
userspace TLS base.

Reported by:	pstef
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-11-13 22:39:46 +00:00
kib
f6d6546684 Workaround for Intel SKL002/SKL012S errata.
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs.  For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs.  The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.

Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve's use cases.

Reviewed by:	alc, emaste, markj, scottl
Security:	CVE-2018-12207
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21884
2019-11-12 18:01:33 +00:00
kib
620b44daa3 amd64: move GDT into PCPU area.
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22302
2019-11-12 15:51:47 +00:00
kib
fa1265d4c5 amd64: assert that size of the software prototype table for gdt is equal
to the size of hardware gdt.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22302
2019-11-12 15:47:46 +00:00
avg
258042511c teach db_nextframe/x86 about [X]xen_intr_upcall interrupt handler
Discussed with:	kib, royger
MFC after:	3 weeks
Sponsored by:	Panzura
2019-11-12 11:00:01 +00:00
kib
188c358474 amd64: Issue MFENCE on context switch on AMD CPUs when reusing address space.
On some AMD CPUs, in particular, machines that do not implement
CLFLUSHOPT but do provide CLFLUSH, the CLFLUSH instruction is only
synchronized with MFENCE.

Code using CLFLUSH typicall needs to brace it with MFENCE both before
and after flush, see for instance pmap_invalidate_cache_range().  If
context switch occurs while inside the protected region, we need to
ensure visibility of flushes done on the old CPU, to new CPU.

For all other machines, locked operation done to lock switched thread,
should be enough.  For case of different address spaces, reload of
%cr3 is serializing.

Reviewed by:	cem, jhb, scottph
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22007
2019-11-11 21:59:20 +00:00
avg
62919e1623 db_nextframe/amd64: remove TRAP_INTERRUPT frame type
Besides the confusing name, this type is effectively unused.
In all cases where it could be set, the INTERRUPT type is set by the
earlier code.  The conditions for TRAP_INTERRUPT are a subset of the
conditions for INTERRUPT.

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D22305
2019-11-11 17:11:49 +00:00