8168 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
eff8d99fb3 Regen after r357503.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-02-04 16:02:04 +00:00
Edward Tomasz Napierala
369c4633c1 Add missing linux(4) syscall entries. This fixes missing debug
messages for some of the unimplemented syscalls, in particular
the AIO-related ones.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23231
2020-02-04 16:01:06 +00:00
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delivered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.
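
The retry loop and the new return values can be modelled roughly as follows. This is a Python sketch, not the kernel code: `Thread`, `sample_stack`, and the `supports_running_capture` flag are hypothetical stand-ins, and the order of the checks is an assumption. Only the EOPNOTSUPP and EBUSY results and the "restart if the wrong thread was sampled" behaviour come from the commit message.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Hypothetical stand-in for a kernel thread."""
    tid: int
    running: bool = False
    in_user_mode: bool = False
    stack: list = field(default_factory=list)


def stack_save_td(td, sample_stack, supports_running_capture=True):
    """Model of the simplified KPI: returns (error, frames), error 0 on success.

    sample_stack(td) stands in for the stack capture interrupt and returns
    (tid_actually_sampled, frames).
    """
    if not td.running:
        # A thread that is not on a CPU can be walked directly.
        return 0, list(td.stack)
    if not supports_running_capture:
        return errno.EOPNOTSUPP, []
    if td.in_user_mode:
        return errno.EBUSY, []
    while True:
        sampled_tid, frames = sample_stack(td)
        if sampled_tid == td.tid:
            return 0, frames
        # The wrong thread was sampled: restart the capture.
```

A caller would treat EOPNOTSUPP as "this platform cannot sample running threads" and EBUSY as "no kernel stack to report right now".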

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
Mateusz Guzik
aa88cc44f3 amd64: speed up failing case for memcmp
Instead of branching on up to 8 bytes, drop the size to 4.

Assorted clean ups while here.

Validated with glibc test suite.
2020-01-30 19:56:22 +00:00
Mateusz Guzik
f0ddecd745 amd64: revamp memcmp
Borrow the trick from memset and memmove and use the scale/index/base addressing
to avoid branches.

If a mismatch is found, the routine has to calculate the difference. Make sure
there is always up to 8 bytes to inspect. This replaces the previous loop which
would operate over up to 16 bytes with an unrolled list of 8 tests.
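
As a rough illustration of the idea, here is a Python model of one size class (8 to 16 bytes). It assumes the "trick" means comparing two overlapping 8-byte windows, one at the start and one ending at the last byte, so no branch on the exact length is needed. The function name and the choice of size class are hypothetical; the real routine is amd64 assembly.

```python
def memcmp_8_to_16(a: bytes, b: bytes, n: int) -> int:
    """Compare the first n bytes of a and b, for 8 <= n <= 16.

    Two 8-byte windows cover the whole range: [0, 8) and [n - 8, n).
    They overlap when n < 16, which is harmless for a comparison.
    """
    assert 8 <= n <= 16 and len(a) >= n and len(b) >= n
    for start in (0, n - 8):
        window_a = a[start:start + 8]
        window_b = b[start:start + 8]
        if window_a != window_b:
            # Mismatch: at most 8 bytes need to be inspected to find
            # the first differing byte and compute the difference.
            for x, y in zip(window_a, window_b):
                if x != y:
                    return x - y
    return 0
```

If the first window matches, the first difference in the second window is also the first difference overall, so the sign of the result agrees with a byte-by-byte comparison.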

Speed varies a lot, but this is a net win over the previous routine with probably
a lot more to gain.

Validated with glibc test suite.
2020-01-28 17:48:17 +00:00
Mark Johnston
f01fe060e6 Regen.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-01-21 17:28:36 +00:00
Mark Johnston
149afbf3ba Fix 64-bit syscall argument fetching in 32-bit Linux syscall handlers.
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array.  Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.

Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32.  Change entry points
to handle this using the PAIR32TO64 macro.
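
The arithmetic behind the split can be sketched in Python. The helper names are hypothetical, and the sketch assumes the low half is passed first; the kernel does this with the PAIR32TO64 macro in C.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split64(value: int) -> tuple:
    """Split a 64-bit offset into (low, high) 32-bit halves,
    as a 32-bit caller has to when passing it in registers."""
    value &= MASK64
    return value & MASK32, value >> 32


def pair32to64(lo: int, hi: int) -> int:
    """Recombine two 32-bit halves into a signed 64-bit offset."""
    value = ((hi & MASK32) << 32) | (lo & MASK32)
    if value & (1 << 63):
        # Reinterpret as a negative two's-complement value.
        value -= 1 << 64
    return value
```

For example, an offset of 2**32 arrives as the pair (0, 1); a handler that read only the first slot would see offset 0.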

Move linux_ftruncate64() into compat/linux.

PR:		243155
Reported by:	Alex S <iwtcex@gmail.com>
Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23210
2020-01-21 17:28:22 +00:00
Konstantin Belousov
2ee49fac82 Add support for Hygon Dhyana Family 18h processor.
As a new x86 CPU vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon),
a joint venture between AMD and Haiguang Information Technology Co.,
Ltd., aims to provide x86 processors for the Chinese server market.

The first-generation Hygon processor (Dhyana) shares most of its architecture
with AMD's family 17h, but has a different CPU vendor ID ("HygonGenuine"),
PCI vendor ID (0x1d94), and family series number, 18h (Hygon negotiated
with AMD to confirm that only Hygon uses family 18h).

To enable Hygon Dhyana support in FreeBSD, add new definitions
HYGON_VENDOR_ID("HygonGenuine") and X86_VENDOR_HYGON(0x1d94) to identify
Hygon Dhyana CPU.
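
For illustration, the vendor check can be modelled in Python. The sketch relies on the CPUID leaf 0 convention that the 12-byte vendor string is returned in EBX, EDX, ECX (in that order). The helper names and the `shares_amd_17h_code_path` predicate are hypothetical; the kernel code is C.

```python
import struct

HYGON_VENDOR_ID = "HygonGenuine"
AMD_VENDOR_ID = "AuthenticAMD"


def vendor_registers(vendor: str) -> tuple:
    """Return (ebx, edx, ecx) as CPUID leaf 0 would report them."""
    return struct.unpack("<III", vendor.encode("ascii"))


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Rebuild the 12-character vendor string from the three registers."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def shares_amd_17h_code_path(vendor: str, family: int) -> bool:
    """Model of the commit's approach: Hygon family 18h reuses the
    AMD family 17h initialization paths."""
    return (vendor, family) in (
        (AMD_VENDOR_ID, 0x17),
        (HYGON_VENDOR_ID, 0x18),
    )
```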

Initialize the CPU features (topology, local APIC ext, MSI, TSC, hwpstate,
MCA, DEBUG_CTL, etc) for amd64 and i386 mode by sharing the code path of
AMD family 17h.

The changes have been applied on FreeBSD 13.0-CURRENT and tested
successfully on Hygon Dhyana processor.

References:
[1] Linux kernel patches for Hygon Dhyana, merged in 4.20:

https://git.kernel.org/tip/c9661c1e80b609cd038db7c908e061f0535804ef

[2] MSR and CPUID definition:

https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf

Submitted by:	Pu Wen <puwen@hygon.cn>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23163
2020-01-21 13:22:35 +00:00
Kyle Evans
05d7dd739c sysent targets: further cleanup and deduplication
r355473 vastly improved the readability and cleanliness of these Makefiles.
Every single one of them follows the same pattern and duplicates the exact
same logic.

Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll
use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and
include a common sysent.mk to handle the rest. This makes it less tedious to
make sweeping changes.

Some default values are provided for GENERATED/SYSENT_*; almost all of these
just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use
effectively the same filenames with an arbitrary prefix. Most ABIs will be
able to get away with just setting GENERATED_PREFIX and including
^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile
is the notable exception, as it doesn't take a SYSENT_CONF and the generated
files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits
the pattern enough to use the common version.

Reviewed by:	brooks, imp
Nice!:		emaste
Differential Revision:	https://reviews.freebsd.org/D23197
2020-01-18 20:37:45 +00:00
Kyle Evans
1171c633fb Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.

Prior to r356603, there was a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still shouldn't need to run the script more
than once per ABI, as the first invocation should update all of them. Add
.ORDER to do so cleanly.

Reviewed by:	brooks
Discussed with:	sjg
Differential Revision:	https://reviews.freebsd.org/D23099
2020-01-10 18:24:17 +00:00
Pawel Biernacki
a1d7296784 sysctl: mark more nodes as MPSAFE
vm.kvm_size and vm.kvm_free are read only and marked as MPSAFE on i386
already. Mark them as such on amd64 and arm64 too to avoid locking Giant.

Reviewed by:	kib (mentor)
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23039
2020-01-06 10:52:13 +00:00
Mateusz Guzik
2e77cad11d locks: add default delay struct
Use it for all primitives. This makes everything fit in 8 bytes.
2020-01-05 12:48:19 +00:00
Alan Cox
1c3a241032 When a copy-on-write fault occurs, pmap_enter() is called on to replace the
mapping to the old read-only page with a mapping to the new read-write page.
To destroy the old mapping, pmap_enter() must destroy its page table and PV
entries and invalidate its TLB entry.  This change simply invalidates that
TLB entry a little earlier, specifically, on amd64 and arm64, before the PV
list lock is held.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23027
2020-01-04 19:50:25 +00:00
Konstantin Belousov
b837dadd87 bhyve: terminate waiting loops if thread suspension is requested.
PR:	242724
Reviewed by:	markj
Reported and tested by:	Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
	 (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22881
2020-01-02 22:37:04 +00:00
Edward Tomasz Napierala
cc50333011 Add basic getcpu(2) support to linuxulator. The purpose of this
syscall is to query the CPU number and the NUMA domain the calling
thread is currently running on.  The third argument is ignored.
It doesn't do anything regarding scheduling - it's literally
just a way to query the current state, without any guarantees
you won't get rescheduled an opcode later.

This unbreaks Java from CentOS 8
(java-11-openjdk-11.0.5.10-0.el8_0.x86_64).

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22972
2019-12-31 22:01:08 +00:00
Edward Tomasz Napierala
da7627d797 Regen after r356229.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-31 16:01:37 +00:00
Edward Tomasz Napierala
a8bfc7a85c Fix definitions for Linux getcpu(2).
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-31 15:57:29 +00:00
Pawel Biernacki
54666dffa8 linux(4): implement copy_file_range(2)
copy_file_range(2) has been implemented natively since r350315; make it
available for Linux binaries too.

Reviewed by:	kib (mentor), trasz (previous version)
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D22959
2019-12-30 18:11:06 +00:00
Edward Tomasz Napierala
ee0fe82ee2 Implement Linux syslog(2) syscall; just enough to make Linux dmesg(8)
utility work.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22465
2019-12-29 15:53:55 +00:00
Brandon Bergren
38f69a619e Unbreak build. It seems that mips and amd64 still pull in link_elf.c, so
we need to have elf_cpu_parse_dynamic() everywhere after all to avoid
an undefined symbol.
2019-12-24 16:52:10 +00:00
Alan Cox
50079417a5 Micro-optimize the control flow in _pmap_unwire_ptp(), and eliminate
unnecessary parentheses.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22893
2019-12-21 22:32:24 +00:00
Alan Cox
7c237b7c3a Correct a mistakenly inverted condition in r355833.
Noticed by:	kib
X-MFC with:	r355833
2019-12-20 20:46:26 +00:00
Alan Cox
48ef33180a When pmap_enter_{l2,pde}() are called to create a kernel mapping, they are
incrementing (and decrementing) the ref_count on kernel page table pages.
They should not do this.  Kernel page table pages are expected to have a
fixed ref_count.  Address this problem by refactoring pmap_alloc{_l2,pde}()
and their callers.  This also eliminates some duplicated code from the
callers.

Correctly implement PMAP_ENTER_NOREPLACE in pmap_enter_{l2,pde}() on kernel
mappings.

Reduce code duplication by defining a function, pmap_abort_ptp(), for
handling a common error case.

Handle a possible page table page leak in pmap_copy().  Suppose that we are
determining whether to copy a superpage mapping.  If we abort because there
is already a mapping in the destination pmap at the current address, then
simply decrementing the page table page's ref_count is correct, because the
page table page must have a ref_count > 1.  However, if we abort because we
failed to allocate a PV entry, this might be a just allocated page table
page that has a ref_count = 1, so we should call pmap_abort_ptp().

Simplify error handling in pmap_enter_quick_locked().

Reviewed by:	kib, markj (an earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22763
2019-12-18 18:21:39 +00:00
Edward Tomasz Napierala
b5f20658ee Add compat.linux.emul_path, so it can be set to something other
than "/compat/linux".  Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.

Reviewed by:	bcr (manpages)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22574
2019-12-16 20:07:04 +00:00
Edward Tomasz Napierala
cf69fe66d4 Add sync_file_range(2) implementation to linux(4); it's a thin wrapper
over the usual fsync(2).

This silences some warnings when running "apt-get upgrade".

Reviewed by:	brooks, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:37:17 +00:00
Edward Tomasz Napierala
0cde2b3239 Regen after r355752.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:32:37 +00:00
Edward Tomasz Napierala
0610f417a4 Fix definitions for linuxulator's sync_file_range(2).
Reviewed by:	brooks, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22371
2019-12-14 13:30:43 +00:00
John Baldwin
cbd03a9df2 Support software breakpoints in the debug server on Intel CPUs.
- Allow the userland hypervisor to intercept breakpoint exceptions
  (BP#) in the guest.  A new capability (VM_CAP_BPT_EXIT) is used to
  enable this feature.  These exceptions are reported to userland via
  a new VM_EXITCODE_BPT that includes the length of the original
  breakpoint instruction.  If userland wishes to pass the exception
  through to the guest, it must be explicitly re-injected via
  vm_inject_exception().

- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH
  pseudo-register.  Injecting a BP# on Intel requires setting this to
  the length of the breakpoint instruction.  AMD SVM currently ignores
  writes to this register (but reports success) and fails to read it.

- Rework the per-vCPU state tracked by the debug server.  Rather than
  a single 'stepping_vcpu' global, add a structure for each vCPU that
  tracks state about that vCPU ('stepping', 'stepped', and
  'hit_swbreak').  A global 'stopped_vcpu' tracks which vCPU is
  currently reporting an event.  Event handlers for MTRAP and
  breakpoint exits loop until the associated event is reported to the
  debugger.

  Breakpoint events are discarded if the breakpoint is not present
  when a vCPU resumes in the breakpoint handler to retry submitting
  the breakpoint event.

- Maintain a linked-list of active breakpoints in response to the GDB
  'Z0' and 'z0' packets.
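
A minimal Python sketch of the bookkeeping for these two packets follows. It assumes the GDB remote protocol form `Z0,addr,kind` / `z0,addr,kind` with hexadecimal fields, uses a dict where the commit uses a linked list, and picks `E01` as an arbitrary error reply. The function name is hypothetical and this is not the bhyve implementation.

```python
def handle_breakpoint_packet(packet: str, breakpoints: dict) -> str:
    """Apply a software-breakpoint packet to the active breakpoint table.

    'Z0,addr,kind' inserts a breakpoint, 'z0,addr,kind' removes it.
    Returns the reply string: 'OK', an error, or '' for unsupported packets.
    """
    fields = packet.split(",")
    if fields[0] not in ("Z0", "z0"):
        return ""  # empty reply means "packet not supported"
    if len(fields) != 3:
        return "E01"
    try:
        addr = int(fields[1], 16)
        kind = int(fields[2], 16)
    except ValueError:
        return "E01"
    if fields[0] == "Z0":
        breakpoints[addr] = kind
    else:
        # Removing an unknown breakpoint is treated as a no-op here.
        breakpoints.pop(addr, None)
    return "OK"
```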

Reviewed by:	markj (earlier version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D20309
2019-12-13 19:21:58 +00:00
Mark Johnston
5cff1f4dc3 Introduce vm_page_astate.
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state.  The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
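
The snapshot-and-cmpset pattern can be illustrated with a small Python model. The field layout used here (16-bit flags, 8-bit queue, 8-bit act_count) is an assumption for the sketch, and the `Page` class with its lock only emulates an atomic compare-and-set; the kernel uses real atomics in C.

```python
import struct
import threading


def pack_astate(flags: int, queue: int, act_count: int) -> int:
    """Pack the three fields into one 32-bit word."""
    return struct.unpack("<I", struct.pack("<HBB", flags, queue, act_count))[0]


def unpack_astate(bits: int) -> tuple:
    """Unpack a 32-bit word into (flags, queue, act_count)."""
    return struct.unpack("<HBB", struct.pack("<I", bits))


class Page:
    """Hypothetical page holding its atomic state as one 32-bit word."""

    def __init__(self, bits: int = 0):
        self._bits = bits
        self._lock = threading.Lock()  # stands in for a hardware atomic

    def load(self) -> int:
        return self._bits

    def cmpset(self, old: int, new: int) -> bool:
        """Store new only if the word still equals old."""
        with self._lock:
            if self._bits != old:
                return False
            self._bits = new
            return True


def set_queue(page: Page, new_queue: int) -> int:
    """Update only the queue field, retrying if another thread raced us."""
    while True:
        old = page.load()  # snapshot in a local ("stack") variable
        flags, _queue, act_count = unpack_astate(old)
        new = pack_astate(flags, new_queue, act_count)
        if page.cmpset(old, new):
            return old
```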

This change merely adds the structure and updates references to atomic
state fields.  No functional change intended.

Reviewed by:	alc, jeff, kib
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22650
2019-12-10 18:14:50 +00:00
John Baldwin
23a5b4ed65 Use 4 byte stack alignment instead of 8 byte.
This was an old bug prior to r355373 and mostly harmless as it would
waste at most a handful of bytes on the stack.
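
The "handful of bytes" can be checked with a quick Python calculation. It assumes the stack pointer is rounded down to the alignment; the helper names are hypothetical.

```python
def align_down(addr: int, alignment: int) -> int:
    """Round addr down to a power-of-two alignment."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return addr & ~(alignment - 1)


def extra_waste(addr: int) -> int:
    """Bytes lost by aligning to 8 instead of 4 at this address."""
    return align_down(addr, 4) - align_down(addr, 8)
```

Under that assumption, over-aligning to 8 bytes costs either 0 or 4 extra bytes at any given address.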
2019-12-09 19:18:05 +00:00
John Baldwin
d8010b1175 Copy out aux args after the argument and environment vectors.
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector.  Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack.  It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by:	kib
Tested on:	amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22695
2019-12-09 19:17:28 +00:00
Konstantin Belousov
3e5b13991c amd64: properly set the start of the io permission bitmap for BSP
... after the initial common TSS is copied into its final location
during PCPU reallocation.

Reported by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-12-07 00:23:19 +00:00
Brooks Davis
af796bfa71 sysent: Reduce duplication and improve readability.
Use the power of variables to avoid spelling out source and generated
files too many times.  The previous Makefiles were hard to read, hard to
edit, and badly formatted.

Reviewed by:	kevans, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22714
2019-12-06 23:59:23 +00:00
Scott Long
961aacb107 Move the mds, irbs, and ssb mitigation knobs into machdep.mitigations.
They're in both the old and new places in HEAD for the moment for
discussion and transition.  The old locations will be garbage collected
in 4 weeks.  MFCs to 12 and 11 will keep the old and new for transition
purposes.

Reviewed by:	kib
MFC after:	4 weeks
Sponsored by:	Intel
Differential Revision:	https://reviews.freebsd.org/D22590
2019-12-06 02:43:05 +00:00
Warner Losh
f86e60008b Regularize my copyright notice
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctuation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
  All Rights Reserved on same line as other copyright holders (but not
  me). Other such holders are also listed last where it's clear.
2019-12-04 16:56:11 +00:00
John Baldwin
31174518d2 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00
Jeff Roberson
0f9e06e18b Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22611
2019-12-02 22:42:05 +00:00
Anish Gupta
84474332d3 bhyve amd: make amdvi_dump_cmds() log the command for which command completion failed. Completion is checked in poll mode, although it could be done using interrupts.
There is no need to log all the commands in the command ring, only the last one for which completion failed.

Reported by: np@freebsd.org
Reviewed by: np, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22566
2019-12-01 04:00:08 +00:00
Scott Long
33ce28d137 Remove the trm(4) driver
Differential Revision:	https://reviews.freebsd.org/D22575
2019-11-28 02:32:17 +00:00
Konstantin Belousov
13189065cb amd64: assert that EARLY_COUNTER does not corrupt memory.
Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22514
2019-11-24 19:02:13 +00:00
Andrew Turner
68cad68149 Add kcsan_md_unsupported from NetBSD.
It's used to ignore virtual addresses that may have a different physical
address depending on the CPU.

Sponsored by:	DARPA, AFRL
2019-11-21 13:22:23 +00:00
Andrew Turner
1b8c58f283 Fix for style(9): use parentheses around return statements.
Reported by:	kib
Sponsored by:	DARPA, AFRL
2019-11-21 12:29:20 +00:00
Andrew Turner
849aef496d Port the NetBSD KCSAN runtime to FreeBSD.
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.

This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64; however, the latter
needs a compiler change to allow -fsanitize=thread, which KCSAN uses.

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22315
2019-11-21 11:22:08 +00:00
Konstantin Belousov
da248a69aa amd64: in double fault handler, do not rely on sane gsbase value.
Typical reasons for double faults are either kernel stack
overflow or bugs in the code that manipulates protection CPU state.
The latter code is often the code that has to set up gsbase for the
kernel.  Switching to an explicit load of the GSBASE MSR in the fault
handler makes it more likely to output useful information.

Now all IST handlers have nmi_pcpu structure on top of their stacks.

It would be even more useful to save the gsbase value at the moment of the
fault.  I did not do this because I do not want to modify the PCB layout now.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-11-20 11:12:19 +00:00
Kyle Evans
f22a592111 Convert in-tree sysent targets to use new makesyscalls.lua
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D21894
2019-11-18 23:28:23 +00:00
John Baldwin
03b0d68c72 Check for errors from copyout() and suword*() in sv_copyout_args/strings.
Reviewed by:	brooks, kib
Tested on:	amd64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22401
2019-11-18 20:07:43 +00:00
Mark Johnston
85e06c728c Set MALLOC_DEBUG_MAXZONES=1 in GENERIC-NODEBUG configurations.
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory.  However, it
increases fragmentation and is disabled in release kernels.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-11-18 20:03:28 +00:00
Konstantin Belousov
b2e1b88984 amd64 copyout: remove irrelevant comment.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-11-17 14:41:47 +00:00
Scott Long
e372160177 TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.

Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.

Control knobs are:
machdep.mitigations.taa.enable:
        0 - no software mitigation is enabled
        1 - attempt to disable TSX
        2 - use the VERW mitigation
        3 - automatically select the mitigation based on processor
	    features.

machdep.mitigations.taa.state:
        inactive        - no mitigation is active/enabled
        TSX disable     - TSX is disabled in the bare metal CPU as well as
                          any virtualized CPUs
        VERW            - VERW instruction clears CPU buffers
	not vulnerable	- The CPU has identified itself as not being
			  vulnerable
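
One way to read the relationship between the two knobs is the Python model below. It is a sketch under stated assumptions, not the kernel logic: the `taa_no` and `can_disable_tsx` parameters are hypothetical, automatic mode is assumed to prefer disabling TSX when possible, and a failed attempt to disable TSX is assumed to leave the state inactive.

```python
def taa_state(enable: int, taa_no: bool = False,
              can_disable_tsx: bool = False) -> str:
    """Map a machdep.mitigations.taa.enable value to a reported state.

    taa_no:          CPU reports itself as not vulnerable (assumed input).
    can_disable_tsx: CPU offers a way to turn TSX off (assumed input).
    """
    if enable not in (0, 1, 2, 3):
        raise ValueError("enable must be 0, 1, 2 or 3")
    if taa_no:
        return "not vulnerable"
    if enable == 3:
        # Automatic selection based on processor features (assumed order).
        enable = 1 if can_disable_tsx else 2
    if enable == 1 and can_disable_tsx:
        return "TSX disable"
    if enable == 2:
        return "VERW"
    return "inactive"
```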

Nothing in the base FreeBSD system uses TSX.  However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.

Reviewed by:	emaste, imp, kib, scottph
Sponsored by:	Intel
Differential Revision:	22374
2019-11-16 00:26:42 +00:00
John Baldwin
5caa67fa84 Use a sv_copyout_auxargs hook in the Linux ELF ABIs.
Reviewed by:	emaste
Tested on:	amd64 (linux64 only), i386
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22356
2019-11-15 23:01:43 +00:00