Commit Graph

7591 Commits

Author SHA1 Message Date
Mateusz Guzik
a5db8ade37 amd64: __exclusive_cache_line pv_chunks_mutex and pv_list_locks
Note that pv_list_locks is an array and currently it fits 2 locks per line.
Resizing it and/or putting more locks in different lines requires several tests.

MFC after:	1 week
2017-10-20 03:38:58 +00:00
Mateusz Guzik
d95498d44f amd64: avoid acquiring dt lock if possible (which is the common case)
Discussed with:	kib
MFC after:	1 week
2017-10-20 03:30:02 +00:00
Mark Johnston
46fcd1af63 Move kernel dump offset tracking into MI code.
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.

This is needed to implement support for kernel dump compression in the
MI kernel dump code.

Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.

Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11722
2017-10-18 15:38:05 +00:00
Konstantin Belousov
ca1f624517 Fix the pv_chunks pc_lru tailq handling in reclaim_pv_chunk().
For processing, reclaim_pv_chunk() removes the pv_chunk from the lru
list, which makes pc_lru linkage invalid.  Then the pmap lock is
released, which allows for other thread to free the last pv entry
allocated from the chunk and call free_pv_chunk(), which tries to
modify the invalid linkage.

Similarly, the chunk is inserted into the private tailq new_tail
temporary.  Again, free_pv_chunk() might be run and corrupt the
linkage for the new_tail after the pmap lock is dropped.

This is a consequence of r299788 elimination of pvh_global_lock, which
allowed for reclaim to run in parallel with other pmap calls which
free pv chunks.

As a fix, do not remove the chunk from pc_lru queue, use a marker to
remember the position in the queue iteration.  We can safely operate
on the chunks after the chunk's pmap is locked, we fetched the chunk
after the marker, and we checked that chunk pmap is same as we have
locked, because chunk removal from pc_lru requires both pv_chunk_mutex
and the pmap mutex owned.

Note that the fix lost an optimization which was present in the
previous algorithm.  Namely, new_tail requeueing rotated the pv chunks
list so that reclaim didn't scan the same pv chunks that couldn't be
freed (because they contained a wired and/or superpage mapping) on
every invocation.  An additional change is planned which would improve
this.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-16 15:16:24 +00:00
Konstantin Belousov
1df04cc069 Change amd64_get_ldt() to return 'EOF' when the LDT is not yet
allocated, when requested range of descriptors does not fit into
currently allocated LDT, or trim the return if the range fits
partially.  Before, the function returned EINVAL.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-09 16:20:39 +00:00
Mateusz Guzik
801eec865f amd64: remove unused variable from pmap_delayed_invl_genp
Reported by:	gcc
MFC after:	1 week
2017-10-05 18:51:48 +00:00
Konstantin Belousov
a6d4b1dc48 Ensure that after sucessfull i386_set_ldt() call, other threads can
use LDT segments immediately.

If the i386_set_ldt() call created a first LDT descriptor (and
consequently created the LDT) for our address space, LDTR is currently
loaded only on the CPU executing the syscall.  Other CPUs executing
threads sharing the address space, would only load LDTR after context
switch.

Uncomment set_user_ldt_rv() and call it on all CPUs.  Remove critical
section inside set_user_ldt(), it is not needed in the context of call
from smp_rendezvous().

Set md_ldt after md_ldt_sd is initialized using the same code sequence
as in user_ldt_free().  Do the whole initialization in a critical
section, to not race with the context switching while we set LDT.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 13:12:59 +00:00
Konstantin Belousov
78d58cb6bc Avoid a race betweem freeing LDT and context switches.
cpu_switch.S uses curproc->p_md.md_ldt value as the flag indicating
presence of the process LDT.  The flag is checked and then ldt segment
descriptor is copied into the CPU' GDT slot.

Disallow context switches around clearing of the curproc LDT state by
performing the cleanup in critical section.  Ensure that the md_ldt
flag is cleared before md_ldt_sd descriptor content is destroyed by
inserting fence between the operations.

We depend on the x86 memory model strong ordering guarantees, in
particular, that cpu_switch.S observes the writes to md_ldt and
md_ldt_sd in the expected order.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:50:03 +00:00
Konstantin Belousov
287c718f32 Improve amd64_get_ldt().
Provide consistent snapshot of the requested descriptors by preventing
other threads from modifying LDT while we fetch the data, lock dt_lock
around the read.  Copy the data into intermediate buffer, which is
copied out after the lock is dropped.

Use guaranteed atomic (aligned volatile) reads of the descriptors to
use same-size atomic as CPU update to set A bit in the descriptor type
field.

Improve overflow checking for the descriptors range calculations and
remove unneeded casts.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:29:34 +00:00
Konstantin Belousov
8fc26d9612 Minor style fix.
Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:19:55 +00:00
Konstantin Belousov
a58679a93b Complete r323772 on amd64.
Compilers are allowed to combine plain reads into group operations,
e.g. 64bit element copies of one array into another can be
legitimately optimized back to a memcpy() call, which r323772 tried to
prevent.

Qualify accesses to LDT descriptors with volatile dereference to
ensure that each write indeed occurs.  After that, our usual claim of
native-size aligned writes being atomic applies.

This is equivalent to atomic_store(memory_order_relaxed) C11 accesses,
but our machine/atomic.h does not provide corresponding primitive.

Noted and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:16:45 +00:00
Konstantin Belousov
98af67c78e Use ANSI C declaration for amd64_get_ldt().
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:07:38 +00:00
Konstantin Belousov
83d55c8ac2 Correct format specifiers in the debug code.
Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:01:39 +00:00
Konstantin Belousov
687a5be47a Remove useless comments.
Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 11:56:04 +00:00
Konstantin Belousov
a1fc6a8c49 On amd64, mark the set_user_ldt() function as static.
On i386, the function is used from the context switch code and needs
to be accessible externally.  Amd64 MD context switch does not lock an
LDT spinlock and inlines switching in assembly.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 11:50:01 +00:00
Konstantin Belousov
37afe7dfd2 Reduce default max_ldt_segment value to 512.
This makes the LDT to use only one page with default settings,
avoiding the need to find contigous 2 pages in KVA.  It seems that
most users are fine even with 512 segments.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 11:36:55 +00:00
Konstantin Belousov
843d5752f5 Update comment to note that we skip LDT reload for kthreads as well.
Noted by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-10-05 11:34:51 +00:00
Konstantin Belousov
9674d76346 Hide kernel stuff from userspace.
Sponsored by:	Mellanox Technologies
2017-10-02 08:37:43 +00:00
Andrew Turner
0e73a61997 To prepare for adding EFI runtime services support on arm64 move the
machine independent parts of the existing code to a new file that can be
shared between amd64 and arm64.

Reviewed by:	kib (previous version), imp
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12434
2017-10-01 19:52:47 +00:00
Konstantin Belousov
3cabd93e26 Do not do torn writes to active LDTs.
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12413
2017-09-19 17:57:04 +00:00
Ilya Bakulin
a9bfc8d2ae Add MMCCAM-enabled kernel config for IMX6, reduce debug noice in MMCCAM kernels
CAM_DEBUG_TRACE results in way too much debug output than needed now.
When debugging, it's always possible to turn on trace level using camcontrol.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12110
2017-09-13 10:56:02 +00:00
Conrad Meyer
907f50fe04 Add smn(4) driver for AMD System Management Network
AMD Family 17h CPUs have an internal network used to communicate between
the host CPU and the PSP and SMU coprocessors.  It exposes a simple
32-bit register space.

Reviewed by:	avg (no +1), mjoras, truckman
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12217
2017-09-05 15:13:41 +00:00
Josh Paetzel
9d0ec2a920 Revert r323087
This needs more thinking out and consensus, and the commit message
was wrong AND there was a typo in the commit.

pointyhat:	jpaetzel
2017-09-01 17:03:48 +00:00
Josh Paetzel
0be04b100c Take options IPSEC out of GENERIC
PR:	220170
Submitted by:	delphij
Reviewed by:	ae, glebius
MFC after:	2 weeks
Differential Revision:	D11806
2017-09-01 15:54:53 +00:00
Josh Paetzel
3b65550eec Allow kldload tcpmd5
PR:	220170
MFC after:	2 weeks
2017-08-31 20:16:28 +00:00
Alexander Motin
ed9652da5f Add NTB driver for PLX/Avago/Broadcom PCIe switches.
This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though
the second with predictable complications on hot-plug and reboot events).
I tested it with PEX 8717 and PEX 8733 chips, but expect it should work
with many other compatible ones too.  It supports up to two NT bridges
per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows,
6 or 12 scratchpad registers and 16 doorbells.  There are also 4 DMA engines
in those chips, but they are not yet supported.

While there, rename Intel NTB driver from generic ntb_hw(4) to more specific
ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and
alike to Linux naming.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-08-30 21:16:32 +00:00
Conrad Meyer
2744a0b69b Drop CACHE_LINE_SIZE to 64 bytes on x86
The actual cache line size has always been 64 bytes.

The 128 number arose as an optimization for Core 2 era Intel processors.  By
default (configurable in BIOS), these CPUs would prefetch adjacent cache
lines unintelligently.  Newer CPUs prefetch more intelligently.

The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400
series, "Dunnington").  If you are still using one of these CPUs, especially
in a multi-socket configuration, consider locating the "adjacent cache line
prefetch" option in BIOS and disabling it.

Reported by:	mjg
Reviewed by:	np
Discussed with:	jhb
Sponsored by:	Dell EMC Isilon
2017-08-28 22:28:41 +00:00
Ryan Libby
0d4e7ec5f3 amd64: drop q suffix from rd[fg]sbase for gas compatibility
Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12133
2017-08-26 23:13:18 +00:00
Konstantin Belousov
9fc847133e Save KGSBASE in pcb before overriding it with the guest value.
Reported by:	lwhsu, mjoras
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	18 days
2017-08-24 10:49:53 +00:00
Konstantin Belousov
761fb3ef29 Ensure that fs/gs bases are stored in pcb before copying the pcb for
new process or thread.

Reported and tested by:	ae, dhw
Sponsored by:	The FreeBSD Foundation
MFC after:	20 days
2017-08-22 18:15:47 +00:00
Konstantin Belousov
3e902b3d76 Make WRFSBASE and WRGSBASE instructions functional.
Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.

Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.

Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.

The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered a bug fixes on its own.

Reviewed by:	jhb (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12023
2017-08-21 17:38:02 +00:00
Konstantin Belousov
9ed84d55c1 Simplify the code.
Noted by:	Oliver Pinter
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 11:18:16 +00:00
Konstantin Belousov
43b7b1f29b Simplify amd64 trap().
- Use more relevant name 'signo' instead of 'i' for the local variable
  which contains a signal number to send for the current exception.
- Eliminate two labels 'userout' and 'out' which point to the very end
  of the trap() function.  Instead use return directly.
- Re-indent the prot_fault_translation block by reducing if() nesting.
- Some more monor style changes.

Requested and reviewed by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 09:52:25 +00:00
Konstantin Belousov
4031ebef84 Trim excessive 'extern' and remove unused declaration.
Reviewed by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 09:42:09 +00:00
Konstantin Belousov
dad2e0e420 Use ANSI C declaration for trap_pfault(). Style.
Reviewed by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 09:39:10 +00:00
Ruslan Bukin
5651294282 Fix module unload when SGX support is not present in CPU.
Sponsored by:	DARPA, AFRL
2017-08-18 14:47:06 +00:00
Mark Johnston
01938d3666 Rename mkdumpheader() and group EKCD functions in kern_shutdown.c.
This helps simplify the code in kern_shutdown.c and reduces the number
of globally visible functions.

No functional change intended.

Reviewed by:	cem, def
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11603
2017-08-18 04:04:09 +00:00
Mark Johnston
50ef60dabe Factor out duplicated kernel dump code into dump_{start,finish}().
dump_start() and dump_finish() are responsible for writing kernel dump
headers, optionally writing the key when encryption is enabled, and
initializing the initial offset into the dump device.

Also remove the unused dump_pad(), and make some functions static now that
they're only called from kern_shutdown.c.

No functional change intended.

Reviewed by:	cem, def
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11584
2017-08-18 03:52:35 +00:00
Conrad Meyer
dc6a82801d x86: Add dynamic interrupt rebalancing
Add an option to dynamically rebalance interrupts across cores
(hw.intrbalance); off by default.

The goal is to minimize preemption. By placing interrupt sources on distinct
CPUs, ithreads get preferentially scheduled on distinct CPUs.  Overall
preemption is reduced and latency is reduced. In our workflow it reduced
"fighting" between two high-frequency interrupt sources.  Reduced latency
was proven by, e.g., SPEC2008.

Submitted by:	jeff@ (earlier version)
Reviewed by:	kib@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D10435
2017-08-16 18:48:53 +00:00
Ruslan Bukin
7dea76609b Rename macro DEBUG to SGX_DEBUG.
This fixes LINT kernel build.

Reported by:	lwhsu
Sponsored by:	DARPA, AFRL
2017-08-16 13:44:46 +00:00
Ruslan Bukin
2164af29a0 Add support for Intel Software Guard Extensions (Intel SGX).
Intel SGX allows to manage isolated compartments "Enclaves" in user VA
space. Enclaves memory is part of processor reserved memory (PRM) and
always encrypted. This allows to protect user application code and data
from upper privilege levels including OS kernel.

This includes SGX driver and optional linux ioctl compatibility layer.
Intel SGX SDK for FreeBSD is also available.

Note this requires support from hardware (available since late Intel
Skylake CPUs).

Many thanks to Robert Watson for support and Konstantin Belousov
for code review.

Project wiki: https://wiki.freebsd.org/Intel_SGX.

Reviewed by:	kib
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11113
2017-08-16 10:38:06 +00:00
Konstantin Belousov
baec5778ed Print whole machine state on double fault.
It is quite useful when double fault is not caused by a stack overflow.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-14 11:23:07 +00:00
Konstantin Belousov
0fd7ea1f21 Add {rd,wr}{fs,gs}base C wrappers for instructions.
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-14 11:20:54 +00:00
Konstantin Belousov
7bf0049e48 Style.
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-08-14 11:20:10 +00:00
Jung-uk Kim
b5669d0aa8 Split identify_cpu() into two functions for amd64 as we do for i386. This
reduces diff between amd64 and i386.  Also, it fixes a regression introduced
in r322076, i.e., identify_hypervisor() failed to identify some hypervisors.
This function assumes cpu_feature2 is already initialized.

Reported by:	dexuan
Tested by:	dexuan
2017-08-09 18:09:09 +00:00
Warner Losh
9057f54d74 Fail to open efirt device when no EFI on system.
libefivar expects opening /dev/efi to indicate if the we can make efi
runtime calls. With a null routine, it was always succeeding leading
efi_variables_supported() to return the wrong value. Only succeed if
we have an efi_runtime table. Also, while I'm hear, out of an
abundance of caution, add a likely redundant check to make sure
efi_systbl is not NULL before dereferencing it. I know it can't be
NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.
2017-08-08 20:44:16 +00:00
Konstantin Belousov
fe04f5e9d0 Avoid DI recursion when reclaim_pv_chunk() is called from
pmap_advise() or pmap_remove().

Reported and tested by:	pho (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-07 17:29:54 +00:00
Konstantin Belousov
1a47eac0f5 Explain why delayed invalidation is not required in pmap_protect() and
pmap_remove_pages().

Submitted by:	alc
MFC after:	1 week
2017-08-07 17:23:10 +00:00
Jung-uk Kim
0105034487 Detect hypervisors early. We used to set lower hz on hypervisors by default
but it was broken since r273800 (and r278522, its MFC to stable/10) because
identify_cpu() is called too late, i.e., after init_param1().

MFC after:	3 days
2017-08-05 06:56:46 +00:00
Conrad Meyer
6f240e18b5 x86: Tag some intrinsics with __pure2
Some C wrappers for x86 instructions do not touch global memory and only act
on their arguments; they can be marked __pure2, aka __const__.  Without this
annotation, Clang 3.9.1 is not intelligent enough on its own to grok that
these functions are __const__.

Submitted by:	Anton Rang <anton.rang AT isilon.com>
Sponsored by:	Dell EMC Isilon
2017-08-03 22:28:30 +00:00
Ed Schouten
c852847584 Keep top page on CloudABI to work around AMD Ryzen stability issues.
Similar to r321899, reduce sv_maxuser by one page inside of CloudABI.
This ensures that the stack, the vDSO and any allocations cannot touch
the top page of user virtual memory.

Considering that CloudABI userspace is completely oblivious to virtual
memory layout, don't bother making this conditional based on the CPU of
the running system.

Reviewed by:	kib, truckman
Differential Revision:	https://reviews.freebsd.org/D11808
2017-08-02 13:08:10 +00:00
Mateusz Guzik
fd1d4c8159 amd64: annotate the syscall return address check with __predict_false
before:
   0xffffffff80b03ebb <+2059>:	mov    0x460(%r14),%rax
   0xffffffff80b03ec2 <+2066>:	mov    0x98(%rax),%rax
   0xffffffff80b03ec9 <+2073>:	shr    $0x2f,%rax
   0xffffffff80b03ecd <+2077>:	je     0xffffffff80b03edd <amd64_syscall+2093>
   0xffffffff80b03ecf <+2079>:	mov    0x3f8(%r14),%rax
   0xffffffff80b03ed6 <+2086>:	orl    $0x1,0xc8(%rax)
   0xffffffff80b03edd <+2093>:	add    $0xf8,%rsp

after:
   0xffffffff80b03ebb <+2059>:	mov    0x460(%r14),%rax
   0xffffffff80b03ec2 <+2066>:	mov    0x98(%rax),%rax
   0xffffffff80b03ec9 <+2073>:	shr    $0x2f,%rax
   0xffffffff80b03ecd <+2077>:	jne    0xffffffff80b03eef <amd64_syscall+2111>
   0xffffffff80b03ecf <+2079>:	add    $0xf8,%rsp

Reviewed by:	kib
MFC after:	1 week
2017-08-02 11:25:38 +00:00
Konstantin Belousov
6632a4330f Do not call trapsignal() after handling usermode fault or interrupt,
when a signal is not intended to be sent.

The variable holding the signal number to send is left uninitialized,
which sometimes triggers invalid signal checks.

For NMI, a return to usermode without ast processing is done.  On the
other hand, for spurious dtrace probe interrupt it is usermode which
triggered the interrupt, so handle it through userret() as any other
fault.

Reported by:	Nils Beyer <nbe@renzel.net>
PR:	221151
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-02 10:12:10 +00:00
Don Lewis
cd155b5603 Lower the amd64 shared page, which contains the signal trampoline,
from the top of user memory to one page lower on machines with the
Ryzen (AMD Family 17h) CPU.  This pushes ps_strings and the stack
down by one page as well.  On Ryzen there is some sort of interaction
between code running at the top of user memory address space and
interrupts that can cause FreeBSD to either hang or silently reset.
This sounds similar to the problem found with DragonFly BSD that
was fixed with this commit:
  https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20
but our signal trampoline location was already lower than the address
that DragonFly moved their signal trampoline to.  It also does not
appear to be related to SMT as described here:
  https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498

  "Hi, Matt Dillon here. Yes, I did find what I believe to be a
   hardware issue with Ryzen related to concurrent operations. In a
   nutshell, for any given hyperthread pair, if one hyperthread is
   in a cpu-bound loop of any kind (can be in user mode), and the
   other hyperthread is returning from an interrupt via IRETQ, the
   hyperthread issuing the IRETQ can stall indefinitely until the
   other hyperthread with the cpu-bound loop pauses (aka HLT until
   next interrupt). After this situation occurs, the system appears
   to destabilize. The situation does not occur if the cpu-bound
   loop is on a different core than the core doing the IRETQ. The
   %rip the IRETQ returns to (e.g. userland %rip address) matters a
   *LOT*. The problem occurs more often with high %rip addresses
   such as near the top of the user stack, which is where DragonFly's
   signal trampoline traditionally resides. So a user program taking
   a signal on one thread while another thread is cpu-bound can cause
   this behavior. Changing the location of the signal trampoline
   makes it more difficult to reproduce the problem. I have not
   been because the able to completely mitigate it. When a cpu-thread
   stalls in this manner it appears to stall INSIDE the microcode
   for IRETQ. It doesn't make it to the return pc, and the cpu thread
   cannot take any IPIs or other hardware interrupts while in this
   state."
since the system instability has been observed on FreeBSD with SMT
disabled.  Interrupts to appear to play a factor since running a
signal-intensive process on the first CPU core, which handles most
of the interrupts on my machine, is far more likely to trigger the
problem than running such a process on any other core.

Also lower sv_maxuser to prevent a malicious user from using mmap()
to load and execute code in the top page of user memory that was made
available when the shared page was moved down.

Make the same changes to the 64-bit Linux emulator.

PR:		219399
Reported by:	nbe@renzel.net
Reviewed by:	kib
Reviewed by:	dchagin (previous version)
Tested by:	nbe@renzel.net (earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11780
2017-08-02 01:43:35 +00:00
Mark Johnston
2375aaa8e9 Batch updates to v_wire_count when freeing page table pages on x86.
The removed release stores are not needed since stores are totally
ordered on i386 and amd64.

Reviewed by:	alc, kib (previous revision)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11790
2017-08-01 05:26:30 +00:00
Konstantin Belousov
9adf30b0c3 Remove unused symbols.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-30 21:52:22 +00:00
Dmitry Chagin
c151945c86 Avoid using [LINUX_]SHAREDPAGE constant directly in the vdso code.
This is needed for https://reviews.freebsd.org/D11780.

Reported by:	kib@
2017-07-30 21:24:20 +00:00
Alan Cox
782e896088 Add support for pmap_enter(..., psind=1) to the amd64 pmap. In other words,
add support for explicitly requesting that pmap_enter() create a 2MB page
mapping.  (Essentially, this feature allows the machine-independent layer to
create superpage mappings preemptively, and not wait for automatic promotion
to occur.)

Export pmap_ps_enabled() to the machine-independent layer.

Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or
reclaim a PV entry when one is not available.

Refactor pmap_enter_pde() into two functions, one by the same name, that is
a general-purpose function for creating PDE PG_PS mappings, and another,
pmap_enter_2mpage(), that is used to prefault 2MB read- and/or execute-only
mappings for execve(2), mmap(2), and shmat(2).

Submitted by:	Yufeng Zhou <yz70@rice.edu> (an earlier version)
Reviewed by:	kib, markj
Tested by:	pho
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D11556
2017-07-23 06:33:58 +00:00
Ryan Libby
b1a987bb34 __pcpu: gcc -Wredundant-decls
Pollution from counter.h made __pcpu visible in amd64/pmap.c.  Delete
the existing extern decl of __pcpu in amd64/pmap.c and avoid referring
to that symbol, instead accessing the pcpu region via PCPU_SET macros.
Also delete an unused extern decl of __pcpu from mp_x86.c.

Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11666
2017-07-21 17:11:36 +00:00
Ryan Libby
5e6f40bdef efi: restrict visibility of EFIABI_ATTR-declared functions
In-tree gcc (4.2) doesn't understand __attribute__((ms_abi))
(EFIABI_ATTR).  Avoid declaring functions with that attribute when the
compiler is detected to be gcc < 4.4.

Reviewed by:	kib, imp (previous version)
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11636
2017-07-20 06:47:06 +00:00
Alan Cox
5818f05d39 Style-only change: Consistently use the variable name "pdpg" throughout
this file.  Previously, half of the pointers to a vm_page being used as
a page directory page were named "pdpg" and the rest were named "mpde".

Discussed with:	kib
MFC after:	1 week
2017-07-15 16:42:55 +00:00
Alan Cox
68a870f3bc Extract the innermost loop of pmap_remove() out into its own function,
pmap_remove_ptes().  (This new function will also be used by an upcoming
change to pmap_enter() that adds support for psind == 1 mappings.)

Submitted by:	Yufeng Zhou <yz70@rice.edu> (an earlier version)
Reviewed by:	kib, markj
MFC after:	1 week
2017-07-15 01:49:54 +00:00
Konstantin Belousov
60686c3703 Fix size argument to vm_pager_allocate(), it is in bytes, not in pages.
It is believed to be only cosmetic.

Noted by:	andrew
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-13 08:23:37 +00:00
Konstantin Belousov
e766a6bb01 Revert r320936 to recommit with the correct log message. 2017-07-13 08:23:12 +00:00
Konstantin Belousov
89f91fa960 It is believed to be only cosmetic.
Noted by:	andrew
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-13 08:19:50 +00:00
Ian Lepore
b524a31593 Protect access to the AT realtime clock with its own mutex.
The mutex protecting access to the registered realtime clock should not be
overloaded to protect access to the atrtc hardware, which might not even be
the registered rtc. More importantly, the resettodr mutex needs to be
eliminated to remove locking/sleeping restrictions on clock drivers, and
that can't happen if MD code for amd64 depends on it. This change moves the
protection into what's really being protected: access to the atrtc date and
time registers.

This change also adds protection when the clock is accessed from
xentimer_settime(), which bypasses the resettodr locking.

Differential Revision:	https://reviews.freebsd.org/D11483
2017-07-12 02:42:57 +00:00
Warner Losh
a94a63f0a6 An MMC/SD/SDIO stack using CAM
Implement the MMC/SD/SDIO protocol within a CAM framework. CAM's
flexible queueing will make it easier to write non-storage drivers
than the legacy stack. SDIO drivers from both the kernel and as
userland daemons are possible, though much of that functionality will
come later.

Some of the CAM integration isn't complete (there are sleeps in the
device probe state machine, for example), but those minor issues can
be improved in-tree more easily than out of tree and shouldn't gate
progress on other fronts. Appologies to reviews if specific items
have been overlooked.

Submitted by: Ilya Bakulin
Reviewed by: emaste, imp, mav, adrian, ian
Differential Review: https://reviews.freebsd.org/D4761

merge with first commit, various compile hacks.
2017-07-09 16:57:24 +00:00
Ryan Libby
5228ad10d4 amd-vi: gcc build errors
amdvi_cmp_wait: gcc complained about a malformed string behind an ifdef.

struct amdvi_dte: widen the type of the first reserved bitfield so that
the packed representation would not cross an alignment boundary for that
type. Apparently that causes in-tree gcc (4.2) to insert padding
(despite packed, resulting in a wrong structure definition), and causes
more modern gcc to emit a warning.

ivrs_hdr_iterate_tbl: delete a misleading check about header length
being less than 0 (the type is unsigned) and replace it with a check
that the length doesn't exceed the table size.

Reviewed by:	anish, grehan
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11485
2017-07-07 06:37:19 +00:00
Sean Bruno
6ef9566177 Garbage collect kernel option TWA_FLASH_FIRMWARE
Submitted by:	 kevin.bowling0kev009.com
Differential Revision:	https://reviews.freebsd.org/D11387
2017-07-03 19:33:50 +00:00
Dmitry Chagin
a0c59c7afd Add support for musl consumers to the Linuxulator.
PR:		213809
Submitted by:	Yonas Yanfa
Reported by:	Yonas Yanfa
MFC after:	1 week
Relnotes:	yes
2017-07-03 10:24:49 +00:00
Alan Cox
510cdf22a2 When "force" is specified to pmap_invalidate_cache_range(), the given
start address is not required to be page aligned.  However, the loop
within pmap_invalidate_cache_range() that performs the actual cache
line invalidations requires that the starting address be truncated to
a multiple of the cache line size.  This change corrects an error in
that truncation.

Submitted by:	Brett Gutstein <bgutstein@rice.edu>
Reviewed by:	kib
MFC after:	1 week
2017-07-01 16:42:09 +00:00
Jason A. Harmening
eb36b1d0bc Clean up MD pollution of bus_dma.h:
--Remove special-case handling of sparc64 bus_dmamap* functions.
  Replace with a more generic mechanism that allows MD busdma
  implementations to generate inline mapping functions by
  defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>.  This
  is currently useful for sparc64, x86, and arm64, which all
  implement non-load dmamap operations as simple wrappers
  around map objects which may be bus- or device-specific.

--Remove NULL-checked bus_dmamap macros.  Implement the
  equivalent NULL checks in the inlined x86 implementation.
  For non-x86 platforms, these checks are a minor pessimization
  as those platforms do not currently allow NULL maps.  NULL
  maps were originally allowed on arm64, which appears to have
  been the motivation behind adding arm[64]-specific barriers
  to bus_dma.h, but that support was removed in r299463.

--Simplify the internal interface used by the bus_dmamap_load*
  variants and move it to bus_dma_internal.h

--Fix some drivers that directly include sys/bus_dma.h
  despite the recommendations of bus_dma(9)

Reviewed by:	kib (previous revision), marius
Differential Revision:	https://reviews.freebsd.org/D10729
2017-07-01 05:35:29 +00:00
Konstantin Belousov
c377ff617b Translate between abridged and full x87 tags for compat32
ptrace(PT_GETFPREGS).

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-24 11:38:31 +00:00
Konstantin Belousov
2d88da2f06 Move struct syscall_args syscall arguments parameters container into
struct thread.

For all architectures, the syscall trap handlers have to allocate the
structure on the stack.  The structure takes 88 bytes on 64bit arches
which is not negligible.  Also, it cannot be easily found by other
code, which e.g. caused duplication of some members of the structure
to struct thread already.  The change removes td_dbg_sc_code and
td_dbg_sc_nargs which were directly copied from syscall_args.

The structure is put into the copied on fork part of the struct thread
to make the syscall arguments information correct in the child after
fork.

This move will also allow several more uses shortly.

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
X-Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 21:03:23 +00:00
Konstantin Belousov
43f41dd393 Make struct syscall_args visible to userspace compilation environment
from machine/proc.h, consistently on all architectures.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
X-Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 20:53:44 +00:00
Alan Cox
d118a00f3c Eliminate duplication of the pmap and pv list unlock operations in
pmap_enter() by implementing a single return path.  Otherwise, the
duplication will only increase with the upcoming support for psind == 1.

Reviewed by:	kib (some time ago)
2017-06-03 17:24:13 +00:00
Dmitry Chagin
9811d215b9 In r246085 some bits that are MI movied out into headers in compat/linux,
but I missed that when I commited x86_64 Linuxulator. So remove the duplicates.

MFC after:	1 week
2017-05-28 08:46:57 +00:00
Edward Tomasz Napierala
43af586011 Bump default MAXTSIZ (kern.maxtsiz) from 128MB to 32GB. The old limit
prevents one from running eg clang built with debug; the new one is
arbitrary (equal to MAXDSIZ) and... well, should be quite future-proof.

Same fix might be applicable to other 64 bit architectures; I'll ask
their respective maintainers to make sure it won't break anything.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D10758
2017-05-17 08:38:41 +00:00
Ed Maste
3e85b721d6 Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193
2017-05-17 00:34:34 +00:00
Conrad Meyer
fcf0952c80 Correct page frame mask constant used in pmap_change_attr_locked
This was introduced in r290156.  It's present in 11.0, but not any 10.x
release unless someone decided to MFC it.

It affects ordinary pages right above the DMAP limit, which is effectively
system memory rounded up to a 1 GB (3rd level superpage) boundary (or up to
a minimum of 4 GB, on small systems).

Reported by:	vangyzen
Reviewed by:	kib, alc
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D4030
2017-05-16 16:20:22 +00:00
Konstantin Belousov
bd101a6648 Ensure that resume path on amd64 only accesses page tables for normal
operation after processor is configured to allow all required
features.

In particular, NX must be enabled in EFER, otherwise load of page
table element with nx bit set causes reserved bit page fault.  Since
malloc uses direct mapping for small allocations, in particular for
the suspension pcbs, and DMAP is nx after r316767, this commit tripped
fault on resume path.

Restore complete state of EFER while wakeup code is still executing
with custom page table, before calling resumectx, instead of trying to
guess which features might be needed before resumectx restored EFER on
its own.

Bisected and tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-05-15 20:52:43 +00:00
Sepherosa Ziehau
c23a0b35c1 pcicfg: Fix direct calls of pci_cfg{read,write} on systems w/o PCI host bridge.
Reported by:	dexuan@
Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision: https://reviews.freebsd.org/D10564
2017-05-04 05:28:46 +00:00
Anish Gupta
07ff474a68 Add AMD IOMMU/AMD-Vi support in bhyve for passthrough/direct assignment to VMs. To enable AMD-Vi, set hw.vmm.amdvi.enable=1.
Reviewed by:bcr
Approved by:grehan
Tested by:rgrimes
Differential Revision:https://reviews.freebsd.org/D10049
2017-04-30 02:08:46 +00:00
Jung-uk Kim
af1973281e Use kmem_malloc() instead of malloc(9) for the native amd64 filter.
r316767 broke the BPF JIT compiler for amd64 because malloc()'d space is no
longer executable.

Discussed with:	kib, alc
2017-04-17 22:02:09 +00:00
Jung-uk Kim
e329e330d4 Move declarations for a machine-dependent function to the header file. 2017-04-17 21:51:26 +00:00
Gleb Smirnoff
83c9dea1ba - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place.  To do per-cpu stats, convert all fields that previously were
  maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and we can do counter_u64_alloc(), provide an
  early counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
    so that at early stages of boot, before counters are allocated we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
  to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by:	kib, gallatin, marius, lidl
Differential Revision:	https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
Gleb Smirnoff
75c4b0b5ac Remove unused assembly symbols pointing to vmmeter. 2017-04-17 17:18:07 +00:00
Gleb Smirnoff
9ed01c32e0 All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
Konstantin Belousov
33c72b24de Map DMAP as nx.
Demotions preserve PG_NX, so it is enough to set nx bit for initial
lowest-level paging entries.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-04-13 15:49:55 +00:00
Patrick Kelsey
67d955aab4 Corrected misspelled versions of rendezvous.
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().

Reviewed by:	gnn, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10313
2017-04-09 02:00:03 +00:00
Tai-hwa Liang
7ece126ed8 Trying to be more compatible with Linux if.h definitions:
- renaming l_ifreq::ifru_metric to l_ifreq::ifru_ivalue;
	- adding a definition for ifr_ifindex which points to l_ifreq::ifru_ivalue.

A quick search indicates that Linux already got the above changes since 2.1.14.

Reviewed by:	kib, marcel, dchagin
MFC after:	1 week
2017-04-08 14:41:39 +00:00
Andriy Gapon
978f3da16f revert r315959 because it causes build problems
The change introduced a dependency between genassym.c and header files
generated from .m files, but that dependency is not specified in the
make files.

Also, the change could be not as useful as I thought it was.

Reported by:	dchagin, Manfred Antar <null@pozo.com>, and many others
2017-03-27 12:34:29 +00:00
Bruce Evans
f434f3515b Fix printing of negative offsets (typically from frame pointers) again.
I fixed this in 1997, but the fix was over-engineered and fragile and
was broken in 2003 if not before.  i386 parameters were copied to 8
other arches verbatim, mostly after they stopped working on i386, and
mostly without the large comment saying how the values were chosen on
i386.  powerpc has a non-verbatim copy which just changes the uncritical
parameter and seems to add a sign extension bug to it.

Just treat negative offsets as offsets if they are no more negative than
-db_offset_max (default -64K), and remove all the broken parameters.

-64K is not very negative, but it is enough for frame and stack pointer
offsets since kernel stacks are small.

The over-engineering was mainly to go more negative than -64K for the
negative offset format, without affecting printing for more than a
single address.

Addresses in the top 64K of a (full 32-bit or 64-bit) address space
are now printed less well, but there aren't many interesting ones.
For arches that have many interesting ones very near the top (e.g.,
68k has interrupt vectors there), there would be no good limit for
the negative offset format and -64K is a good as anything.
2017-03-26 18:46:35 +00:00
Andriy Gapon
a7b4c009e1 specific end of interrupt implementation for AMD Local APIC
The change is more intrusive than I would like because the feature
requires that a vector number is written to a special register.
Thus, now the vector number has to be provided to lapic_eoi().
It was readily available in the IO-APIC and MSI cases, but the IPI
handlers required more work.
Also, we now store the VMM IPI number in a global variable, so that it
is available to the justreturn handler for the same reason.

Reviewed by:	kib
MFC after:	6 weeks
Differential Revision: https://reviews.freebsd.org/D9880
2017-03-25 18:45:09 +00:00
Dmitry Chagin
6c2a934b79 Implement Linux mincore() system call.
This is necessary for the upcoming drm-next.

Suggested by:	hselasky@
MFC after:	1 month
2017-03-25 15:47:29 +00:00
Bruce Evans
4e501eb7cc Remove buggy adjustment of page tables in db_write_bytes().
Long ago, perhaps only on i386, kernel text was mapped read-only and
it was necessary to change the mapping to read-write to set breakpoints
in kernel text.  Other writes by ddb to kernel text were also allowed.
This write protection is harder to implement with 4MB pages, and was
lost even for 4K pages when 4MB pages were implemented.  So changing
the mapping became useless.  It was actually worse than useless since
it followed followed various null and otherwise garbage pointers to
not change random memory instead of the mapping.  (On i386s, the
pointers became good in pmap_bootstrap(), and on amd64 the pointers
became bad in pmap_bootstrap() if not before.)

Another bug broke detection of following of null pointers on i386,
except early in boot where not detecting this was a feature.  When
I fixed the bug, I accidentally broke the feature and soon got traps
in db_write_bytes().  Setting breakpoints early in ddb was broken.

kib pointed out that a clean way to do the adjustment would be to use
a special [sub]map giving a small window on the bytes to be written.

The trap handler didn't know how to fix up errors for pagefaults
accessing the map itself.  Such errors rarely need fixups, since most
traps for the map are for the first access which is a read.

Reviewed by:	kib
2017-03-24 17:34:55 +00:00
Ed Schouten
ebfc28088b Stop providing the compat_3_brand.
As of r315860, the ELF image activator works fine for CloudABI without it.

Reviewed by:	kib
MFC after:	2 weeks
2017-03-23 14:12:21 +00:00
Konstantin Belousov
2274ab3d7b Update r315753 with the proper flag name.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-22 22:28:13 +00:00
Konstantin Belousov
1438fe3cf2 Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only
matches static binaries.

Interpretation of the 'static' there is that the binary must not
specify an interpreter.  In particular, shared objects are matched by
the brand if BI_CAN_EXEC_DYN is also set.

This improves precision of the brand matching, which should eliminate
surprises due to brand ordering.

Revert r315701.

Discussed with and tested by:	ed (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-22 22:23:01 +00:00
Mark Johnston
3d6732549d Add support for 8- and 16-bit atomic_(f)cmpset to x86.
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10068
2017-03-22 17:29:04 +00:00
Ed Schouten
ae2373da91 Set the interpreter path to /nonexistent.
CloudABI executables are statically linked and don't have an
interpreter. Setting the interpreter path to NULL used to work
previously, but r314851 introduced code that checks the string
unconditionally. Running CloudABI executables now causes a null pointer
dereference.

Looking at the rest of imgact_elf.c, it seems various other codepaths
already leaned on the fact that the interpreter path is set. Let's just
go ahead and pick an obviously incorrect interpreter path to appease
imgact_elf.c.

MFC after:	1 week
2017-03-22 07:05:27 +00:00
Dmitry Chagin
b1ba0846f1 Implement getrandom() syscall.
Note. GRND_RANDOM option is not supported for now.

MFC after:	1 month
2017-03-18 18:34:29 +00:00
Dmitry Chagin
857129394d To reduce code duplication move socket defines to the MI path.
MFC after:	1 week
2017-03-18 18:23:30 +00:00
Bruce Evans
ff17a6773e Don't access the reserved registers %dr4 and %dr5 on i386.
On the original i386, %dr[4-5] were unimplemented but not very clearly
reserved, so debuggers read them to print them.  i386 was still doing
this.

On the original athlon64, %dr[4-5] are documented as reserved but are
aliased to %dr[6-7] unless CR4_DE is set, when accessing them traps.

On 2 of my systems, accessing %dr[4-5] trapped sometimes.  On my Haswell
system, the apparent randomness was because the boot CPU starts with
CR4_DE set while all other CPUs start with CR4_DE clear.  FreeBSD
doesn't support the data breakpoints enabled by CR4_DE and it never
changes this flag, so the flag remains different across CPUs and
the behaviour seemed inconsistent except while booting when the CPU
doesn't change.

The invalid accesses broke:
- read access for printing the registers in ddb "show watches" on CPUs
  with CR4_DE set
- read accesses in fill_dbregs() on CPUs with CR4_DE set.  This didn't
  implement panic(3) since the user case always skipped %dr[4-5].
- write accesses in set_dbregs().  This also didn't affect userland.
  When it didn't trap, the aliasing made it fragile.

Don't print the dummy (zero) values of %dr[4-5] in "show watches" for
i386 or amd64.  Fix style bugs near this printing.

amd64 also has space in the dbregs struct for the reserved %dr[8-15]
and already didn't print the dummy values for these, and never accessed
any of the 10 reserved debug registers.

Remove cpufuncs for making the invalid accesses.  Even amd64 had these.
2017-03-17 13:49:05 +00:00
Peter Grehan
3da443021f Hide the AMD MONITORX/MWAITX capability.
Otherwise, recent Linux guests will use these instructions, resulting
in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits.

This fixes boot-time hangs in recent Linux guests on Ryzen CPUs
(and probably Bulldozer aka AMD FX as well).

Reviewed by:	kib
MFC after:	1 week
2017-03-16 03:21:42 +00:00
Emmanuel Vadot
aa6b345634 Remove i915drm and radeondrm from NOTES and conf.
This unbreak LINT kernel.

Reported by:	lwhsu
2017-03-12 00:52:16 +00:00
Dmitry Chagin
ab60bc8488 Reduce code duplication between MD Linux code by moving SYSV IPC 64-bit
related struct definitions out into the MI path.

Invert the native ipc structs to the Linux ipc structs convesion logic.
Since 64-bit variant of ipc structs has more precision convert native ipc
structs to the 64-bit Linux ipc structs and then truncate 64-bit values
into the non 64-bit if needed. Unlike Linux, return EOVERFLOW if the
values do not fit.

Fix SYSV IPC for 64-bit Linuxulator which never sets IPC_64 bit.

MFC after:	1 month
2017-03-07 17:07:16 +00:00
Mahdi Mokhtari
881b1219aa Regenerated Linuxulator syscall tables for r314782
Approved by:	dchagin
MFC after:	1 month
2017-03-06 18:20:37 +00:00
Mahdi Mokhtari
8049c6bfb8 Add UNIMPLEMENTED() placeholder macro for
the syscalls that are not implemented in Linux kernel itself.
Cleanup DUMMY() macros.

Reviewed by:	dchagin, trasz
Approved by:	dchagin
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D9804
2017-03-06 18:11:38 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Konstantin Belousov
2e6e48fb59 Initialize pcb_save for thread0.
Otherwise kernel traps on NULL dereference if fpu_kern(9) is used from the
thread0 context.

Reported by:	cem
Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-02-28 22:54:52 +00:00
Gleb Smirnoff
efe3b0de14 Remove SVR4 (System V Release 4) binary compatibility support.
UNIX System V Release 4 is operating system released in 1988. It ceased
to exist in early 2000-s.
2017-02-28 05:14:42 +00:00
Dmitry Chagin
af68739567 Regen for r314312 (Linux epoll_pwait).
MFC after:	1 month
2017-02-26 19:59:28 +00:00
Dmitry Chagin
f8ae1bb64d Change Linux epoll_pwait syscall definition to match Linux actual one.
MFC after:	1 month
2017-02-26 19:57:18 +00:00
Alan Cox
0314966858 Refine the fix from r312954. Specifically, add a new PDE-only flag,
PG_PROMOTED, that indicates whether lingering 4KB page mappings might
need to be flushed on a PDE change that restricts or destroys a 2MB
page mapping.  This flag allows the pmap to avoid range invalidations
that are both unnecessary and costly.

Reviewed by:	kib, markj
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D9665
2017-02-26 19:54:02 +00:00
Dmitry Chagin
dd93b628e9 Implement timerfd family syscalls.
MFC after:	1 month
2017-02-26 09:48:18 +00:00
Dmitry Chagin
354aa2dd56 Regen after r314291 (timerfd definition).
MFC after:	1 month
2017-02-26 09:37:25 +00:00
Dmitry Chagin
1064d53fde Change Linuxulator timerfd syscalls definition to match actual Linux one.
MFC after:	1 month
2017-02-26 09:35:44 +00:00
Edward Tomasz Napierala
e801ac7852 Fix linux_fstatfs() to return proper value for f_frsize. Without it,
linux df(1) binary from Xenial shows garbage.

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9692
2017-02-25 20:32:37 +00:00
Mahdi Mokhtari
bd911530b7 Add linux_preadv() and linux_pwritev() syscalls to Linuxulator.
Reviewed by:	dchagin
Approved by:	dchagin, trasz (src committers)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9722
2017-02-24 20:04:02 +00:00
Dmitry Chagin
8665c4d9cd Revert r314217. Commit is not match that I have approved. 2017-02-24 19:47:27 +00:00
Mahdi Mokhtari
21d23e3249 Add linux_preadv() and linux_pwritev() syscalls to Linuxulator.
Reviewed by:	dchagin
Approved by:	dchagin, trasz (src committers)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9722
2017-02-24 19:22:17 +00:00
Pedro F. Giffuni
e099b90b80 sys: Replace zero with NULL for pointers.
Found with:	devel/coccinelle
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D9694
2017-02-22 02:35:59 +00:00
Mark Johnston
a384a37df8 ddb show pte: use pmap of kdb_thread
show pte from the pmap of the process of the current DDB thread, instead
of necessarily the PCPU pmap.

Submitted by:	Ryan Libby <rlibby@gmail.com>
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D9645
2017-02-21 21:06:12 +00:00
Edward Tomasz Napierala
5a49cd0099 Reimplement linux_arch_prctl() as a wrapper around sysarch(2).
This also adds support for LINUX_ARCH_SET_GS.

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9372
2017-02-20 16:13:40 +00:00
Alan Cox
4b4c84cfc3 In pmap_enter(), set the PG_MANAGED flag on the new PTE in one place,
rather two places, and do so before the pmap lock is acquired.

Submitted by:	Yufeng Zhou <yz70@rice.edu>
Reviewed by:	kib
MFC after:	1 week
2017-02-19 18:00:57 +00:00
Dmitry Chagin
486a06bdf0 Implement rt_tgsigqueueinfo system call used by glibc for pthread_sigqueue(3).
MFC after:	2 week
2017-02-19 07:38:11 +00:00
Konstantin Belousov
d9440197b4 Microoptimize amd64/pmap.c pmap_protect_pde().
For the loop that dirties vm_pages in case superpage was written to,
check the complete condition before the loop.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-02-19 03:33:20 +00:00
Jason A. Harmening
e2a8d17887 Bring back r313037, with fixes for mips:
Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to
replace pcpu_find(curcpu) in MI code.

Reviewed by:	andreast, kan, lidl
Tested by:	lidl(mips, sparc64), andreast(powerpc)
Differential Revision:	https://reviews.freebsd.org/D9587
2017-02-19 02:03:09 +00:00
Konstantin Belousov
b1fa987835 Merge i386 and amd64 mtrr drivers.
Reviewed by:	royger, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D9648
2017-02-17 21:08:32 +00:00
Roger Pau Monné
43b00aeb88 x86: fix MTRR initialization if EARLY_AP_STARTUP is used
MTRR handlers are set in {amd64/i686}_mem_drvinit, which is called at
SI_SUB_DRIVERS, and that's too late when EARLY_AP_STARTUP is set because APs
have already started at this point. {amd64/i686}_mrinit is also called too late
for the BSP, since that happens when the memory device is attached, also after
APs have already started.

Move the position to SI_SUB_CPU, and also initialize the state for the BSP, so
that the APs can correctly get to the same state as the BSP.

Sponsored by:		Citrix Systems R&D
MFC after:		1 week
Reviewed by:		jhb, kib
Differential Revision:	https://reviews.freebsd.org/D9630
2017-02-17 12:47:51 +00:00
Edward Tomasz Napierala
d82de05480 Implement linux version of ptrace(2). It's nowhere near complete,
but it allows to use 64 bit linux strace(1) on 64 bit linux binaries.

Reviewed by:	dchagin (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9406
2017-02-16 13:32:15 +00:00
Edward Tomasz Napierala
c6639ffe4e Regen after r313769.
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-02-15 14:25:50 +00:00
Edward Tomasz Napierala
4ac1825ce3 Fix definition of linux64 ptrace syscall.
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-02-15 14:12:39 +00:00
John Baldwin
bb9b710477 Regenerate all the system call tables to drop "created from" lines.
One of the ibcs2 files contains some actual changes (new headers) as
it hasn't been regenerated after older changes to makesyscalls.sh.
2017-02-10 19:45:02 +00:00
Eric Joyner
cb6b8299fd ixl(4): Update to 1.7.12-k
Refresh upstream driver before impending conversion to iflib.

Major new features:

- Support for Fortville-based 25G adapters
- Support for I2C reads/writes

(To prevent getting or sending corrupt data, you should set
dev.ixl.0.debug.disable_fw_link_management=1 when using I2C
[this will disable link!], then set it to 0 when done. The driver implements
the SIOCGI2C ioctl, so ifconfig -v works for reading I2C data,
but there are read_i2c and write_i2c sysctls under the .debug sysctl tree
[the latter being useful for upper page support in QSFP+]).

- Addition of an iWARP client interface (so the future iWARP driver for
  X722 devices can communicate with the base driver).
  - Compiling this option in is enabled by default, with "options IXL_IW" in
    GENERIC.

Differential Revision:	https://reviews.freebsd.org/D9227
Reviewed by:	sbruno
MFC after:	2 weeks
Sponsored by:	Intel Corporation
2017-02-10 01:04:11 +00:00
Dmitry Chagin
12bc0fb56f Regen after r313284.
MFC after:	2 week
2017-02-05 14:19:19 +00:00
Dmitry Chagin
8b756d40a7 Update syscall.master to 4.10-rc6. Also fix comments, a typo,
and wrong numbering for a few unimplemented syscalls.

For 32-bit Linuxulator, socketcall() syscall was historically
the entry point for the sockets API. Starting in Linux 4.3, direct
syscalls are provided for the sockets API. Enable it.

The initial version of patch was provided by trasz@ and extended by me.

Submitted by:	trasz
MFC after:	2 week
Differential Revision:	https://reviews.freebsd.org/D9381
2017-02-05 14:17:09 +00:00
Jason A. Harmening
ad62ba6e96 Revert r313037
The switch to get_pcpu() in MI code seems to cause hangs on MIPS.
Back out until we can get a better idea of what's happening there.

Reported by:	kan, lidl
2017-02-04 06:24:49 +00:00
Jason A. Harmening
65ed483615 Implement get_pcpu() for the remaining architectures and use it to
replace pcpu_find(curcpu) in MI code.
2017-02-01 03:32:49 +00:00
Edward Tomasz Napierala
ae6b6ef6cb Replace sys_ftruncate() with kern_ftruncate() in various compats.
Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9368
2017-01-30 11:50:54 +00:00
Konstantin Belousov
a0f64f38a1 Do not leave stale 4K TLB entries on pde (superpage) removal or
protection change.

On superpage promotion, x86 pmaps do not invalidate existing 4K
entries for the superpage range, because they are compatible with the
promoted 2/4M entry.  But the invalidation on superpage removal or
protection change only did single INVLPG with the base address of the
superpage.  This reliably flushed superpage TLB entry, and 4K entry
for the first page of the superpage, potentially leaving other 4K TLB
entries lingering.  Do the invalidation of the whole superpage range
to correct the problem.

Note that the precise invalidation is done by x86 code for kernel_pmap
only, for user pmaps whole (per-AS) TLB is flushed.  This made the bug
well hidden, because promotions of the kernel mappings require
specific load.

Reported and tested by:	Jonathan Looney <jtl@netflix.com> (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-29 19:14:48 +00:00
Baptiste Daroussin
b4b4b5304b Revert crap accidentally committed 2017-01-28 16:31:23 +00:00
Baptiste Daroussin
814aaaa7da Revert r312923 a better approach will be taken later 2017-01-28 16:30:14 +00:00
Tijl Coosemans
86e01d5add Apply r210555 to 64 bit linux support:
The interpreter name should no longer be treated as a buffer that can be
overwritten.

PR:		216346
MFC after:	3 days
2017-01-24 16:13:59 +00:00
Konstantin Belousov
5611aaa195 Use SFENCE for ordering CLFLUSHOPT.
SDM states that CLFLUSHOPT instructions can be ordered with other
writes by SFENCE, heavier MFENCE is not required.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-20 19:08:44 +00:00
Andriy Gapon
b4a5a4d0d9 vmm_dev: work around a bogus error with gcc 6.3.0
The error is:
vmm_dev.c: In function 'alloc_memseg':
vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull]

Apparently, the gcc is unable to figure out that if a ternary operator
produced a non-NULL value once, then the operator with exactly the same
operands would produce the same value again.

MFC after:	1 week
2017-01-20 13:21:27 +00:00
Ed Schouten
4423244072 Catch up with changes to structure member names.
Pointer/length pairs are now always named ${name} and ${name}_len.
2017-01-17 22:05:52 +00:00
Conrad Meyer
1d64db52f3 Fix a variety of cosmetic typos and misspellings
No functional change.

PR:		216096, 216097, 216098, 216101, 216102, 216106, 216109, 216110
Reported by:	Bulat <bltsrc at mail.ru>
Sponsored by:	Dell EMC Isilon
2017-01-15 18:00:45 +00:00
Mark Johnston
bd7abab0c9 Coalesce TLB shootdowns of global PTEs in pmap_advise() on x86.
We would previously invalidate such entries individually, resulting in more
IPIs than necessary.

Reviewed by:	alc, kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D9094
2017-01-10 21:52:48 +00:00