Commit Graph

8130 Commits

Author SHA1 Message Date
kib
3a13615ee3 Remove redundand CLD instructions.
We already clear %RFLAGS.DF on the kernel entry due to the compiler's
ABI requirements.

Suggested by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-01-11 13:22:13 +00:00
kib
2d65a3eac2 Do not clear %RFLAGS.DF on fast syscall entry.
Hardware already did it for us due to the mask loaded into the
MSR_SF_MASK msr register.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13838
2018-01-11 12:54:33 +00:00
kib
f00163c316 Move the hardware setup for fast syscalls into a common function.
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-11 12:40:43 +00:00
kib
8d33b830f9 Rename COMMON_TSS_RSP0 to TSS_RSP0.
The symbol is just an offset in the hardware TSS structure, it is not
limited to the common_tss instance.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-01-11 12:28:08 +00:00
kib
7429338dc0 Update comment explaining the check, to reality.
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-01-11 12:07:24 +00:00
cem
8899c9cd84 x86: Document purpose of _safe variants of {rd,wr}msr()
Sponsored by:	Dell EMC Isilon
2018-01-10 22:41:00 +00:00
avg
67ddd57974 vmm/svm: contigmalloc of the whole svm_softc is excessive
This is a followup to r307903.

struct svm_softc takes more than 200 kilobytes while what we really need
is 3 contiguous pages for I/O permission map and 2 contiguous pages for
MSR permission map.  Other physically mapped structures have a size of
a single page, so a proper alignment is sufficient for their correct
mapping.

Thus, only the permission maps are allocated with contigmalloc now,
the softc is allocated with a regular malloc.

Additionally, this commit adds a check that malloc returns memory with the
expected page alignment and that contigmalloc does not fail.
Unfortunately, at present svm_vminit() is expected to always succeed and
there is no way to report an error.
So, a contigmalloc failure leads to a panic.
We should probably fix this.

MFC after:	2 weeks
2018-01-09 14:22:18 +00:00
kib
dc8d51112c Make it possible to re-evaluate cpu_features.
Add cpuctl(4) ioctl CPUCTL_EVAL_CPU_FEATURES which forces re-read of
cpu_features, cpu_features2, cpu_stdext_features, and
std_stdext_features2.

The intent is to allow the kernel to see the changes in the CPU
features after micocode update.  Of course, the update is not atomic
across variables and not synchronized with readers.  See the man page
warning as well.

Reviewed by:	imp (previous version), jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13770
2018-01-05 21:06:19 +00:00
avg
85a273200a Fix a couple of comments in AMD Virtual Machine Control Block structure
MFC after:	1 week
2018-01-05 19:15:24 +00:00
kib
e478c0d2cf Avoid re-check of usermode condition.
It does not change anything in the behavior of trap_pfault(), while
eliminating obfuscation of jumping to the code which checks for the
condition reversed of the goto cause.  Also avoid force initialize the
rv variable, since it is now only accessed after storing vm_fault()
return value.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13725
2018-01-01 20:47:03 +00:00
kib
10fbef9845 Remove MP SAFE marks and stray register name in comments.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-12-31 17:07:59 +00:00
cperciva
0858f13002 Use the TSLOG framework to record entry/exit timestamps for hammer_time.
The entry must be logged "manually" using TSRAW rather than TSENTER
since PCPU data structures have not yet been initialized and thus
curthread cannot be accessed; &thread0 is what will become curthread
later in hammer_time.

Other MD initialization code should be similarly instrumented in order
to gain visibility into the time spent before entering mi_startup; this
will require some care and testing from people with access to such
hardware.
2017-12-31 09:22:07 +00:00
eadler
421a929b1e kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
tychon
02cc877968 Recognize a pending virtual interrupt while emulating the halt instruction.
Reviewed by:	grehan, rgrimes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13573
2017-12-21 18:30:11 +00:00
kib
3d18a9d66f Add atomic_load(9) and atomic_store(9) operations.
They provide relaxed-ordered atomic access semantic.  Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses.  The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.

The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations.  It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.

Suggested by:	jhb
Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 09:59:20 +00:00
markj
b0b9b4fcf4 Pass the trap frame to fasttrap hooks.
The DTrace fasttrap entry points expect a struct reg containing the
register values of the calling thread. Perform the conversion in
fasttrap rather than in the trap handler: this reduces the number of
ifdefs and avoids wasting stack space for traps that don't involve
DTrace.

MFC after:	2 weeks
2017-12-11 19:21:39 +00:00
bde
d81660feeb Move instantiation of msgbufp from 9 MD files to subr_prf.c.
This variable should be pure MI except possibly for reading it in MD
dump routines.  Its initialization was pure MD in 4.4BSD, but FreeBSD
changed this in r36441 in 1998.  There were many imperfections in
r36441.  This commit fixes only a small one, to simplify fixing the
others 1 arch at a time.  (r47678 added support for
special/early/multiple message buffer initialization which I want in
a more general form, but this was too fragile to use because hacking
on the msgbufp global corrupted it, and was only used for 5 hours in
-current...)
2017-12-07 07:55:38 +00:00
avg
209d01f38b amd-vi: set iommu msi configuration using pci_enable_msi method
This is better than directly changing PCI configuration space of the
device because it makes the PCI bus aware of the configuration.
Also, the change allows to drop a bunch of code that duplicated
pci_enable_msi() functionality.

I wonder if it's possible to further simplify the code by using
pci_alloc_msi().
2017-12-04 17:10:52 +00:00
avg
057e143faa vmm/amd: add ivhd device with a higher order
ivhd should attach after the root PCI bus and, thus, after the ACPI
Host-PCI bridge off which the bus hangs.  This is because ivhd changes
PCI configuration of a PCI IOMMU device that is located on the root bus.
If the bus attaches after ivhd it clears the MSI portion of the
configuration.  As a result IOMMU event interrupts would never be
delivered.

For regular ACPI devices the order is calculated as
    ACPI_DEV_BASE_ORDER + level * 10
where level is a depth of the device in the ACPI namespace.
I expect the depth of the Host-PCI bridge to be two or three,
so ACPI_DEV_BASE_ORDER + 10 * 10 should be a sufficiently safe order
for ivhd.

This should fix the setup of the AMD-Vi event interrupt when vmm is
preloaded (as opposed to kldload-ed).
2017-12-04 17:08:03 +00:00
avg
a4b48b00e7 amd-vi: clear event interrupt and overflow bits upon handling the interrupt
This ensures that we can receive further event interrupts.
See the description of the bits in the specification for
MMIO Offset 2020h IOMMU Status Register.
The bits are defined as set-by-hardware write-1-to-clear, same as all
the bits in the status register.

Discussed with:	anish
2017-12-04 17:02:53 +00:00
scottl
49fb5dd79b It's time to retire AHC_REG_PRETTY_PRINT and AHD_REG_PRETTY_PRINT from
the standard kernels.  They are still available as custom compile
options.
2017-11-29 23:41:49 +00:00
brooks
c6fbed1a3a Disable vim syntax highlighting.
Vim's default pick doesn't understand that ';' is a comment character
and the result looks horrible.

Reviewed by:	emaste
2017-11-28 18:23:17 +00:00
kib
dbbac96e0d Fix index calculation for the page table pages for efirt 1:1 map.
Stop issuing pre-assigned number to enumerate all page table pages,
the assignment is incorrect.  Instead automatically calculate the next
unused index. This index in fact does not serve any purpose except to
be unique to satisfy vm_page_grab() interface, we do not look up the
page by the index later.

Reported and tested by:	emaste
Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
PR:	223906
Differential revision:	https://reviews.freebsd.org/D13273
2017-11-28 09:34:43 +00:00
fsu
24e4690114 Remap ENOATTR to ENODATA in the linuxulator.
In the linux ENOADATA is frequently #defined as ENOATTR.
The change is required for an xattrs support implementation.

MFC after: 1 week
Discussed with: netchild
Approved by: pfg

Differential Revision: https://reviews.freebsd.org/D13221
2017-11-27 17:03:11 +00:00
pfg
d754734f5c sys/amd64: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:03:07 +00:00
ed
d13a2ec254 Use TO_PTR() to convert integers to pointers.
For FreeBSD/arm64's cloudabi32 support, I'm going to need a TO_PTR() in
this place. Also use it for all of the other source files, so that the
difference remains as minimal as possible.

MFC after:	2 weeks
2017-11-26 14:45:56 +00:00
hselasky
091ce9badd Merge ^/head r326132 through r326161. 2017-11-24 12:13:27 +00:00
hselasky
7b5126003a Merge ^/head r325999 through r326131. 2017-11-23 14:28:14 +00:00
kib
873f304292 Remove lint support from system headers and MD x86 headers.
Reviewed by:	dim, jhb
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13156
2017-11-23 11:40:16 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
hselasky
c6f05b2594 Merge ^/head r325842 through r325998. 2017-11-19 12:36:03 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
hselasky
7f64b39d2d Merge ^/head r325663 through r325841. 2017-11-15 11:28:11 +00:00
hselasky
b909dfecb7 Remove no longer supported mthca driver.
Sponsored by:	Mellanox Technologies
2017-11-13 10:59:38 +00:00
mjg
3552994d2f amd64: stop nesting preemption counter in spinlock_enter
Discussed with:	jhb
2017-11-12 03:13:01 +00:00
jeff
3c355d849c Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
kib
d8557dc01f Zero the structure instead of the pointer to it.
Reported by:	Don Morris <Don.Morris@dell.com>
MFC after:	4 days
2017-11-05 20:03:57 +00:00
kib
8113f89b71 x86: Do not emit unused TD_TID symbols.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-04 10:51:52 +00:00
kib
e5cf015156 Restore an optimization that was temporary disabled by r324665.
In reclaim_pv_chunk(), rotate the pv chunks list so that next
invocations of the reclaim do not scan the same pv chunks that could
not be freed.  Only do the rotation when there is no parallel scan,
tracked by active_reclaims counter.

To rotate, move all chunks that are before current iteration marker,
after another marker that is inserted at the list tail on start of the
reclaim.

Reviewed by:	alc
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-01 18:06:44 +00:00
kib
a6dcbd1557 Consistently ensure that we do not load MXCSR with reserved bits set.
Some callers of fpusetregs()/npxsetregs(), most importantly
set_fpcontext(), clear reserved bits.  But some did not.  Do the
clearing in fpusetregs() and remove now redundand operation from
set_fpcontext().

Reported by:	Maxime Villard <max@m00nbsd.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-11-01 10:32:44 +00:00
grehan
02db9e4f6d Emulate the "OR reg, r/m" instruction (opcode 0BH).
This is	needed for the HDA emulation with FreeBSD guests.

Reviewed by:	marcelo
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12832
2017-11-01 03:26:53 +00:00
tijl
173e2ebded Set the return address for stack entry points to zero.
Stack unwinders treat zero as a stop condition.  The value on the stack can
be non-zero because thread stacks may be arbitrary memory provided via
pthread_attr_setstack(3) or may be recycled from previous threads.

Reference:
https://lists.freebsd.org/pipermail/freebsd-current/2017-August/066855.html
https://lists.freebsd.org/pipermail/freebsd-current/2017-October/067254.html

Discussed with:	kib
MFC after:	1 week
2017-10-31 11:51:34 +00:00
ian
ae3d035140 Improve the performance of the hpet timer in bhyve guests by making the
timer frequency a power of two.  This changes the frequency from 10 to
16.7 MHz (2 ^ 24 HZ).  Using a power of two avoids roundoff errors when
doing arithmetic in sbintime_t units.

Testing shows this can fix erratic ntpd behavior in guests using the
hpet timer (which is the default for multicore guests).

Reported by:	bsam@
2017-10-29 20:50:03 +00:00
eadler
45275e3a26 Update several more URLs
- Primarily http -> https
- Primarily FreeBSD project URLs
2017-10-29 08:17:03 +00:00
jhb
7db3e0e96c Rework pass through changes in r305485 to be safer.
Specifically, devices that do not support PCI-e FLR and were not
gracefully shutdown by the guest OS could continue to issue DMA
requests after the VM was terminated.  The changes in r305485 meant
that those DMA requests were completed against the host's memory which
could result in random memory corruption.  Instead, leave ppt devices
that are not attached to a VM disabled in the IOMMU and only restore
the devices to the host domain if the ppt(4) driver is detached from a
device.

As an added safety belt, disable busmastering for a pass-through device
when before adding it to the host domain during ppt(4) detach.

PR:		222937
Tested by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	grehan
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12661
2017-10-27 14:57:14 +00:00
markj
1588800df4 Fix the VM_NRESERVLEVEL == 0 build.
Add VM_NRESERVLEVEL guards in the pmaps that implement transparent
superpage promotion using reservations.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12764
2017-10-23 15:34:05 +00:00
mjg
ca37b93652 Make the sleepq chain hash size configurable per-arch and bump on amd64.
While here cache-align chains.

This shortens longest found chain during poudriere -j 80 from 32 to 16.

Pushing this higher up will probably require allocation on boot.
2017-10-22 20:43:50 +00:00
bz
48b1992757 With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into
HEAD.  Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.

Disable building LINT-VIMAGE with VIMAGE being default.

This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.

Requested by:		many
Reviewed by:		kristof, emaste, hiren
X-MFC after:		never
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D12639
2017-10-20 21:40:59 +00:00
mjg
397cc565b8 amd64: plug missed dt_lock in cpu_fork 2017-10-20 18:58:11 +00:00
mjg
86705ae197 amd64: __exclusive_cache_line pv_chunks_mutex and pv_list_locks
Note that pv_list_locks is an array and currently it fits 2 locks per line.
Resizing it and/or putting more locks in different lines requires several tests.

MFC after:	1 week
2017-10-20 03:38:58 +00:00
mjg
c26a9dfc26 amd64: avoid acquiring dt lock if possible (which is the common case)
Discussed with:	kib
MFC after:	1 week
2017-10-20 03:30:02 +00:00
markj
c87fb69add Move kernel dump offset tracking into MI code.
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.

This is needed to implement support for kernel dump compression in the
MI kernel dump code.

Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.

Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11722
2017-10-18 15:38:05 +00:00
kib
95ee5810e6 Fix the pv_chunks pc_lru tailq handling in reclaim_pv_chunk().
For processing, reclaim_pv_chunk() removes the pv_chunk from the lru
list, which makes pc_lru linkage invalid.  Then the pmap lock is
released, which allows for other thread to free the last pv entry
allocated from the chunk and call free_pv_chunk(), which tries to
modify the invalid linkage.

Similarly, the chunk is inserted into the private tailq new_tail
temporary.  Again, free_pv_chunk() might be run and corrupt the
linkage for the new_tail after the pmap lock is dropped.

This is a consequence of r299788 elimination of pvh_global_lock, which
allowed for reclaim to run in parallel with other pmap calls which
free pv chunks.

As a fix, do not remove the chunk from pc_lru queue, use a marker to
remember the position in the queue iteration.  We can safely operate
on the chunks after the chunk's pmap is locked, we fetched the chunk
after the marker, and we checked that chunk pmap is same as we have
locked, because chunk removal from pc_lru requires both pv_chunk_mutex
and the pmap mutex owned.

Note that the fix lost an optimization which was present in the
previous algorithm.  Namely, new_tail requeueing rotated the pv chunks
list so that reclaim didn't scan the same pv chunks that couldn't be
freed (because they contained a wired and/or superpage mapping) on
every invocation.  An additional change is planned which would improve
this.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-16 15:16:24 +00:00
kib
8bde9fde3b Change amd64_get_ldt() to return 'EOF' when the LDT is not yet
allocated, when requested range of descriptors does not fit into
currently allocated LDT, or trim the return if the range fits
partially.  Before, the function returned EINVAL.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-09 16:20:39 +00:00
mjg
ea8bf70599 amd64: remove unused variable from pmap_delayed_invl_genp
Reported by:	gcc
MFC after:	1 week
2017-10-05 18:51:48 +00:00
kib
6a6f4a405d Ensure that after sucessfull i386_set_ldt() call, other threads can
use LDT segments immediately.

If the i386_set_ldt() call created a first LDT descriptor (and
consequently created the LDT) for our address space, LDTR is currently
loaded only on the CPU executing the syscall.  Other CPUs executing
threads sharing the address space, would only load LDTR after context
switch.

Uncomment set_user_ldt_rv() and call it on all CPUs.  Remove critical
section inside set_user_ldt(), it is not needed in the context of call
from smp_rendezvous().

Set md_ldt after md_ldt_sd is initialized using the same code sequence
as in user_ldt_free().  Do the whole initialization in a critical
section, to not race with the context switching while we set LDT.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 13:12:59 +00:00
kib
656af25b35 Avoid a race betweem freeing LDT and context switches.
cpu_switch.S uses curproc->p_md.md_ldt value as the flag indicating
presence of the process LDT.  The flag is checked and then ldt segment
descriptor is copied into the CPU' GDT slot.

Disallow context switches around clearing of the curproc LDT state by
performing the cleanup in critical section.  Ensure that the md_ldt
flag is cleared before md_ldt_sd descriptor content is destroyed by
inserting fence between the operations.

We depend on the x86 memory model strong ordering guarantees, in
particular, that cpu_switch.S observes the writes to md_ldt and
md_ldt_sd in the expected order.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:50:03 +00:00
kib
4f3ec8ce9a Improve amd64_get_ldt().
Provide consistent snapshot of the requested descriptors by preventing
other threads from modifying LDT while we fetch the data, lock dt_lock
around the read.  Copy the data into intermediate buffer, which is
copied out after the lock is dropped.

Use guaranteed atomic (aligned volatile) reads of the descriptors to
use same-size atomic as CPU update to set A bit in the descriptor type
field.

Improve overflow checking for the descriptors range calculations and
remove unneeded casts.

Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:29:34 +00:00
kib
a71964674c Minor style fix.
Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:19:55 +00:00
kib
00ef4a21e2 Complete r323772 on amd64.
Compilers are allowed to combine plain reads into group operations,
e.g. 64bit element copies of one array into another can be
legitimately optimized back to a memcpy() call, which r323772 tried to
prevent.

Qualify accesses to LDT descriptors with volatile dereference to
ensure that each write indeed occurs.  After that, our usual claim of
native-size aligned writes being atomic applies.

This is equivalent to atomic_store(memory_order_relaxed) C11 accesses,
but our machine/atomic.h does not provide corresponding primitive.

Noted and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:16:45 +00:00
kib
3684a76fa7 Use ANSI C declaration for amd64_get_ldt().
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:07:38 +00:00
kib
08f33cdf7c Correct format specifiers in the debug code.
Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 12:01:39 +00:00
kib
c73f51f840 Remove useless comments.
Requested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 11:56:04 +00:00
kib
fdbe2e91b7 On amd64, mark the set_user_ldt() function as static.
On i386, the function is used from the context switch code and needs
to be accessible externally.  Amd64 MD context switch does not lock an
LDT spinlock and inlines switching in assembly.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 11:50:01 +00:00
kib
404978fe18 Reduce default max_ldt_segment value to 512.
This makes the LDT to use only one page with default settings,
avoiding the need to find contigous 2 pages in KVA.  It seems that
most users are fine even with 512 segments.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-05 11:36:55 +00:00
kib
cfbf7b56f3 Update comment to note that we skip LDT reload for kthreads as well.
Noted by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-10-05 11:34:51 +00:00
kib
b20e21edec Hide kernel stuff from userspace.
Sponsored by:	Mellanox Technologies
2017-10-02 08:37:43 +00:00
andrew
f448611e08 To prepare for adding EFI runtime services support on arm64 move the
machine independent parts of the existing code to a new file that can be
shared between amd64 and arm64.

Reviewed by:	kib (previous version), imp
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12434
2017-10-01 19:52:47 +00:00
kib
466bfe2553 Do not do torn writes to active LDTs.
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12413
2017-09-19 17:57:04 +00:00
kibab
241835d4e9 Add MMCCAM-enabled kernel config for IMX6, reduce debug noice in MMCCAM kernels
CAM_DEBUG_TRACE results in way too much debug output than needed now.
When debugging, it's always possible to turn on trace level using camcontrol.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D12110
2017-09-13 10:56:02 +00:00
cem
3bf35bb3c4 Add smn(4) driver for AMD System Management Network
AMD Family 17h CPUs have an internal network used to communicate between
the host CPU and the PSP and SMU coprocessors.  It exposes a simple
32-bit register space.

Reviewed by:	avg (no +1), mjoras, truckman
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12217
2017-09-05 15:13:41 +00:00
jpaetzel
bfd734f77c Revert r323087
This needs more thinking out and consensus, and the commit message
was wrong AND there was a typo in the commit.

pointyhat:	jpaetzel
2017-09-01 17:03:48 +00:00
jpaetzel
612bb8539d Take options IPSEC out of GENERIC
PR:	220170
Submitted by:	delphij
Reviewed by:	ae, glebius
MFC after:	2 weeks
Differential Revision:	D11806
2017-09-01 15:54:53 +00:00
jpaetzel
f7739d7e09 Allow kldload tcpmd5
PR:	220170
MFC after:	2 weeks
2017-08-31 20:16:28 +00:00
mav
5849e8f575 Add NTB driver for PLX/Avago/Broadcom PCIe switches.
This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though
the second with predictable complications on hot-plug and reboot events).
I tested it with PEX 8717 and PEX 8733 chips, but expect it should work
with many other compatible ones too.  It supports up to two NT bridges
per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows,
6 or 12 scratchpad registers and 16 doorbells.  There are also 4 DMA engines
in those chips, but they are not yet supported.

While there, rename Intel NTB driver from generic ntb_hw(4) to more specific
ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and
alike to Linux naming.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-08-30 21:16:32 +00:00
cem
6659d8ab68 Drop CACHE_LINE_SIZE to 64 bytes on x86
The actual cache line size has always been 64 bytes.

The 128 number arose as an optimization for Core 2 era Intel processors.  By
default (configurable in BIOS), these CPUs would prefetch adjacent cache
lines unintelligently.  Newer CPUs prefetch more intelligently.

The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400
series, "Dunnington").  If you are still using one of these CPUs, especially
in a multi-socket configuration, consider locating the "adjacent cache line
prefetch" option in BIOS and disabling it.

Reported by:	mjg
Reviewed by:	np
Discussed with:	jhb
Sponsored by:	Dell EMC Isilon
2017-08-28 22:28:41 +00:00
rlibby
0cabf7f260 amd64: drop q suffix from rd[fg]sbase for gas compatibility
Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12133
2017-08-26 23:13:18 +00:00
kib
dd1b856d37 Save KGSBASE in pcb before overriding it with the guest value.
Reported by:	lwhsu, mjoras
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	18 days
2017-08-24 10:49:53 +00:00
kib
7bf099a0ef Ensure that fs/gs bases are stored in pcb before copying the pcb for
new process or thread.

Reported and tested by:	ae, dhw
Sponsored by:	The FreeBSD Foundation
MFC after:	20 days
2017-08-22 18:15:47 +00:00
kib
f495f3ebd8 Make WRFSBASE and WRGSBASE instructions functional.
Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.

Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.

Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.

The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered a bug fixes on its own.

Reviewed by:	jhb (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12023
2017-08-21 17:38:02 +00:00
kib
ebad4d0743 Simplify the code.
Noted by:	Oliver Pinter
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 11:18:16 +00:00
kib
2dd98dac16 Simplify amd64 trap().
- Use more relevant name 'signo' instead of 'i' for the local variable
  which contains a signal number to send for the current exception.
- Eliminate two labels 'userout' and 'out' which point to the very end
  of the trap() function.  Instead use return directly.
- Re-indent the prot_fault_translation block by reducing if() nesting.
- Some more monor style changes.

Requested and reviewed by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 09:52:25 +00:00
kib
42ae7ee929 Trim excessive 'extern' and remove unused declaration.
Reviewed by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 09:42:09 +00:00
kib
9c588ebbf6 Use ANSI C declaration for trap_pfault(). Style.
Reviewed by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-20 09:39:10 +00:00
br
11ea615d8d Fix module unload when SGX support is not present in CPU.
Sponsored by:	DARPA, AFRL
2017-08-18 14:47:06 +00:00
markj
ce8e2801bf Rename mkdumpheader() and group EKCD functions in kern_shutdown.c.
This helps simplify the code in kern_shutdown.c and reduces the number
of globally visible functions.

No functional change intended.

Reviewed by:	cem, def
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11603
2017-08-18 04:04:09 +00:00
markj
f6dd3eb223 Factor out duplicated kernel dump code into dump_{start,finish}().
dump_start() and dump_finish() are responsible for writing kernel dump
headers, optionally writing the key when encryption is enabled, and
initializing the initial offset into the dump device.

Also remove the unused dump_pad(), and make some functions static now that
they're only called from kern_shutdown.c.

No functional change intended.

Reviewed by:	cem, def
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11584
2017-08-18 03:52:35 +00:00
cem
5a79b729aa x86: Add dynamic interrupt rebalancing
Add an option to dynamically rebalance interrupts across cores
(hw.intrbalance); off by default.

The goal is to minimize preemption. By placing interrupt sources on distinct
CPUs, ithreads get preferentially scheduled on distinct CPUs.  Overall
preemption is reduced and latency is reduced. In our workflow it reduced
"fighting" between two high-frequency interrupt sources.  Reduced latency
was proven by, e.g., SPEC2008.

Submitted by:	jeff@ (earlier version)
Reviewed by:	kib@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D10435
2017-08-16 18:48:53 +00:00
br
4b4a278df8 Rename macro DEBUG to SGX_DEBUG.
This fixes LINT kernel build.

Reported by:	lwhsu
Sponsored by:	DARPA, AFRL
2017-08-16 13:44:46 +00:00
br
c51ad5f03a Add support for Intel Software Guard Extensions (Intel SGX).
Intel SGX allows to manage isolated compartments "Enclaves" in user VA
space. Enclaves memory is part of processor reserved memory (PRM) and
always encrypted. This allows to protect user application code and data
from upper privilege levels including OS kernel.

This includes SGX driver and optional linux ioctl compatibility layer.
Intel SGX SDK for FreeBSD is also available.

Note this requires support from hardware (available since late Intel
Skylake CPUs).

Many thanks to Robert Watson for support and Konstantin Belousov
for code review.

Project wiki: https://wiki.freebsd.org/Intel_SGX.

Reviewed by:	kib
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11113
2017-08-16 10:38:06 +00:00
kib
50c6ae0cb9 Print whole machine state on double fault.
It is quite useful when double fault is not caused by a stack overflow.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-14 11:23:07 +00:00
kib
8353aff0e8 Add {rd,wr}{fs,gs}base C wrappers for instructions.
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-14 11:20:54 +00:00
kib
7df8e8c06e Style.
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-08-14 11:20:10 +00:00
jkim
d871b34dbc Split identify_cpu() into two functions for amd64 as we do for i386. This
reduces diff between amd64 and i386.  Also, it fixes a regression introduced
in r322076, i.e., identify_hypervisor() failed to identify some hypervisors.
This function assumes cpu_feature2 is already initialized.

Reported by:	dexuan
Tested by:	dexuan
2017-08-09 18:09:09 +00:00
imp
3c9b264a76 Fail to open efirt device when no EFI on system.
libefivar expects opening /dev/efi to indicate if the we can make efi
runtime calls. With a null routine, it was always succeeding leading
efi_variables_supported() to return the wrong value. Only succeed if
we have an efi_runtime table. Also, while I'm hear, out of an
abundance of caution, add a likely redundant check to make sure
efi_systbl is not NULL before dereferencing it. I know it can't be
NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.
2017-08-08 20:44:16 +00:00
kib
e8a9dbdea0 Avoid DI recursion when reclaim_pv_chunk() is called from
pmap_advise() or pmap_remove().

Reported and tested by:	pho (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-07 17:29:54 +00:00
kib
fdfd6e6d57 Explain why delayed invalidation is not required in pmap_protect() and
pmap_remove_pages().

Submitted by:	alc
MFC after:	1 week
2017-08-07 17:23:10 +00:00
jkim
0ac013123c Detect hypervisors early. We used to set lower hz on hypervisors by default
but it was broken since r273800 (and r278522, its MFC to stable/10) because
identify_cpu() is called too late, i.e., after init_param1().

MFC after:	3 days
2017-08-05 06:56:46 +00:00
cem
d4a7946bbd x86: Tag some intrinsics with __pure2
Some C wrappers for x86 instructions do not touch global memory and only act
on their arguments; they can be marked __pure2, aka __const__.  Without this
annotation, Clang 3.9.1 is not intelligent enough on its own to grok that
these functions are __const__.

Submitted by:	Anton Rang <anton.rang AT isilon.com>
Sponsored by:	Dell EMC Isilon
2017-08-03 22:28:30 +00:00
ed
28a0fcb4f0 Keep top page on CloudABI to work around AMD Ryzen stability issues.
Similar to r321899, reduce sv_maxuser by one page inside of CloudABI.
This ensures that the stack, the vDSO and any allocations cannot touch
the top page of user virtual memory.

Considering that CloudABI userspace is completely oblivious to virtual
memory layout, don't bother making this conditional based on the CPU of
the running system.

Reviewed by:	kib, truckman
Differential Revision:	https://reviews.freebsd.org/D11808
2017-08-02 13:08:10 +00:00